Closed VGalata closed 6 years ago
@VGalata sorry for missing this issue when you submitted it 3 weeks ago.
You have indeed found a bug. I was using S > minscore
instead of S >= minscore
.
That said, there are some things you should know, and I should improve in the docs.
The scoring used to be "100/N" per allele for a scheme with N alleles. However it was changed to be "90/N" and there is 10 points for also having a known ST type. This allows us to differentiate between having N alleles (but novel scheme) and N alleles (existing scheme).
So if you are on ecoli
with 7 genes you will need --minscore
= 90 for all intact allelles, and with my bug fix, --minscore
100.
I will create an issue to improve the docs.
Thank you so much for your report! The --minscore
is quite new.
@tseemann,
Thank you very much for your reply!
Is the scoring scheme with "90/N" vs. "100/N" for known/unknown ST in the last release (version 2.9
)? I assume not because I get known and unknown STs with --minscore 99
. Or did I miss something?
Thank you in advance!
@VGalata Hmmm. Can you send me the FASTA file that is causing this and I will see what is happening?
@tseemann
I cannot send an example for an unknown ST (because of some data policy issues). But I would use the example in my first post for the other case of a known ST:
I used an E. coli genome from NCBI: GCF_001901025.1_ASM190102v1. With mlst
version 2.9
I get SCORE=100 ecoli 1081 6/4/5/18/11/8/2 (7 genes)
(see the output in my first post). As the score is 100 and the ST is also known I assume that the scoring scheme "90/N for known STs" is not used in this version. Is that correct?
@VGalata sorry for the long delay here, i am prepping for a 2.10 release and want to resolve this. i have checked the code and minscore=90 will allow all matching alleles without a known ST, and 100 requires an ST as well. I will document this better.
% mlst -q --minscore GCA_001901025.1_ASM190102v1_genomic.fna.gz
ecoli 1081 adk(6) fumC(4) gyrB(5) icd(18) mdh(11) purA(8) recA(2)
I just saw that my last reply did not make really sense. However, regarding the issue with 90/N
and 100/N
, I get a 100 score for complete profiles with an unknown ST, here is an example output:
SCORE=100 kpneumoniae - 16/24/21/27/153/22/67 (7 genes)
This is not a big problem in my case, I just would like to be sure I understand correctly which scoring scheme is used and how the results are filtered. I will try to find an example to reproduce such results. (update: see my next comment)
I look forward to the new release!
Update
Here is an example file - ERS1018585.fna.gz - of a P. aeruginosa I downloaded some time ago from here.
I get the following result
[07:41:02] SCORE=100 paeruginosa - 6/101/4/11/1/4/2 (7 genes)
using
mlst --debug --minid 99 --mincov 75 --minscore 99
Are you using 2.10 version? https://github.com/tseemann/mlst/releases I think i released the same day you wrote the above comment!
% mlst ERS1018585.fna.gz --debug
[16:44:38] SCORE=6 achromobacter - -/-/-/-/155?/-/- (7 genes)
[16:44:38] SCORE=6 cronobacter - -/-/-/-/-/-/167? (7 genes)
[16:44:38] SCORE=6 pfluorescens - 56?/-/-/-/-/-/- (7 genes)
[16:44:38] SCORE=90 paeruginosa - 6/101/4/11/1/4/2 (7 genes)
% mlst --version
mlst 2.10
Dear Torsten,
I would like to report schemes only if all alleles have a match and, thus, I used
--minscore 100
. However, the tool reports no scheme though all alleles of the best scheme could be matched. Does the parameter mean that the result should be>
the given minimal threshold and not>=
?Here is a reproducible example using an E. coli sample from NCBI RefSeq:
Used tool versions:
mlst-2.9
,blastn: 2.2.31+
,perl 5, version 22, subversion 1 (v5.22.1)