tseemann / mlst

Scan contig files against PubMLST typing schemes
GNU General Public License v2.0

not finding genes randomly #75

dwilkin799 closed this issue 5 years ago

dwilkin799 commented 5 years ago

Following the last couple of mlst updates, and your kind response to issue #67 ("cgMLST?"), I have been using mlst to do cgMLST assignments for a number of schemes.

It works very well in most cases, but I noticed recently that I occasionally come across genomes where mlst reports lots of missing genes. It's very odd: it returns hits for some genes, but around 90% of loci are assigned "-".
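To spot genomes affected this way across a batch, one option is to measure how many loci in each result line are called as missing. The sketch below assumes mlst's default tab-separated output (FILE, SCHEME, ST, then one "locus(allele)" call per locus) and assumes a missing locus is written as "locus(-)". The function names and the 0.5 threshold are hypothetical choices for illustration.

```python
import re

# Assumed format: FILE \t SCHEME \t ST \t locus(allele) \t locus(allele) ...
# A missing locus is assumed to appear as "locus(-)".
CALL = re.compile(r"^(?P<locus>.+)\((?P<allele>[^()]*)\)$")


def missing_fraction(line: str) -> float:
    """Return the fraction of loci whose allele call is '-' in one mlst output line."""
    fields = line.rstrip("\n").split("\t")
    calls = fields[3:]  # skip FILE, SCHEME, ST
    if not calls:
        return 0.0
    missing = 0
    for call in calls:
        m = CALL.match(call.strip())
        if m is None:
            raise ValueError(f"unexpected allele call: {call!r}")
        if m.group("allele") == "-":
            missing += 1
    return missing / len(calls)


def flag_suspicious(lines, threshold=0.5):
    """Yield (file, fraction) for genomes where more than `threshold` of loci are missing."""
    for line in lines:
        if not line.strip():
            continue
        frac = missing_fraction(line)
        if frac > threshold:
            yield line.split("\t", 1)[0], frac
```

Feeding it the lines of a saved mlst run (for example `flag_suspicious(open("results.tsv"))`, with a hypothetical file name) lists only the genomes worth re-checking by hand.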

Naturally, I double-checked in another piece of software (Geneious) that the genes really are present, and I find exact matches to existing alleles.

The problem is reproducible, and it doesn't seem to have anything to do with line endings or stray characters in the genome assembly file (I remade the genome FASTA files to check this), so I cannot pin down why it is happening.
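A quick way to rule out the file-format causes mentioned here is to scan the assembly for Windows (CRLF) line endings and for characters outside the nucleotide alphabet. This is a minimal sketch: the allowed set (upper-case IUPAC codes plus "-") is an assumption for illustration, not the validation rule mlst itself applies, and `check_fasta` is a hypothetical helper.

```python
import gzip

# Assumed allowed alphabet: IUPAC nucleotide codes plus "-" (not mlst's own rule).
ALLOWED = set("ACGTNRYKMSWBDHV-")


def check_fasta(path: str) -> dict:
    """Report CRLF line endings and unexpected sequence characters in a FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    report = {"records": 0, "crlf_lines": 0, "bad_chars": set()}
    with opener(path, "rb") as fh:
        for raw in fh:
            if raw.endswith(b"\r\n"):
                report["crlf_lines"] += 1
            # Non-ASCII bytes become U+FFFD and are then reported as bad characters.
            line = raw.decode("ascii", errors="replace").strip()
            if not line:
                continue
            if line.startswith(">"):
                report["records"] += 1
                continue
            report["bad_chars"].update(set(line.upper()) - ALLOWED)
    return report
```

A clean file reports `crlf_lines == 0` and an empty `bad_chars`; if that holds and typing still fails, the input file is unlikely to be the cause.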

Any ideas? Many thanks, David

dwilkin799 commented 5 years ago

Sorry for the double post. It occurred to me that the problem might also affect classification with the 7-gene MLST profiles, and indeed it does, which rules out an issue with the formatting of my cgMLST files.

I re-downloaded one of the offending genomes from NCBI: GCF_000007685.1_ASM768v1_genomic.fna (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/007/685/GCF_000007685.1_ASM768v1)
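The download location above follows NCBI's directory layout for assemblies: the GCF/GCA prefix, the nine-digit accession number split into three-digit groups, then a folder named after the accession and assembly name. The sketch below rebuilds the `_genomic.fna.gz` URL from those two pieces so other genomes can be fetched the same way. It assumes that layout holds and that `assembly_name` is given exactly as it appears in the folder name (ASM768v1 here); the function name is hypothetical.

```python
def ncbi_genomic_fna_url(accession: str, assembly_name: str,
                         base: str = "ftp://ftp.ncbi.nlm.nih.gov/genomes/all") -> str:
    """Build the *_genomic.fna.gz URL for an NCBI assembly (assumed directory convention)."""
    prefix, rest = accession.split("_", 1)   # "GCF", "000007685.1"
    digits = rest.split(".", 1)[0]           # "000007685"
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"unexpected accession: {accession!r}")
    # Split the nine digits into three directory levels: 000/007/685
    groups = "/".join(digits[i:i + 3] for i in range(0, 9, 3))
    folder = f"{accession}_{assembly_name}"
    return f"{base}/{prefix}/{groups}/{folder}/{folder}_genomic.fna.gz"
```

For the genome in this thread, `ncbi_genomic_fna_url("GCF_000007685.1", "ASM768v1")` yields the file inside the directory linked above.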

I tried to type it using "--scheme leptospira" and it returns "-" for every locus. When I look for the genes manually, they are there.

Thanks again, David

tseemann commented 5 years ago

If I use the default mode, it works:

$ mlst --version
mlst 2.16.2
$ mlst --quiet GCF_000007685.1_ASM768v1_genomic.fna.gz
GCF_000007685.1_ASM768v1_genomic.fna.gz leptospira_3    2       adk_3(1)       icdA_3(1)        lipL32_3(2)     lipL41_3(2)     rrs2_3(1)       secY_3(1)
tseemann commented 5 years ago

If I use the --scheme mode, it also works:

$ mlst --scheme leptospira --quiet GCF_000007685.1_ASM768v1_genomic.fna.gz
GCF_000007685.1_ASM768v1_genomic.fna.gz leptospira      17      glmU_1(1)      pntA_1(1)        sucA_1(2)       tpiA_1(2)       pfkB_1(10)      mreA_1(4)      caiB_1(8)
tseemann commented 5 years ago

It sounds like your copy of mlst has not been installed properly, or is an old version.
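One way to act on this is to confirm which mlst is first on PATH and what version it reports, since the maintainer's working run printed "mlst 2.16.2". This sketch only relies on `mlst --version` printing a line of that form; it treats 2.16.2 as a reference point taken from this thread, not as a documented minimum version, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_mlst_version(text: str) -> tuple:
    """Turn 'mlst 2.16.2' (as printed by `mlst --version` above) into (2, 16, 2)."""
    m = re.search(r"mlst\s+(\d+(?:\.\d+)*)", text)
    if m is None:
        raise ValueError(f"cannot parse version from {text!r}")
    return tuple(int(part) for part in m.group(1).split("."))


def installed_mlst_version():
    """Return the version of the mlst found on PATH, or None if it is not installed."""
    exe = shutil.which("mlst")
    if exe is None:
        return None
    out = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    # Check both streams in case the version line is written to stderr.
    return parse_mlst_version(out.stdout + out.stderr)


REFERENCE = (2, 16, 2)  # version used by the maintainer in this thread, not an official minimum

if __name__ == "__main__":
    ver = installed_mlst_version()
    if ver is None:
        print("mlst not found on PATH")
    elif ver < REFERENCE:
        print(f"mlst {'.'.join(map(str, ver))} is older than 2.16.2; consider upgrading")
    else:
        print(f"mlst {'.'.join(map(str, ver))} found")
```

Comparing tuples rather than strings keeps "2.9" correctly ordered before "2.16.2".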