Closed cimendes closed 5 years ago
I have commenced examining this, and let me tell you, my --novel
code is a total shambles :-)
I have fixed this now! Thanks for letting me know. I have updated docs too:
You can also save the "novel" alleles for submission to PubMLST::
% mlst -q --novel nouveau.fa s_myces.fasta
% cat nouveau.fa
>streptomyces.recA-e562a2cd93e701e3b58ba0670bcbba0c s_myces.fasta
GACGTGGCCCTCGGCGTCGGCGGTCTGCCGCGCGGCCGCGTCGTCGAGATCTACGGACCGGAGTCCTCC...
The format of the sequence IDs is scheme.allele-hash filename
where hash is the hexadecimal MD5 digest of the allele DNA sequence.
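To make the ID format above concrete, here is a minimal Python sketch that splits such a header into its parts and rebuilds an ID from a sequence. The function names `parse_novel_header` and `novel_allele_id` are hypothetical and not part of mlst. It also assumes the MD5 is computed over the sequence string exactly as written in the file (no newline, case unchanged), which the docs above do not spell out.

```python
import hashlib


def parse_novel_header(header):
    """Split a '>scheme.allele-hash filename' header into its parts.

    Hypothetical helper, not part of mlst itself.
    """
    # Everything before the first space is the sequence ID; the rest is the filename.
    seq_id, _, filename = header.lstrip(">").partition(" ")
    # The scheme name ends at the first dot.
    scheme, _, rest = seq_id.partition(".")
    # The hash follows the last hyphen, so a hyphen inside the allele name is tolerated.
    allele, _, digest = rest.rpartition("-")
    return {"scheme": scheme, "allele": allele, "hash": digest, "filename": filename}


def novel_allele_id(scheme, allele, sequence):
    """Build 'scheme.allele-hash' from an allele DNA sequence.

    Assumption: the digest is taken over the sequence string exactly as given.
    """
    digest = hashlib.md5(sequence.encode("ascii")).hexdigest()
    return f"{scheme}.{allele}-{digest}"


if __name__ == "__main__":
    header = ">streptomyces.recA-e562a2cd93e701e3b58ba0670bcbba0c s_myces.fasta"
    print(parse_novel_header(header))
```

Comparing the parsed hash against a digest recomputed from the full sequence is one way to check that a record was not altered, provided the hashing assumption above matches what mlst actually does.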
:fireworks: Thank you so much Torsten! :fireworks:
Hello!
First of all thank you for creating such a useful tool! We've implemented it in our routine pipeline and so far it's been working great for us. :)
Recently we've added the --novel option to the mlst command, which runs in autodetect mode, to save the novel alleles, and we've noticed that alleles belonging to a species that is not present in the sample are reported. For example, our BMC1445c, a WGS of a Streptococcus pneumoniae sample, has the following mlst result:
I'm expecting the novel alleles file to contain the sequence for the recP gene, in the spneumoniae scheme, but I get the following:
The recP(~10) allele is reported, as expected, but so are a bunch of "novel" alleles for the soralis scheme. At first we thought it might be some contamination with soralis, but after running Kraken2 and ReMatCh in mlst mode, we're fairly certain that there is no contamination, as there are only pneumo sequences (with little unclassified) and no multiple mlst alleles are present.
I've attached the assembly for this particular sample to this issue. Thank you very much for your help!
BMC1445c.contigs.length_GCcontent_kmerCov.mappingCov.polished.fasta.zip