hildebra / lotus2

Amplicon sequencing pipelines suitable for SSU (16S, 18S), LSU (23S, 28S) and ITS.
http://lotus2.earlham.ac.uk/
GNU General Public License v3.0

-taxOnly seems to discard some sequences #71

ingeborgklara opened this issue 1 month ago (status: open)

ingeborgklara commented 1 month ago

Hello,

I am running lotus2 only for taxonomy assignment of ITS1 sequences (i.e. with -taxOnly), which is great and works very fast. I assumed that the output file would contain the same number of sequences as the input file, just with taxonomic assignments added. However, not all sequences make it through taxonomy assignment. I have already tried adding other options to make sure none of my sequences are discarded, but the result does not change. Is there an additional quality check during taxonomy assignment that I am not aware of?

I use the following command:

lotus2 -taxOnly ~seqtab.nochimITS.fasta \
  -refDB ~/sh_refs_qiime_ver10_99_s_all_04.04.2024.fasta \
  -tax4refDB ~/sh_taxonomy_qiime_ver10_99_s_all_04.04.2024.txt \
  -taxAligner lambda -lulu 0 -ITSx 0 -clustering 7 -deactivateChimeraCheck 1 \
  -buildPhylo 0 -backmap_id 1 -verbosity 3 -keepOfftargets 1 -keepUnclassified 1 \
  -keepTmpFiles 1 -redoTaxOnly 1

ingeborgklara commented 1 month ago

The same thing also happens when using BLAST:

lotus2 -taxOnly ~seqtab.nochimITS.fasta -refDB UNITE -taxAligner Blast -amplicon_type ITS1 \
  -lulu 0 -ITSx 0 -clustering 7 -deactivateChimeraCheck 1 -buildPhylo 0 -backmap_id 1 \
  -verbosity 3 -keepOfftargets 1 -keepUnclassified 1 -keepTmpFiles 1 -redoTaxOnly 1
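To see exactly which input sequences are missing from the taxonomy output, one option is to compare the sequence IDs in the input FASTA with the IDs in the taxonomy table. The sketch below is a hypothetical helper (`list_missing_ids` is not part of LotuS2), and it assumes the taxonomy table is tab-separated with the sequence ID in the first column; the exact name and layout of the LotuS2 output file are not given in this thread, so adjust the column to the file you actually get.

```shell
#!/bin/sh
# list_missing_ids FASTA TABLE  (hypothetical helper, not part of LotuS2)
# Prints every FASTA sequence ID that does not appear in column 1 of TABLE.
# ASSUMPTION: TABLE is tab-separated with the sequence ID in the first column.
list_missing_ids() {
  fasta=$1
  table=$2
  awk -F '\t' '
    # First file (the table): remember each ID found in column 1.
    FILENAME == ARGV[1] { seen[$1] = 1; next }
    # Second file (the FASTA): take each header line, drop the leading ">"
    # and anything after the first whitespace, then print IDs never seen.
    /^>/ {
      id = substr($0, 2)
      sub(/[ \t].*/, "", id)
      if (!(id in seen)) print id
    }
  ' "$table" "$fasta"
}
```

For example, `list_missing_ids seqtab.nochimITS.fasta taxonomy_table.txt` (both file names are placeholders) prints one missing ID per line. Using `FILENAME == ARGV[1]` instead of the common `NR == FNR` idiom keeps the helper correct even when the table file is empty.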

hildebra commented 1 month ago

Hey, yes, there is an additional quality check: hits considered unreliable are discarded automatically. This is also described in the LotuS2 paper (it is done within the LCA algorithm and by e-value filtering). Note that you can also use the UNITE database directly within LotuS2; I am not sure the QIIME-formatted taxonomy is in the right format:

lotus2 -taxOnly ~seqtab.nochimITS.fasta -refDB UNITE -taxAligner lambda -amplicon_type ITS1
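To illustrate how such filtering can leave a sequence without any assignment, here is a toy sketch of e-value filtering followed by a lowest-common-ancestor (LCA) step. It is not LotuS2's implementation: the function name `lca_assign`, the hit-table format (query, e-value and a ";"-separated lineage, tab-separated) and the cutoff value are all hypothetical, chosen only to show the idea.

```shell
#!/bin/sh
# lca_assign CUTOFF HITS_FILE  (hypothetical toy, not LotuS2's algorithm)
# HITS_FILE format (assumed): query<TAB>evalue<TAB>rank1;rank2;...
# Hits with evalue > CUTOFF are dropped; the remaining hits of each query are
# reduced to the leading ranks they all share (their LCA). Queries with no
# surviving hits, or no shared ranks, are reported as "unassigned".
lca_assign() {
  awk -F '\t' -v cutoff="$1" '
    {
      q = $1
      if (!(q in order)) { order[q] = ++n; names[n] = q }  # keep input order
      if ($2 + 0 > cutoff + 0) next                        # e-value filter
      k = split($3, ranks, ";")
      if (!(q in depth)) {
        # First reliable hit for this query: start from its full lineage.
        depth[q] = k
        for (i = 1; i <= k; i++) lin[q, i] = ranks[i]
      } else {
        # LCA step: keep only the leading ranks shared with this hit.
        d = 0
        while (d < depth[q] && d < k && lin[q, d + 1] == ranks[d + 1]) d++
        depth[q] = d
      }
    }
    END {
      for (j = 1; j <= n; j++) {
        q = names[j]
        if (!(q in depth)) { print q "\tunassigned"; continue }
        s = ""
        for (i = 1; i <= depth[q]; i++) s = s (i > 1 ? ";" : "") lin[q, i]
        print q "\t" (s == "" ? "unassigned" : s)
      }
    }
  ' "$2"
}
```

With a cutoff of `1e-5`, a query whose only hit has an e-value of 0.5 comes out as "unassigned", and two good hits that disagree at class level are collapsed to their shared phylum. Whether LotuS2 labels such sequences or drops them from its output is a detail of its own implementation.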