hildebra / lotus2

Amplicon sequencing pipelines suitable for SSU (16S, 18S), LSU (23S, 28S) and ITS.
http://lotus2.earlham.ac.uk/
GNU General Public License v3.0

False positive of mapping ASV back to reads? #36

Closed kingtom2016 closed 1 year ago

kingtom2016 commented 1 year ago

Lotus2 is a great tool!
I recently noticed that the table generated by Lotus2 shows more ASVs shared between samples than other tools such as QIIME2. Have you noticed this as well? Are these false positives, or are the ASVs really shared between samples? If they are false positives, changing the minimap2 parameters may alleviate the problem. I guess the threshold needs to be tuned for different Lotus2 settings (intuitively, mapping reads back to ASVs and to 97% OTUs requires different parameters).

hildebra commented 1 year ago

Hey kingtom2016, very good point. By default LotuS2 uses a 97% identity cutoff, also for minimap2. I can imagine that this cutoff might indeed, in some cases, lead to false mapping of reads between samples. For OTUs, which are reads clustered at 97% identity, I wouldn't see a problem; the issue is rather with ASVs or zOTUs. So I have added a new flag "-backmap_id" that can be set to e.g. "-backmap_id 0.99" to only backmap reads at 99% identity. I also changed the default behaviour so that 99% is used by default when ASVs or zOTUs are being clustered. I have pushed this to the git repo now; if it is stable I'll push it to conda later. If you could give me some feedback on the new GitHub version (version 2.25), that would be much appreciated. Cheers, Falk
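To illustrate the idea behind an identity cutoff for backmapping, here is a minimal Python sketch. It is not LotuS2's implementation: the `Hit` record, the `identity` definition (matching bases divided by alignment length, similar in spirit to the match and block-length fields minimap2 reports in PAF output), and the function names are all assumptions made for this example. Only the cutoff values (0.97 in general, 0.99 for ASVs/zOTUs) come from the comment above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One read-to-cluster alignment (hypothetical record for this sketch)."""
    read_id: str
    target_id: str   # OTU / ASV / zOTU the read aligned to
    matches: int     # number of matching bases in the alignment
    aln_len: int     # alignment length, including mismatches and gaps


def identity(hit: Hit) -> float:
    """Fraction of aligned positions that match (assumed definition)."""
    return hit.matches / hit.aln_len if hit.aln_len else 0.0


def default_cutoff(exact_variants: bool) -> float:
    """0.99 when ASVs or zOTUs are clustered, otherwise 0.97 (per the thread)."""
    return 0.99 if exact_variants else 0.97


def backmap(hits, cutoff):
    """Assign each read to its best hit, keeping only hits at or above cutoff."""
    best = {}
    for hit in hits:
        ident = identity(hit)
        if ident < cutoff:
            continue  # too dissimilar: the read is not assigned via this hit
        current = best.get(hit.read_id)
        if current is None or ident > current[0]:
            best[hit.read_id] = (ident, hit.target_id)
    return {read: target for read, (_, target) in best.items()}


if __name__ == "__main__":
    hits = [
        Hit("r1", "ASV1", 98, 100),
        Hit("r1", "ASV2", 100, 100),
        Hit("r2", "ASV1", 98, 100),
    ]
    print(backmap(hits, default_cutoff(exact_variants=True)))
    print(backmap(hits, default_cutoff(exact_variants=False)))
```

With the stricter cutoff, a read that only matches an ASV at 98% identity stays unassigned instead of being counted towards that ASV, which is the mechanism expected to reduce spurious sharing between samples.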

kingtom2016 commented 1 year ago

Thanks! It works well. I tested this parameter at 0.97 (the original default), 0.99 and 1, using samples from five different habitats.
Compared with 0.97, setting 0.99 discards 8% of reads on average (range 1% to 20%). Compared with 0.99, setting 1 discards 30% of reads on average (range 6% to 42%). That seems to be a considerable loss of reads. :(
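A comparison like this can be computed from the per-sample read totals of two runs. The following Python sketch shows one way to do it; the function names are hypothetical and the counts in the example are made-up illustrative numbers, not the data from this thread.

```python
def percent_lost(baseline, stricter):
    """Per-sample percentage of reads lost when moving from the baseline
    run to the stricter run. Both arguments map sample name -> read count."""
    losses = {}
    for sample, base in baseline.items():
        if base == 0:
            continue  # avoid division by zero for empty samples
        kept = stricter.get(sample, 0)
        losses[sample] = 100.0 * (base - kept) / base
    return losses


def summarize(losses):
    """Return (minimum, mean, maximum) of the per-sample losses."""
    values = list(losses.values())
    return min(values), sum(values) / len(values), max(values)


if __name__ == "__main__":
    # Made-up totals for two hypothetical samples at two -backmap_id settings.
    reads_097 = {"soil": 1000, "water": 500}
    reads_099 = {"soil": 900, "water": 400}
    losses = percent_lost(reads_097, reads_099)
    print(losses)
    print(summarize(losses))
```

Reporting the minimum and maximum alongside the mean shows how much the loss varies between habitats.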

lotus2 -i $PWD -m $PWD/1_miSeqMap.sm.txt \
  -s /mnt/d/Myfile/DATA/beforework/lotus2/1sdm_miSeq_bio.txt \
  -o $output_fold \
  -p miSeq -amplicon_type SSU -tax_group bacteria \
  -forwardPrimer $front_f \
  -reversePrimer $front_r \
  -CL dada2 -id 1 -refDB SLV -taxAligner vsearch \
  -rdp_thr 0.7 -buildPhylo 0 -t 16 -sdmThreads 1 -lulu 1 -backmap_id $1

hildebra commented 1 year ago

Yes, I would expect something like this; intuitively I would think that 100% identity is too drastic. Another problem is that DADA2 does not natively report which reads belong to which cluster, so it always requires backmapping ALL reads onto the ASVs. This differs from almost all other clustering approaches in LotuS2 and might therefore give them a small advantage. However, mid-quality reads will always be backmapped, in all clustering approaches, as they are ignored for the clustering itself (for the most part they would just exaggerate diversity due to being noisy).

kingtom2016 commented 1 year ago

Thanks for your rapid reply. I believe it is not a big deal now. Wish you a good day :)