hildebra / lotus2

Amplicon sequencing pipelines suitable for SSU (16S, 18S), LSU (23S, 28S) and ITS.
http://lotus2.earlham.ac.uk/
GNU General Public License v3.0

duplicated ASV sequence #28

Closed by kingtom2016 1 year ago

kingtom2016 commented 1 year ago

I use names generated by hashing the ASV sequences as headers, so that I can integrate ASV tables from different datasets. I found that some ASV sequences were duplicated. Why?

Here is my lotus2 command:

    lotus2 -i $PWD -m $PWD/1_miSeqMap.sm.txt \
        -s /mnt/d/Myfile/DATA/beforework/lotus2/1sdm_miSeq.txt \
        -o lotus2_output \
        -p miSeq -amplicon_type SSU -tax_group bacteria \
        -forwardPrimer $front_f \
        -reversePrimer $front_r \
        -CL dada2 -refDB SLV -taxAligner lambda \
        -rdp_thr 0.7 -buildPhylo 0 -t 6 -sdmThreads 6

The problem persisted when I disabled the LULU option (-lulu 0).

hildebra commented 1 year ago

Hey Kingtom2016, I don't know why you have duplicate sequences. Did you check in the FASTA file which sequences these were? Is your hashing algorithm taking only part of the sequence? Please provide a bit more information, thanks.
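A check along these lines could be sketched as follows. This is only a minimal illustration, not part of LotuS2: the function names `read_fasta` and `duplicated_sequences` are hypothetical, and MD5 is assumed as the hash because the thread does not say which algorithm was used. The sketch hashes the full, uppercased sequence and groups the headers that share a digest.

```python
import hashlib
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the first whitespace-delimited token as the ID.
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def duplicated_sequences(records):
    """Map digest -> headers for every sequence that occurs more than once."""
    groups = defaultdict(list)
    for header, seq in records:
        # Hash the whole sequence; uppercase so case differences are ignored.
        # MD5 is an assumption here, any stable hash would do.
        digest = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        groups[digest].append(header)
    return {d: h for d, h in groups.items() if len(h) > 1}


# Toy input: ASV1 (wrapped over two lines) and ASV3 (lowercase) are identical.
example = """>ASV1
ACGTACGT
ACGT
>ASV2
TTTTGGGG
>ASV3
acgtacgtacgt
""".splitlines()

dups = duplicated_sequences(read_fasta(example))
for digest, headers in dups.items():
    print(digest, headers)
```

To run it on real data, pass an open file handle for the ASV FASTA file to `read_fasta` in place of `example`. If the headers reported together really carry identical full-length sequences, the duplication is already present in the FASTA file and is not an artefact of the hashing step.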

kingtom2016 commented 1 year ago

Thank you for the rapid reply! I checked the duplicated sequences: they are identical, and I hashed the whole of each sequence.

hildebra commented 1 year ago

Hey Kingtom, it's a bit hard to say without looking at the sequences, but one possibility is that the seed extension process reconstructed exactly the same sequence more than once. Could you paste the duplicated sequences here, along with the file they came from? Can you also upload the LotuS_runlog file?