option for hashed OTU ids

hildebra / lotus2

Amplicon sequencing pipelines suitable for SSU (16S, 18S), LSU (23S, 28S) and ITS.

http://lotus2.earlham.ac.uk/

GNU General Public License v3.0


option for hashed OTU ids #27

Closed by kingtom2016 1 year ago

kingtom2016 commented 1 year ago

Providing an option for hashed OTU IDs may help integrate ASV/OTU tables from different studies or datasets.
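To illustrate the request, here is a minimal Python sketch of what hashed IDs could look like, assuming the ID is simply the MD5 digest of the uppercased representative sequence. The function names (`hashed_id`, `rehash`, `merge_tables`) and the toy tables are hypothetical and not part of LotuS2; this is only one possible scheme.

```python
import hashlib
from typing import Dict

Table = Dict[str, Dict[str, int]]  # feature ID -> {sample name -> read count}


def hashed_id(sequence: str) -> str:
    """Return the MD5 hex digest of the uppercased sequence (assumed ID scheme)."""
    return hashlib.md5(sequence.strip().upper().encode("ascii")).hexdigest()


def rehash(table: Table, sequences: Dict[str, str]) -> Table:
    """Replace dataset-local IDs (e.g. OTU_1) with sequence-derived hashes."""
    return {hashed_id(sequences[local_id]): dict(counts)
            for local_id, counts in table.items()}


def merge_tables(*tables: Table) -> Table:
    """Sum counts of features that share the same hashed ID across tables."""
    merged: Table = {}
    for table in tables:
        for feature_id, counts in table.items():
            row = merged.setdefault(feature_id, {})
            for sample, n in counts.items():
                row[sample] = row.get(sample, 0) + n
    return merged


# Toy data: the same sequence carries different local IDs in two studies.
study_a = rehash({"OTU_1": {"A1": 10}}, {"OTU_1": "ACGTACGT"})
study_b = rehash({"OTU_7": {"B1": 4}}, {"OTU_7": "acgtacgt"})
merged = merge_tables(study_a, study_b)
print(len(merged))  # both studies collapse onto one hashed feature
```

Identical sequences then end up in the same row regardless of the local numbering in each study, which is the integration benefit I have in mind.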

hildebra commented 1 year ago

Hey kingtom2016, in general I don't think it is a good idea to merge different datasets after OTU/ASV clustering. This is because quality filtering steps, as well as clustering algorithms, use information from sequences that belong to other OTUs/ASVs. Hence my recommendation would be to run all datasets together; LotuS2 should be fast enough for this.

kingtom2016 commented 1 year ago

I used DADA2 to generate ASVs (in theory, it generates the same ASV sequences across different batches). Would this also be affected when integrating ASV tables from different batches?

hildebra commented 1 year ago

In theory that holds, but in practice it does not work that well in my experience. You always want to cluster as many sequences as possible in the same clustering step. Further, the quality filtering step is important: DADA2 (and other pipelines, like LotuS2) will remove sequences that do not appear at least X times in the dataset (see the derepMin parameter).
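A toy Python sketch of the minimum-abundance point, assuming a plain "seen at least X times" rule. This is a simplification for illustration only, not the actual filtering code of DADA2 or LotuS2; the function name `abundant_sequences`, the threshold and the reads are made up.

```python
from collections import Counter


def abundant_sequences(reads, min_count):
    """Keep only the unique sequences observed at least min_count times."""
    counts = Counter(reads)
    return {seq for seq, n in counts.items() if n >= min_count}


MIN_COUNT = 3  # hypothetical threshold, standing in for the "X" above

# "CCCC" is rare in each batch on its own, but not in the pooled data.
batch_1 = ["AAAA"] * 5 + ["CCCC"] * 2
batch_2 = ["AAAA"] * 4 + ["CCCC"] * 2

# Filtering each batch separately and merging afterwards loses "CCCC".
separate = (abundant_sequences(batch_1, MIN_COUNT)
            | abundant_sequences(batch_2, MIN_COUNT))

# Filtering the pooled reads keeps it, because the counts add up to 4.
pooled = abundant_sequences(batch_1 + batch_2, MIN_COUNT)

print(sorted(separate))  # ['AAAA']
print(sorted(pooled))    # ['AAAA', 'CCCC']
```

So a sequence that falls below the threshold in every single batch can still pass when all batches are processed in one run, which is one reason the merged per-batch tables differ from a joint run.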

kingtom2016 commented 1 year ago

Thanks for your answer :)