hildebra / lotus2

Amplicon sequencing pipelines suitable for SSU (16S, 18S), LSU (23S, 28S) and ITS.
http://lotus2.earlham.ac.uk/
GNU General Public License v3.0

Trying to process ONT partial LSU amplicons #67

Open SebaZambrano opened 3 months ago

SebaZambrano commented 3 months ago

I'm trying to cluster with VSEARCH and assign taxonomy to a set of samples using partial LSU amplicons (in FASTA format) that were previously extracted from ONT long reads using ITSx. I used the following command:

lotus2 -m LSUmap.txt -i LSU_lotus/ -o lotuS_LSU_out/ -CL VSEARCH -amplicon_type LSU -ITSx 0 -s sdm_ONT_LSSU.txt -refDB UNITE -taxAligner blast -tax_group fungi

The pipeline stopped due to a dereplication error (it failed to identify unique sequences). The error was reported as: The sdm dereplicated output file was either empty or not existing, aborting lotus. lotuS_LSU_out//tmpFiles//derep.fas

Note that I modified only the max and min length parameters in the sdm file, nothing else. I'm attaching the sdm file: sdm_ONT_LSSU.txt

Any help would be very much appreciated!

hildebra commented 3 months ago

Hey, could you check how many reads passed the initial sdm filter? This should be shown on the console, or in the log dir (lotuS_LSU_out/LotuSLogs/), where there should be files with sdm in their names. My first guess would be that no read was dereplicated because no read (or too few reads) passed the quality controls. If this is the case, you need to lower the quality filter further; the log file shows which quality filter removed the most reads. You can also try to lower the dereplication parameters by setting "-derepMin 0" or similar. However, note that LotuS2 was never programmed to work with ONT reads, i.e. many assumptions of the read clustering will be broken by the (usually) really low quality of ONT reads. hth, Falk
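As an independent sanity check next to the sdm logs, the input read lengths can be compared against the length window set in the sdm options file. The following is only a minimal sketch: the function names and the bounds (5 and 10) are hypothetical, the demo records are made up, and it does not reproduce the filtering logic sdm actually applies.

```python
import io


def read_fasta_lengths(handle):
    """Return the sequence length of every record in a FASTA stream."""
    lengths = []
    current = None  # running length of the record being read
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Sequence lines may be wrapped, so lengths are accumulated.
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def count_in_range(lengths, min_len, max_len):
    """Count reads whose length lies within [min_len, max_len]."""
    return sum(1 for n in lengths if min_len <= n <= max_len)


# Made-up demo data; replace with open("your_reads.fasta") for real input.
demo = io.StringIO(">r1\nACGTACGT\n>r2\nACG\n>r3\nACGTAC\nGTACGT\n")
lengths = read_fasta_lengths(demo)
print(lengths)                         # [8, 3, 12]
print(count_in_range(lengths, 5, 10))  # 1
```

If almost no reads fall inside the configured window, the length settings are a likely cause of an empty dereplication output; otherwise another filter is probably responsible.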

SebaZambrano commented 3 months ago

Hi,

It was indeed the dereplication parameters (I had already lowered the quality filters beforehand, since I'm working with FASTA files). I set "-derepMin" to 0 and it did run, but it gave an abnormally high number of OTUs (approx. 50% of the number of reads that passed filtering). I couldn't find the default value of this setting or what it means; could you explain it? Thanks for helping.

hildebra commented 3 months ago

Hey, basically 0 means to accept every read. "2" means to accept only reads that occur at least two times at 100% identity. I see that on the website (https://lotus2.earlham.ac.uk/) the link to the examples is not set correctly; we will fix this later, apologies. @4less best, Falk
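The effect of the threshold can be illustrated with a small sketch. This is not the LotuS2 or sdm implementation: the function name `dereplicate` and the toy reads are hypothetical, and only the simple integer case described above is modelled.

```python
from collections import Counter


def dereplicate(reads, min_copies):
    """Collapse identical sequences and keep those seen at least min_copies times.

    Sequences are compared case-insensitively and must match exactly
    (100% identity over the full length).
    """
    counts = Counter(seq.upper() for seq in reads)
    return {seq: n for seq, n in counts.items() if n >= min_copies}


# Toy reads: three copies of ACGT (one in lowercase) and two singletons.
reads = ["ACGT", "ACGT", "ACGA", "TCGT", "acgt"]

print(dereplicate(reads, 0))  # {'ACGT': 3, 'ACGA': 1, 'TCGT': 1}
print(dereplicate(reads, 2))  # {'ACGT': 3}
```

With a threshold of 0 every distinct sequence survives, including singletons. For noisy reads, where exact duplicates are rare, this would be expected to pass a large share of the reads on to clustering as unique sequences.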

4less commented 3 months ago

https://lotus2.earlham.ac.uk/lotus/Derep_options.pdf

Here is the PDF explaining the derep parameter. Best, Joachim