h3abionet / TADA

TADA - Targeted Amplicon Diversity Analysis - a DADA2-focused Nextflow workflow for any targeted amplicon region
MIT License

Make pipeline work for 454 data #6

Open kifeonu opened 4 years ago

kifeonu commented 4 years ago

Make provision to use this pipeline to process 454 data

cjfields commented 3 years ago

See: https://benjjneb.github.io/dada2/faq.html#can-i-use-dada2-with-my-454-or-ion-torrent-data

cjfields commented 3 years ago

See also: https://github.com/benjjneb/dada2/issues/795

wbazant commented 2 years ago

I think this issue can be closed now! To process 454 data, one can add the flags

--dadaOpt.HOMOPOLYMER_GAP_PENALTY -1 --dadaOpt.BAND_SIZE 32

to Nextflow, which will run dada as

dada(..., HOMOPOLYMER_GAP_PENALTY=-1, BAND_SIZE=32)

as recommended in the tutorial.
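As a rough illustration of the mapping described above, the Python sketch below turns `--dadaOpt.KEY VALUE` pairs into the argument list of the resulting `dada()` call. The helper name `dada_call_from_flags` is hypothetical, and this is not how TADA implements it (the pipeline handles these options inside Nextflow and R); it only shows the intended correspondence between flags and arguments.

```python
def dada_call_from_flags(argv):
    """Render the dada() call implied by --dadaOpt.KEY VALUE pairs.

    Hypothetical helper for illustration only; TADA itself handles
    these options inside Nextflow and R.
    """
    prefix = "--dadaOpt."
    opts = {}
    i = 0
    while i < len(argv):
        token = argv[i]
        if token.startswith(prefix) and i + 1 < len(argv):
            # The token after the flag is its value, e.g. "-1" or "32".
            opts[token[len(prefix):]] = argv[i + 1]
            i += 2
        else:
            # Skip anything that is not a dadaOpt flag.
            i += 1
    extra = "".join(f", {key}={value}" for key, value in opts.items())
    return f"dada(...{extra})"


flags = "--dadaOpt.HOMOPOLYMER_GAP_PENALTY -1 --dadaOpt.BAND_SIZE 32".split()
print(dada_call_from_flags(flags))
# prints: dada(..., HOMOPOLYMER_GAP_PENALTY=-1, BAND_SIZE=32)
```

Any other `dadaOpt` key would be passed through the same way, so the sketch makes no claim about which `dada()` options are valid.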

cjfields commented 2 years ago

@wbazant any ideas on test data sets for this one? We could add it to CI testing (which will be critical to have in place for DSL2 work).

wbazant commented 2 years ago

Right, since TADA doesn't do single-end now, the added dadaOpt.XXX feature only adds support for hypothetical paired-end 454 data, which is not even a thing in the 454 technology!

For single-end 454, SRS607719 is a stool sample containing mostly E. coli; we have it under https://microbiomedb.org/mbio/app/record/sample/MBSMPL0020-7-1 .

It weighs about 1 MB, and it's available from ftp.sra.ebi.ac.uk/vol1/fastq/SRR128/009/SRR1288519/SRR1288519.fastq.gz
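If this file were used as a CI fixture, a quick sanity check on the downloaded copy might look like the Python sketch below. It assumes the file has already been fetched to a local path and that records use the plain four-line FASTQ layout (no wrapped sequence lines); the function name `fastq_summary` is hypothetical and not part of TADA.

```python
import gzip


def fastq_summary(path):
    """Return (read_count, max_read_length) for a gzipped FASTQ file.

    Assumes four-line records with unwrapped sequence lines.
    """
    reads = 0
    longest = 0
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a four-line record the sequence is the second line.
            if line_number % 4 == 1:
                reads += 1
                longest = max(longest, len(line.strip()))
    return reads, longest
```

Calling `fastq_summary("SRR1288519.fastq.gz")` on the local copy would give the read count and the longest read length, which could help pick a length cutoff for filtering.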

cjfields commented 1 year ago

@wbazant I added some prelim single-end read support, including via a sample sheet. It also supports PacBio (which we can set using the --platform parameter). So this should feasibly support 454 out of the box, though we may want to add some presets for this and PacBio at some point.