nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Multiple region amplicon sequencing analysis support (5R / SMURF / q2-sidle) #701

Closed d4straub closed 8 months ago

d4straub commented 9 months ago

Description of feature

I plan to integrate support for "Multiple region amplicon sequencing" analysis with SMURF via Sidle in QIIME2 into this pipeline.

This means allowing the input of sequencing files that contain the reads of multiple PCR products. The requirement is that the primer sequences are preserved in the files. As is already done, cutadapt will be used to extract the reads of a specific primer pair, and the pipeline should proceed with each primer pair separately up to the ASVs. Then q2-sidle will be used to reconstruct consensus taxonomies, which can be piped into downstream analysis.
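To illustrate the per-primer-pair split described above, here is a minimal Python sketch. It is not how cutadapt or the pipeline does it: it uses exact prefix matching only, whereas cutadapt tolerates mismatches and handles IUPAC codes. The primer sequences, the region names and the function `split_by_primer_pair` are all made up for this example.

```python
from collections import defaultdict

# Hypothetical primer pairs: region name -> (forward primer, reverse primer).
# These sequences are placeholders, not real 16S primers.
PRIMER_PAIRS = {
    "region1": ("ACGTAC", "TTGCAA"),
    "region2": ("GGATCC", "CCTAGG"),
}


def split_by_primer_pair(read_pairs, primer_pairs):
    """Bin paired reads by the primer pair they start with and trim the primers.

    Simplified stand-in for primer-based read extraction: a pair is assigned
    to a region only if R1 starts exactly with the forward primer and R2
    starts exactly with the reverse primer.
    """
    binned = defaultdict(list)
    for r1, r2 in read_pairs:
        for region, (fwd, rev) in primer_pairs.items():
            if r1.startswith(fwd) and r2.startswith(rev):
                # Keep the reads without their primer sequences.
                binned[region].append((r1[len(fwd):], r2[len(rev):]))
                break
        else:
            # No primer pair matched this read pair.
            binned["unassigned"].append((r1, r2))
    return dict(binned)


read_pairs = [
    ("ACGTACAAAA", "TTGCAACCCC"),
    ("GGATCCTTTT", "CCTAGGGGGG"),
    ("NNNNNN", "NNNNNN"),
]
binned = split_by_primer_pair(read_pairs, PRIMER_PAIRS)
```

Each bin would then go through ASV inference on its own, and only the per-region results would be combined afterwards.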

I have built and preliminarily benchmarked a Nextflow pipeline at https://github.com/d4straub/pipesidle that analyses multiple regions of the same gene using the output of nf-core/ampliseq. The challenge now is to add that here as a subworkflow. I intend to change the pipeline as little as possible, only under the hood, with no consequences for existing functions at all; I hope I will succeed. Additionally, I expect that one more input file will be required for multiple region input, namely a table with information on primers and regions.
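As a rough idea of what such a primer/region table could look like and how it might be read, here is a small Python sketch. The column names (`region`, `fw_primer`, `rv_primer`), the example rows and the function `load_primer_table` are assumptions made for illustration only. The actual format has not been decided in this issue.

```python
import csv
import io

# Hypothetical table layout: one row per amplified region.
# Column names and sequences are placeholders, not a defined format.
EXAMPLE_TABLE = """region,fw_primer,rv_primer
region1,ACGTAC,TTGCAA
region2,ggatcc,cctagg
"""


def load_primer_table(handle):
    """Read a primer/region table into {region: (forward, reverse)}."""
    pairs = {}
    for row in csv.DictReader(handle):
        region = row["region"]
        if region in pairs:
            # Each region should be listed exactly once.
            raise ValueError(f"duplicate region: {region}")
        # Normalise primer sequences to upper case.
        pairs[region] = (row["fw_primer"].upper(), row["rv_primer"].upper())
    return pairs


primer_pairs = load_primer_table(io.StringIO(EXAMPLE_TABLE))
```

Keeping this information in a separate file would leave the existing sample sheet untouched, which fits the goal of not affecting current functionality.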

I plan to work on that in the next few weeks.

d4straub commented 8 months ago

Central functionality is added to dev now.