nhoffman / dada2-nf

A Nextflow pipeline for processing 16S rRNA sequences using dada2

Include unmerged 16S SVs in pipeline output #11

Closed by dhoogest 2 years ago

dhoogest commented 4 years ago

Building on a use case identified in https://gitlab.labmed.uw.edu/molmicro/NGS16S/issues/155: in some amplicon sequencing runs, the sequencing chemistry may not provide enough read length to guarantee that paired reads overlap and can be merged. This has been observed in some species of Campylobacter and Helicobacter, where insertions in the V1V2 region of 16S produce longer-than-expected amplicons that leave no overlap in 250-cycle paired-end sequencing.
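As a rough illustration of the arithmetic, here is a minimal sketch of the expected overlap between a forward and a reverse read spanning one amplicon. The amplicon lengths in the example are hypothetical and not measured values for these organisms; the function names are made up for illustration. The default `min_overlap` of 12 matches the default `minOverlap` of dada2's `mergePairs`, and any read truncation applied before merging would shrink the overlap further.

```python
def expected_overlap(amplicon_len, fwd_len=250, rev_len=250):
    """Number of bases covered by both reads of a pair.

    A value of zero or less means the reads do not meet in the middle.
    Assumes the amplicon is at least as long as each read.
    """
    return fwd_len + rev_len - amplicon_len


def can_merge(amplicon_len, fwd_len=250, rev_len=250, min_overlap=12):
    """True if the pair overlaps by at least min_overlap bases."""
    return expected_overlap(amplicon_len, fwd_len, rev_len) >= min_overlap


# Hypothetical amplicon lengths, chosen only to show both outcomes.
for length in (350, 520):
    print(length, expected_overlap(length), can_merge(length))
```

With 250-cycle reads, any amplicon longer than 488 bp falls below a 12-base overlap under these assumptions.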

To accommodate this situation, it would be useful for the pipeline to output the sequences and weights of all unmerged SVs (i.e., sequences inferred by denoising) that fit the 16S model, for downstream classification and analysis. In conversation, we identified unmerged forward SVs as the primary target for now (to simplify abundance assessments relative to merged reads), although unmerged reverse SVs may also be of interest and could be included for the sake of completeness.
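One possible way to define the weights of unmerged forward SVs is sketched below. This is only an assumption about how the bookkeeping might work, not a description of the pipeline: the function name, the input structures, and the "forward count minus merged count" definition of weight are all hypothetical, and the 16S model filter is left out.

```python
from collections import Counter


def unmerged_forward_weights(forward_counts, merged_pairs):
    """Return {forward_sv: reads not accounted for by any merged pair}.

    forward_counts: {forward SV sequence: read count after denoising}
    merged_pairs:   iterable of (forward SV, reverse SV, merged read count)
    Both input shapes are hypothetical, chosen for illustration.
    """
    merged_per_forward = Counter()
    for fwd_seq, _rev_seq, count in merged_pairs:
        merged_per_forward[fwd_seq] += count

    unmerged = {}
    for seq, total in forward_counts.items():
        leftover = total - merged_per_forward[seq]
        if leftover > 0:  # keep only SVs with reads that never merged
            unmerged[seq] = leftover
    return unmerged


# Toy data: "ACGT" merges completely, "TTGA" never merges.
forward_counts = {"ACGT": 100, "TTGA": 40}
merged_pairs = [("ACGT", "GGCC", 90), ("ACGT", "GGCA", 10)]
print(unmerged_forward_weights(forward_counts, merged_pairs))
```

Keeping the weights in read counts, as here, would let them be compared directly with merged SV abundances from the same specimen.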

A separate issue will be raised to incorporate test data reflecting the scenario into this project.

marykstewart commented 4 years ago

It seems like easier access to unmerged sequences would enable the less computationally facile members of the team (like me) to troubleshoot incidents like this one ourselves: https://gitlab.labmed.uw.edu/molmicro/NGS16S/-/issues/173.

dhoogest commented 3 years ago

May be convenient to address this along with #17

nhoffman commented 2 years ago

Implemented in 29d67cc37826213f1cd6252a2cad4ea58a8f13bf