Reinstate get_unmerged.R logic

nhoffman / dada2-nf

A Nextflow pipeline for processing 16S rRNA sequences using dada2


Reinstate get_unmerged.R logic #84

Closed: dhoogest closed this issue 8 months ago

dhoogest commented 8 months ago

@crosenth @nhoffman I don't recall if we'd made a specific decision to drop the use of https://github.com/nhoffman/dada2-nf/blob/master/bin/get_unmerged.R as a workflow step (possibly we just overlooked it, since merging wasn't a focus of ITS work?), but it's come up in the context of https://gitlab.labmed.uw.edu/molmicro/NGS16S/-/issues/342#note_113382. Is there any reason not to just add a step after dada2_dada.R that consumes the dada.rds output and generates per-sample unmerged_F/R.fasta files?

crosenth commented 8 months ago

https://github.com/nhoffman/dada2-nf/commit/61c0772b60239100c342f2d4bca8f8506d51e4a6

I believe it was removed when we created the output/[R1, R2] dirs, which should contain the unmerged reads.

dhoogest commented 8 months ago

Yeah, so the 'unmerged' reads could be inferred as:

seqs in /R1/seqs.fasta not present in /seqs.fasta, or
seqs in /R2/seqs.fasta not present in /seqs.fasta?
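A minimal Python sketch of that inference, assuming both files are plain FASTA (wrapped or unwrapped) at the paths mentioned above. It compares sequences rather than names, since names may be enumerated independently in each output. The helpers `parse_fasta`, `read_fasta`, and `unmerged_names` are hypothetical, not part of dada2-nf. The exact-match test is just the literal proposal: merged amplicons are usually longer than the single-end variants, so whether an R1 sequence would ever appear verbatim in seqs.fasta is an assumption to check against real outputs.

```python
def parse_fasta(lines):
    """Parse FASTA text (an iterable of lines) into {name: sequence}.

    The name is the first whitespace-delimited token after '>';
    wrapped sequence lines are joined into one string.
    """
    records = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def read_fasta(path):
    """Hypothetical convenience wrapper: parse a FASTA file from disk."""
    with open(path) as handle:
        return parse_fasta(handle)


def unmerged_names(single_end, merged):
    """Names of single-end records whose sequence is absent from `merged`.

    Compares sequences, not names, because each output's names may be
    enumerated independently. Exact matching is an assumption, not a
    verified rule for dada2-nf outputs.
    """
    merged_seqs = set(merged.values())
    return [name for name, seq in single_end.items() if seq not in merged_seqs]
```

For example, `unmerged_names(read_fasta("R1/seqs.fasta"), read_fasta("seqs.fasta"))` would list R1 variant names with no exact counterpart in the merged output. The same call could be made with R2/seqs.fasta, though reverse reads may also need reverse-complementing before any comparison.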

crosenth commented 8 months ago

Yeah, or just the seqnames.

dhoogest commented 8 months ago

Cool, closing.

crosenth commented 8 months ago

Happy to revive the get_unmerged.R script or add an annotation somewhere, whatever makes things easiest for the reporting process.

dhoogest commented 8 months ago

I think we can add it on the 'reporting' side of the pipeline, assuming we've got all of the necessary info in the outputs. At a glance, though, I think the seqnames might not do the trick, since each merged/R1/R2 sv list is independently enumerated (if I'm not mistaken).

crosenth commented 8 months ago

I think the seqtab.csv files have the original seqnames??

dhoogest commented 8 months ago

Ah gotcha, so like dada2-nf/dada/{sampleid}/{orientation}/seqtab*.csv? I was looking in dada2-nf/R1/sv_table.csv etc.

Not sure that'll do the trick either; all I see in the seqtab headers is:

sampleid, weight, seq
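To illustrate what those three columns do allow, here is a hypothetical Python sketch that diffs two seqtab CSVs (say, an R1 table and a merged table) by `seq` within each `sampleid`, then formats each sample's leftovers as FASTA text, roughly the per-sample unmerged_F.fasta idea from the top of the thread. All function names and the `>{sampleid}_{n} weight=...` header format are invented for illustration (records must be renamed because there are no seqnames), and the caveat that merged and single-end sequences may differ in length, so exact matches may be rare, still applies.

```python
import csv
from collections import defaultdict


def seqtab_from_rows(rows):
    """Build {sampleid: {seq: weight}} from dicts with sampleid/weight/seq keys.

    Weights for a repeated (sampleid, seq) pair are summed.
    """
    table = defaultdict(dict)
    for row in rows:
        sample, seq = row["sampleid"], row["seq"]
        weight = float(row["weight"])
        table[sample][seq] = table[sample].get(seq, 0) + weight
    return dict(table)


def load_seqtab(path):
    """Read a seqtab CSV whose header row is: sampleid, weight, seq."""
    with open(path, newline="") as handle:
        return seqtab_from_rows(csv.DictReader(handle))


def unmerged_by_sample(single_end, merged):
    """Per sample, single-end seqs (with weights) absent from that sample's merged seqs."""
    out = {}
    for sample, seqs in single_end.items():
        merged_seqs = merged.get(sample, {})
        missing = {s: w for s, w in seqs.items() if s not in merged_seqs}
        if missing:
            out[sample] = missing
    return out


def to_fasta(sample, seqs):
    """Format one sample's seqs as FASTA text, highest weight first."""
    lines = []
    ranked = sorted(seqs.items(), key=lambda kv: -kv[1])
    for i, (seq, weight) in enumerate(ranked, start=1):
        lines.append(f">{sample}_{i} weight={weight:g}")
        lines.append(seq)
    return "\n".join(lines) + "\n"
```

Writing each `to_fasta(sample, seqs)` result to a per-sample file would then be a short loop; the comparison here is deliberately within-sample, so a sequence counts as merged only if it appears in that same sample's merged table.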

Might be easiest to lean on the bin/get_unmerged.R script within this pipeline after all...

crosenth commented 8 months ago

https://github.com/nhoffman/dada2-nf/commit/8dd4ba8a2d07d9a13ff482ea7b6b709066d64c11

dhoogest commented 8 months ago

Looks good to me (no need to change the workflow logic, nice). Tag forthcoming? /cc @nhoffman

crosenth commented 8 months ago

2.0.2 - https://github.com/nhoffman/dada2-nf/pkgs/container/dada2-nf