Clarify infer strandedness from current subsampling + infer step

nf-core / rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.

https://nf-co.re/rnaseq

MIT License

Clarify infer strandedness from current subsampling + infer step #1095

Closed ewallace closed 3 months ago

ewallace commented 11 months ago

Description of feature

The current nf-core/rnaseq (3.12.0) has initial steps that infer strandedness by first subsampling the reads with fq, then running Salmon on the subsample. This is an optional step and has led to some confusion, as it's not actually subsampling all the reads.

In an nf-core Slack discussion, @drpatelh suggested:

Maybe subsample + infer need to be part of the same station. I think we chose to do it this way because it would have meant introducing more lines and curves to the map which would make it even more confusing. Can you create an issue for this please.

The suggestion is to combine into one station / one module or workflow step. That would clean up the metro diagram and avoid the confusion.

This could be called:

"Infer strandedness (fq, Salmon)" in the metro diagram
"Auto-infer strandedness by subsampling and pseudoalignment (fq, Salmon)" in the list of steps.
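
For readers unfamiliar with the "infer" half of this step, here is a minimal sketch of how a Salmon library-type code could be turned into a strandedness label. It is not the pipeline's actual code. It assumes that Salmon's lib_format_counts.json exposes the detected library type under an "expected_format" key, and the function name, the mapping table, and the "undetermined" fallback are hypothetical names chosen for illustration.

```python
import json

# Salmon library-type codes mapped to strandedness labels.
# Illustrative assumption only, not taken from nf-core/rnaseq.
LIBTYPE_TO_STRANDEDNESS = {
    "U": "unstranded",
    "IU": "unstranded",
    "SF": "forward",
    "ISF": "forward",
    "SR": "reverse",
    "ISR": "reverse",
}


def infer_strandedness(lib_format_json: str) -> str:
    """Return a strandedness label from the text of a lib_format_counts.json.

    Assumes the JSON has an "expected_format" key holding the detected
    library type; returns "undetermined" if it is missing or not in the table.
    """
    lib_type = json.loads(lib_format_json).get("expected_format")
    return LIBTYPE_TO_STRANDEDNESS.get(lib_type, "undetermined")


if __name__ == "__main__":
    # Hypothetical minimal input standing in for a real Salmon output file.
    example = json.dumps({"expected_format": "ISR"})
    print(infer_strandedness(example))  # prints "reverse"
```

Because Salmon only sees the subsampled reads here, the label describes the subsample, which is why a single "Infer strandedness" station covering both fq and Salmon reads more clearly than two separate ones.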

MatthiasZepper commented 11 months ago

To complicate matters even further, the most recent release of BBTools (39.03) now also contains a new tool to infer strandedness, called checkstrand.sh.

I have not done a comprehensive evaluation, but it has a samplerate=1.0 parameter and can also stop early after a fixed number of reads (reads=-1). Since it is a one-stop shop written by a reputable author, I believe chances are that it is way faster than the current subworkflow.

pinin4fjords commented 3 months ago

@ewallace - does https://github.com/nf-core/rnaseq/pull/1307 fix things for you?

ewallace commented 3 months ago

@pinin4fjords thanks, yes, that looks ideal! Very clear.

ewallace commented 3 months ago

The new subway map is labeled (Salmon, fq). I agree that Salmon is more important than fq, but fq does the subsampling, which happens before Salmon, so you may wish to switch the order in which they are written on the subway map, depending on your goals.

pinin4fjords commented 3 months ago

ping @maxulysse !

maxulysse commented 3 months ago

done in #1307

maxulysse commented 3 months ago

yeah, I saw the comment and modified my PR in accordance, and then said I've done it

pinin4fjords commented 3 months ago

yeah, I saw the comment and modified my PR in accordance, and then said I've done it

You were too fast for me, I didn't think you'd already have addressed the comment. All good now :-)