nf-core / rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
https://nf-co.re/rnaseq
MIT License

Ability to skip both alignment and pseudo-alignment to only run pre-processing QC steps. #1018

Closed: biopaw closed this issue 1 year ago

biopaw commented 1 year ago

Description of feature

Use Case:

At the moment the workflow appears to require some form of alignment: if --skip_alignment is true, then --pseudo_aligner must be set to salmon.

The most common use case for RNA-seq count generation, especially for larger sets of samples, is to (1) perform a QC run first, (2) evaluate the quality control reports, (3) adjust the sample manifest and/or pass additional trimming parameters, and (4) run a final count generation run (with QC, for the final QC reports). Running alignment before the quality control assessment may waste a lot of time and resources, because two complete end-to-end runs of the workflow would then be needed.

Enhancement:

A very simple enhancement would be to add a flag for skipping pseudo-alignment, so that adding both:

--skip_alignment true and --skip_pseudoalignment true

will make sure that only the quality control steps that have been specified are run. If I were a little further along with Nextflow development, I would offer to help with this now; for the time being it is preferable for someone more experienced to do this.
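To make the requested behaviour concrete, here is a minimal sketch of the flag logic, written in Python rather than Nextflow. It is not the pipeline's actual code: the `Params` class, the `planned_stages` function, and the stage names are hypothetical, and `skip_pseudoalignment` is the flag proposed in this issue, not an existing option. The default aligner value is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Params:
    """Hypothetical mirror of the parameters discussed in this issue."""
    skip_alignment: bool = False
    skip_pseudoalignment: bool = False   # proposed flag, not an existing option
    aligner: str = "star_salmon"         # assumed default, for illustration only
    pseudo_aligner: Optional[str] = None


def planned_stages(p: Params) -> List[str]:
    """Return the stages that would run for a given flag combination."""
    # Current behaviour as described above: skipping alignment requires a
    # pseudo-aligner. The proposed flag is the only way around that check.
    if p.skip_alignment and not p.skip_pseudoalignment and p.pseudo_aligner is None:
        raise ValueError(
            "--skip_alignment requires --pseudo_aligner unless "
            "--skip_pseudoalignment is also set"
        )

    stages = ["qc_and_trimming"]  # pre-processing QC always runs
    if not p.skip_alignment:
        stages.append(f"align:{p.aligner}")
    if not p.skip_pseudoalignment and p.pseudo_aligner is not None:
        stages.append(f"pseudo_align:{p.pseudo_aligner}")
    return stages


if __name__ == "__main__":
    # QC-only run: both skip flags set.
    print(planned_stages(Params(skip_alignment=True, skip_pseudoalignment=True)))
```

With both skip flags set, only the QC stage is planned, which is the first pass of the two-pass workflow described in the use case; the second pass would drop the skip flags and run alignment as usual.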

ETA

Can someone add this wee feature relatively soon? It is needed for several large datasets I need to process.

georgiesamaha commented 1 year ago

+1 Completely agree, this would be really valuable. The inability to perform raw data QC as you normally would is something that holds my group back from using the nf-core/rnaseq pipeline when working at scale.

See https://github.com/nf-core/rnaseq/issues/1015

davidecarlson commented 1 year ago

I love this idea. I asked about this in the nf-core Slack, and it was pointed out that the taxprofiler pipeline can run QC only, without any alignment. It would be really nice to have this functionality in the rnaseq pipeline as well.