nf-core / rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
https://nf-co.re/rnaseq
MIT License

Explicit subsampling step in rnaseq pipeline #1097

Open ewallace opened 8 months ago

ewallace commented 8 months ago

Description of feature

Subsampling sequencing data before running a pipeline is good practice for testing configurations and failing fast. Allowing the user to subsample the input data before running the entire pipeline would provide a quicker, in-line way to validate that the pipeline runs, troubleshoot problems, and check inputs.

I would like to request optional subsampling as a feature; I think it will save a lot of people a lot of time. Yes, users can manually subsample their data and then feed that into the pipeline, but that seems to go against the Nextflow spirit. Having this option inline would let users test-run the pipeline with --subsample-reads 100000, check everything within minutes, and then edit that one parameter to run on all the input data.

This is probably achievable with fq subsample.
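
To make the request concrete, here is a minimal Python sketch of the kind of subsampling meant here: keep a fixed number of read pairs, chosen uniformly at random, with R1 and R2 kept in sync. It uses reservoir sampling, which is an assumption for illustration and not a claim about how fq subsample or the pipeline works internally. The names read_fastq and subsample_pairs are hypothetical, and the sketch assumes uncompressed four-line FASTQ records with the same number of records in both files.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield one FASTQ record at a time as a tuple of 4 lines."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_lines, r2_lines, n, seed=0):
    """Keep up to n read pairs, chosen uniformly at random.

    Hypothetical helper: single-pass reservoir sampling, so memory use
    depends on n and not on the size of the input. Assumes R1 and R2
    contain the same number of records in the same order.
    """
    rng = random.Random(seed)  # fixed seed makes the test run reproducible
    reservoir = []
    pairs = zip(read_fastq(r1_lines), read_fastq(r2_lines))
    for i, pair in enumerate(pairs):
        if i < n:
            reservoir.append((i, pair))
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = (i, pair)
    reservoir.sort(key=lambda item: item[0])  # restore input order
    return [pair for _, pair in reservoir]


# Tiny in-memory example: 10 read pairs, keep 3.
r1 = [l for i in range(10) for l in (f"@read{i}/1\n", "ACGT\n", "+\n", "IIII\n")]
r2 = [l for i in range(10) for l in (f"@read{i}/2\n", "TGCA\n", "+\n", "IIII\n")]
pairs = subsample_pairs(r1, r2, n=3, seed=42)
```

With real data, r1_lines and r2_lines would be open file handles, and the selected pairs would be written back out as two FASTQ files before the rest of the pipeline runs.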

Note that the current (v3.12.0) "subsample" step does not do that; see issue #1095.

Issue #1096 suggests a different workaround, but it applies only when using fastp for read trimming.

drpatelh commented 1 month ago

Could be solved by #1096