hartleys / QoRTs

Quality of RNA-Seq Toolset

Feature request: subsample alignments #26

Closed: tomsing1 closed this issue 7 years ago

tomsing1 commented 8 years ago

QoRTs is great, but I find that performing QC on a full BAM file takes quite some time, more than any other step in my workflow. Usually, I subsample the BAM file (e.g. using samtools) and perform QC on the subsample only.

Would the option to subsample either a fixed number of reads or a fraction of the reads (keeping pairs intact) be a useful addition to QoRTs?
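To illustrate what "keeping pairs intact" involves, here is a minimal Python sketch of one common approach: decide whether to keep a read from a hash of its name, so both mates of a pair always get the same decision. This is not QoRTs code. The function `keep_read`, the toy `records` list, and the choice of MD5 as the hash are assumptions made purely for illustration; real input would come from a BAM file.

```python
import hashlib


def keep_read(read_name: str, fraction: float, seed: int = 0) -> bool:
    """Decide whether to keep a read using only its name and a seed.

    Both mates of a pair share a read name, so they always receive the
    same decision and pairs are never split. (Hypothetical helper.)
    """
    digest = hashlib.md5(f"{seed}:{read_name}".encode()).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and keep
    # the read if it falls below the requested fraction of that range.
    return int.from_bytes(digest[:8], "big") < fraction * 2**64


# Toy records as (read name, mate number) tuples standing in for alignments.
records = [(f"read{i}", mate) for i in range(1000) for mate in (1, 2)]

sampled = [rec for rec in records if keep_read(rec[0], fraction=0.1, seed=42)]
kept_names = {name for name, _ in sampled}

print(f"kept {len(kept_names)} of 1000 pairs")
```

Because the decision depends only on the read name and the seed, it does not matter where the two mates sit in the file, and rerunning with the same seed selects the same pairs. The kept fraction is approximate rather than exact; subsampling to a fixed number of reads would need a different approach, such as reservoir sampling over read names.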

hartleys commented 7 years ago

Hmm. Should be an easy addition.

I'm currently in the process of a major upgrade to QoRTs to add an array of new QC metrics (v1.2.0). It may be some time before the new version is fully tested and ready for release, but I'll make sure to include this feature.

hartleys commented 7 years ago

You can now tell QoRTs to subsample a fraction of the reads using the --randomSubsample X parameter. X is a floating-point number between 0 and 1 (the default is 1, which keeps all reads).

You can also specify the random seed using the parameter "--randomSeed". This allows you to rerun the analysis and get the exact same result.
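As a rough illustration of why a fixed seed makes a subsampled run repeatable, the following Python sketch draws a random subset twice with the same seed and once with a different one. It does not reproduce QoRTs internals: the `subsample` function, the toy read names, and the use of Python's `random.Random` generator are all assumptions for the example.

```python
import random


def subsample(items, fraction, seed):
    """Keep each item with probability `fraction`, using a seeded generator.

    Hypothetical helper: a private Random instance makes the selection
    depend only on the seed and the order of the input.
    """
    rng = random.Random(seed)
    return [item for item in items if rng.random() < fraction]


reads = [f"read{i}" for i in range(10_000)]

first = subsample(reads, 0.25, seed=123)
second = subsample(reads, 0.25, seed=123)  # same seed: identical selection
other = subsample(reads, 0.25, seed=456)   # different seed: different selection

print("same seed gives same reads:", first == second)
print("different seed gives same reads:", first == other)
print("fraction kept:", len(first) / len(reads))
```

The same idea applies to any seeded sampler: with the seed and the input held fixed, the selected reads, and therefore any QC metrics computed from them, come out the same on every run.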

tomsing1 commented 7 years ago

Great, thanks a lot!