mbhall88 / rasusa

Randomly subsample sequencing reads or alignments
https://doi.org/10.21105/joss.03941
MIT License

Input parameter for number of bases in addition to coverage and genome size #30

Closed: tomazberisa closed this issue 3 years ago

tomazberisa commented 3 years ago

In addition to providing --coverage and --genome-size, an alternative mode where the user supplies the total number of bases in the downsampled output (e.g., via --bases) would be useful in certain use cases.

tomazberisa commented 3 years ago
$ rasusa ...
...
[2021-08-19][16:23:02][rasusa][INFO] Target number of bases to subsample to is: <value>
...

To clarify, the idea is to supply the <value> from the example output above directly via a command-line parameter, instead of having rasusa calculate it from the coverage and genome size.
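As a rough illustration of the two modes being discussed, a hypothetical helper `target_bases` could either accept the base count directly or derive it as coverage × genome size. The function name, its signature, and the truncation of fractional bases are assumptions made for this sketch; they are not rasusa's API or its exact rounding behaviour.

```python
from typing import Optional


def target_bases(
    bases: Optional[int] = None,
    coverage: Optional[float] = None,
    genome_size: Optional[int] = None,
) -> int:
    """Return the number of bases to subsample to (hypothetical helper).

    Either `bases` is given directly (the proposed mode), or it is
    derived as coverage * genome_size (the existing mode).
    """
    if bases is not None:
        # Proposed mode: the target is used as-is, no genome size needed.
        if coverage is not None or genome_size is not None:
            raise ValueError("give either bases, or coverage and genome_size, not both")
        return bases
    if coverage is None or genome_size is None:
        raise ValueError("coverage and genome_size are both required when bases is not given")
    # Existing mode: e.g. 30x coverage of a 5 Mbp genome -> 150,000,000 bases.
    # This sketch truncates any fractional base (an assumption).
    return int(coverage * genome_size)


print(target_bases(coverage=30, genome_size=5_000_000))  # 150000000
print(target_bases(bases=150_000_000))  # 150000000
```

Rejecting a mix of both modes keeps the intent unambiguous; an actual CLI might instead treat the options as mutually exclusive at the argument-parsing level.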

mbhall88 commented 3 years ago

> would also be useful in certain use cases.

Which use cases would you find this useful for?

tomazberisa commented 3 years ago

One example is a FASTQ file that contains sequencing reads from more than one species. In this case the coverage + genome size input isn’t directly applicable to the contents of the file.
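With a base budget given directly, no single genome size is needed: reads can be picked in random order until the running total reaches the target, whatever species they come from. The sketch below illustrates that idea under the assumption that reads are plain sequence strings held in memory. `subsample_to_bases` is a hypothetical name, and this is not claimed to be rasusa's actual implementation.

```python
import random
from typing import List, Optional


def subsample_to_bases(reads: List[str], target: int, seed: Optional[int] = None) -> List[str]:
    """Randomly pick reads until their combined length reaches `target` bases.

    Only the total base count matters, so a file mixing reads from
    several species is handled the same way as a single-species file.
    """
    rng = random.Random(seed)
    order = list(range(len(reads)))
    rng.shuffle(order)  # random visiting order over read indices
    chosen = set()
    total = 0
    for idx in order:
        if total >= target:
            break
        chosen.add(idx)
        total += len(reads[idx])
    # Return the selected reads in their original file order.
    return [read for i, read in enumerate(reads) if i in chosen]


reads = ["ACGT" * 25, "GG" * 50, "T" * 300, "CA" * 10]  # 100, 100, 300, 20 bases
picked = subsample_to_bases(reads, target=250, seed=1)
print(sum(len(r) for r in picked))
```

If the file holds fewer bases than the target, this sketch simply returns every read. The last read added may overshoot the target slightly; stopping just short of it would be the other reasonable choice.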

mbhall88 commented 3 years ago

I see. Fair enough. I can see the utility of such an option.

tomazberisa commented 3 years ago

Amazing, thank you for #32 and v0.6.0!

mbhall88 commented 3 years ago

Thanks for the enhancement suggestions!