mbhall88 / rasusa

Randomly subsample sequencing reads or alignments
https://doi.org/10.21105/joss.03941
MIT License

Random sampling based on bases for the metagenomic dataset #59

Closed: AnupamGautam closed this issue 1 year ago

AnupamGautam commented 1 year ago

Dear Developers,

I am working on a comparative study that uses short-read and long-read metagenomic samples. We want to subsample our long-read dataset by number of bases. For example, if our randomly subsampled short-read metagenomic sample has 10,000 reads totalling 1,500,000 bases, we want to subsample our long-read dataset to 1,500,000 bases. For the short reads, we used reformat.sh to randomly subsample reads.

I searched for tools that subsample long reads to a required number of bases, but none of the ones I found use random sampling. I might have missed some tools that can do it.

My question is: would it be possible to do this with Rasusa for a metagenomic dataset?

Thanks, Anupam

mbhall88 commented 1 year ago

Yes, you can subsample to a given number of bases. Please see the docs section describing this option.
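
For readers wondering what "subsampling to a given number of bases" means in practice, here is a minimal Python sketch of the general idea: shuffle the reads, then keep taking reads until their combined length reaches the target. This is an illustration under stated assumptions, not Rasusa's actual implementation. The function name `subsample_to_bases` and its parameters are hypothetical, and the input is assumed to be a plain list of read lengths rather than a FASTQ file.

```python
import random


def subsample_to_bases(read_lengths, target_bases, seed=None):
    """Randomly pick reads until their total length reaches target_bases.

    Hypothetical helper for illustration only (not part of Rasusa).
    Returns (sorted indices of kept reads, total bases kept).
    If the dataset holds fewer bases than the target, all reads are kept.
    """
    rng = random.Random(seed)  # a seed makes the subsample reproducible
    order = list(range(len(read_lengths)))
    rng.shuffle(order)  # random order, so every read is equally likely to be picked early

    kept = []
    total = 0
    for idx in order:
        if total >= target_bases:
            break  # target reached; stop adding reads
        kept.append(idx)
        total += read_lengths[idx]
    return sorted(kept), total


if __name__ == "__main__":
    # Toy long-read lengths (assumed values, not real data).
    lengths = [1000, 5000, 12000, 800, 3000, 20000, 1500]
    kept, total = subsample_to_bases(lengths, target_bases=15000, seed=1)
    print(f"kept reads {kept} totalling {total} bases")
```

Because whole reads are kept, the total can overshoot the target by up to one read length, which matters more for long reads than for short reads. For the example in the question, the target would be 1,500,000 bases.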