pezmaster31 / bamtools

C++ API & command-line toolkit for working with BAM data
MIT License

bamtools random reads are highly skewed towards small chromosomes #230

Open benbfly opened 1 year ago

benbfly commented 1 year ago

I believe the way bamtools random works is to first randomly pick a reference, and then randomly pick a position within the reference.

For instance, in human GRCh38, there are lots of tiny reference chromosomes, like chrUn**** and chr1****_random. Bamtools random will pick a large fraction of reads from these chromosomes, even if they account for only a tiny fraction of all reads. This is undesirable behavior, especially since these chromosomes are abnormal sequence, mostly high-copy and low-complexity repeats.
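To make the suspected bias concrete, here is a minimal simulation sketch. It is not bamtools code: it only assumes the behavior described above (pick a reference uniformly, then pick a read from it) and compares it with picking a reference in proportion to its read count. The reference names, the read counts, and the function names are all made up for illustration.

```python
import random
from collections import Counter

# Hypothetical read counts per reference (illustrative numbers, not real data).
READ_COUNTS = {
    "chr1": 900_000,
    "chr2": 95_000,
    "chrUn_example1": 3_000,
    "chrUn_example2": 2_000,
}


def sample_uniform_reference(rng, n):
    """Pick a reference uniformly at random, ignoring how many reads it holds.

    This mirrors the behavior suspected in this issue (an assumption).
    """
    names = list(READ_COUNTS)
    return Counter(rng.choice(names) for _ in range(n))


def sample_weighted_reference(rng, n):
    """Pick a reference with probability proportional to its read count."""
    names = list(READ_COUNTS)
    weights = [READ_COUNTS[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n))


def small_contig_fraction(counts):
    """Fraction of sampled reads that came from the tiny chrUn_* references."""
    total = sum(counts.values())
    small = sum(c for name, c in counts.items() if name.startswith("chrUn"))
    return small / total


rng = random.Random(0)  # fixed seed so the run is repeatable
uniform_frac = small_contig_fraction(sample_uniform_reference(rng, 10_000))
weighted_frac = small_contig_fraction(sample_weighted_reference(rng, 10_000))

print(f"uniform reference choice:  {uniform_frac:.3f} of reads from chrUn_*")
print(f"weighted reference choice: {weighted_frac:.3f} of reads from chrUn_*")
```

In this toy setup the two chrUn_* references hold 0.5% of all reads, yet uniform reference selection draws about half of the sample from them, while weighting by read count keeps them near their true share.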