I have not set up a rigorous enough set of benchmarks to really claim to be faster than `mosdepth`! Anecdotally, the `perbase only-depth` and `perbase only-depth -x` commands are just a tad faster on datasets I have lying around, and they use almost the same algorithm as `mosdepth`. I believe the time difference would grow with larger samples and more cores, since `perbase` spreads the work out better than `mosdepth`, but again, I don't have solid benchmark datasets to back that up yet.
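For readers unfamiliar with the shared approach: as I understand it, the `mosdepth` paper describes computing coverage by adding 1 to an array at each alignment's start and subtracting 1 at its end, then taking a running sum. Here is a minimal Python sketch of that idea, simplified to a single contig and ungapped alignment spans. The function name `depth_runs` and the tuple layout are hypothetical, and this is not code from either tool.

```python
def depth_runs(intervals, contig_len):
    """Compute per-base depth with a start/end difference array, then
    collapse consecutive bases of equal depth into (start, end, depth) runs.

    `intervals` are 0-based, half-open (start, end) alignment spans.
    Returned runs are also 0-based, half-open.
    """
    # +1 where an alignment starts, -1 where it ends (one extra slot for end == contig_len)
    diff = [0] * (contig_len + 1)
    for start, end in intervals:
        diff[start] += 1
        diff[end] -= 1

    runs = []
    depth = 0
    run_start = 0
    run_depth = None
    for pos in range(contig_len):
        depth += diff[pos]  # the running sum is the depth at this position
        if depth != run_depth:
            # depth changed: close the previous run (if any) and open a new one
            if run_depth is not None:
                runs.append((run_start, pos, run_depth))
            run_start, run_depth = pos, depth
    if run_depth is not None:
        runs.append((run_start, contig_len, run_depth))
    return runs


print(depth_runs([(2, 6), (4, 8)], 10))
# [(0, 2, 0), (2, 4, 1), (4, 6, 2), (6, 8, 1), (8, 10, 0)]
```

Because each region only needs the increments that fall inside it, a contig can in principle be cut into chunks that are processed independently, which is one plausible way to spread work across cores. The catch is that runs touching a chunk edge then need to be stitched back together.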
Differences that I know of between `perbase only-depth` and `mosdepth`:

- `perbase` defaults to 1-based output, and the `-z` flag makes it 0-based. `mosdepth` defaults to 0-based output.
- `perbase` has no default SAM flag filter, while `mosdepth` defaults to the equivalent of `-F 1796`.
- In `perbase only-depth`, in both normal and mate-detection mode, `perbase` counts deletions toward depth, which I believe is the more correct thing to do. `mosdepth` does not count deletions toward depth.
- As an artifact of the parallelization, `perbase only-depth` will sometimes not merge regions of the same depth that run up to the ends of the chunks handed out for parallelization. You can pipe the output into `perbase merge-adjacent` if this is not okay.
- `mosdepth` writes gzipped output by default, and `perbase` doesn't have that option yet. This also adds some noise to any benchmarking.

As an aside, in case you are familiar with the `mosdepth` project: my benchmarking efforts ended when I failed to find the data used to generate the `mosdepth` benchmarks, ERR1395576 from the supplemental materials of the `mosdepth` publication (not the authors' fault, just my own inability to navigate NCBI SRA).
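On the 1-based vs 0-based point: when lining up `perbase`'s default output against `mosdepth`'s 0-based output, single positions shift by one. For intervals, the conversion depends on whether the end is inclusive. The sketch below assumes a 1-based, closed interval on one side and a 0-based, half-open (BED-style) interval on the other; I have not confirmed that this is exactly how `perbase` writes its end column, so check that before relying on it. The function name is hypothetical.

```python
def one_based_closed_to_bed(start_1based, end_1based):
    """Convert a 1-based, closed interval [start, end] into a 0-based,
    half-open [start, end) interval, as used by BED.

    Only the start moves: the last covered base keeps the same numeric
    value as the exclusive 0-based end.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    return start_1based - 1, end_1based


print(one_based_closed_to_bed(1, 10))  # (0, 10)
print(one_based_closed_to_bed(5, 5))   # (4, 5): a single base
```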
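The `-F 1796` default is easier to reason about once decoded: 1796 is `0x704`, the sum of the SAM FLAG bits for unmapped (4), secondary (256), QC-fail (512), and duplicate (1024). This small Python sketch decodes a mask using the bit values and samtools-style names from the SAM specification, and applies an exclude filter the way `-F` is usually understood (drop a read if any excluded bit is set). The helper names are hypothetical. When comparing the two tools, an equivalent filter would presumably need to be applied on the `perbase` side too; I'm not asserting the exact option name here.

```python
# SAM FLAG bits (SAM specification), with samtools-style mnemonic names
SAM_FLAGS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def decode_flags(mask):
    """List the names of the FLAG bits set in `mask`, in ascending bit order."""
    return [name for bit, name in SAM_FLAGS.items() if mask & bit]


def passes_exclude_filter(read_flag, exclude_mask=1796):
    """Keep a read only if none of the bits in `exclude_mask` are set (like -F)."""
    return (read_flag & exclude_mask) == 0


print(decode_flags(1796))          # ['UNMAP', 'SECONDARY', 'QCFAIL', 'DUP']
print(passes_exclude_filter(99))   # True: paired, proper pair, mate reverse, read 1
print(passes_exclude_filter(1123)) # False: same read, but also marked duplicate
```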
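The deletion difference comes down to how each CIGAR operation is treated. As a rough illustration, and not either tool's actual code, this Python sketch turns one alignment into the reference spans it contributes to depth, with a `count_deletions` switch. Treating `N` (reference skip) as never counted is my assumption for the sketch; the original point is only about `D`. The function name is hypothetical.

```python
import re

# One CIGAR element: a length followed by an operation character
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def covered_spans(ref_start, cigar, count_deletions):
    """Return 0-based, half-open reference spans one alignment adds to depth.

    M, =, X consume the reference and always count.
    D consumes the reference and counts only if count_deletions is True.
    N consumes the reference but is never counted (assumption for this sketch).
    I, S, H, P do not consume the reference.
    """
    spans = []
    pos = ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op in "M=X" or (op == "D" and count_deletions):
            # extend the previous span when contiguous, otherwise start a new one
            if spans and spans[-1][1] == pos:
                spans[-1] = (spans[-1][0], pos + length)
            else:
                spans.append((pos, pos + length))
            pos += length
        elif op in "DN":
            # consumes the reference without adding depth
            pos += length
    return spans


print(covered_spans(100, "5M2D5M", count_deletions=True))   # [(100, 112)]
print(covered_spans(100, "5M2D5M", count_deletions=False))  # [(100, 105), (107, 112)]
```

With deletions counted, the two deleted bases at 105 and 106 get depth from this read; without, they form a hole. Spans like these could then be fed into a start/end-increment depth counter.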
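The chunk-boundary artifact is easy to picture with a toy post-processing step: two touching runs with the same depth on the same contig get fused into one. This Python sketch is only a guess at what a step like `perbase merge-adjacent` does conceptually, not its implementation; the `(contig, start, end, depth)` tuple layout is hypothetical, and it assumes sorted input with half-open coordinates. With 1-based, closed coordinates the adjacency test would be `p_end + 1 == start` instead.

```python
def merge_adjacent(runs):
    """Merge consecutive (contig, start, end, depth) runs that touch and share a depth.

    Assumes runs are sorted by contig and start, with half-open coordinates.
    """
    merged = []
    for contig, start, end, depth in runs:
        if merged:
            p_contig, p_start, p_end, p_depth = merged[-1]
            if p_contig == contig and p_end == start and p_depth == depth:
                # same contig, touching, same depth: extend the previous run
                merged[-1] = (contig, p_start, end, depth)
                continue
        merged.append((contig, start, end, depth))
    return merged


# A chunk boundary at 1000 split one depth-7 run into two pieces
chunked = [
    ("chr1", 0, 500, 3),
    ("chr1", 500, 1000, 7),
    ("chr1", 1000, 1200, 7),
    ("chr1", 1200, 1300, 2),
]
print(merge_adjacent(chunked))
# [('chr1', 0, 500, 3), ('chr1', 500, 1200, 7), ('chr1', 1200, 1300, 2)]
```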
Anyways, long answer to your short question! Thanks for working on the htslib macOS stuff!
Linking these for future reference: https://github.com/sstadick/perbase/issues/31
Woah, thanks for the details, looking forward to those benches ;)
Did you `cargo bench` this? I'm curious! :)