How much faster than mosdepth?

brainstorm commented 3 years ago

Did you cargo bench this? I'm curious! :)

sstadick commented 3 years ago

I have not set up a rigorous enough set of benchmarks to really claim to be faster than mosdepth!

Anecdotally, the perbase only-depth and perbase only-depth -x commands are just a tad faster on the datasets I have lying around, and they use almost the same algorithm as mosdepth. I believe the time difference would get bigger with larger samples and more cores, since perbase spreads the work out better than mosdepth, but again, I don't have a solid benchmark dataset to back that up yet.

Differences that I know of between perbase only-depth and mosdepth:

perbase defaults to 1-based output (the -z flag makes it 0-based); mosdepth defaults to 0-based.
perbase has no default samflag filter, mosdepth defaults to the equivalent of -F 1796
perbase only-depth, in both normal and mate detection mode, counts deletions toward depth, which I believe is the more correct thing to do. mosdepth does not count deletions toward depth.
perbase only-depth, as an artifact of the parallelization, will sometimes not merge regions of the same depth that run up to the ends of the chunks handed out for parallelization. You can pipe the output into perbase merge-adjacent if this is not okay.
mosdepth writes a gzipped output by default and perbase doesn't have that option yet. This also adds some noise to any benchmarking.
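
To make two of the differences above concrete, here is a minimal Python sketch. It is not code from perbase or mosdepth. The first part decodes the 1796 filter mask using the bit values from the SAM specification. The second part illustrates the idea behind merging same-depth regions that were split at a chunk boundary. The `(chrom, start, end, depth)` tuple layout with half-open coordinates and the function names `decode_flags` and `merge_adjacent` are assumptions made for illustration, not the actual perbase output format or API.

```python
from typing import Iterable, List, Tuple

# The four SAM flag bits that add up to 1796 (values from the SAM spec).
# This table is deliberately partial: it only covers the bits in that mask.
SAM_FLAGS = {
    0x4: "read unmapped",
    0x100: "secondary alignment",
    0x200: "failed quality checks",
    0x400: "PCR or optical duplicate",
}


def decode_flags(mask: int) -> List[str]:
    """Return the names of the known flag bits that are set in `mask`."""
    return [name for bit, name in SAM_FLAGS.items() if mask & bit]


# Hypothetical region layout: (chrom, start, end, depth), half-open [start, end).
Region = Tuple[str, int, int, int]


def merge_adjacent(regions: Iterable[Region]) -> List[Region]:
    """Merge consecutive regions that touch and share the same depth.

    Assumes the input is already sorted by chromosome and start position.
    """
    merged: List[Region] = []
    for chrom, start, end, depth in regions:
        if merged:
            p_chrom, p_start, p_end, p_depth = merged[-1]
            if p_chrom == chrom and p_end == start and p_depth == depth:
                # Same depth on both sides of the seam: extend the previous region.
                merged[-1] = (chrom, p_start, end, depth)
                continue
        merged.append((chrom, start, end, depth))
    return merged


if __name__ == "__main__":
    print(decode_flags(1796))
    # Two workers split the contig at position 1000, so the depth-12 run is
    # reported as two regions until they are merged.
    chunked = [
        ("chr1", 0, 500, 10),
        ("chr1", 500, 1000, 12),
        ("chr1", 1000, 1500, 12),
        ("chr1", 1500, 2000, 8),
    ]
    print(merge_adjacent(chunked))
```

In this toy input the two depth-12 regions on either side of position 1000 collapse into a single region from 500 to 1500, which is the kind of cleanup the merge step is meant to provide.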

As an aside, if you are familiar with the mosdepth project, my benchmarking efforts ended when I failed to find the data used to generate the mosdepth benchmarks (not the authors' fault, just my own inability to navigate NCBI SRA). The dataset is ERR1395576, from the supplemental materials of the mosdepth publication.

Anyways, long answer to your short question! Thanks for working on the htslib mac OSX stuff!

sstadick commented 3 years ago

Linking these for future reference: https://github.com/sstadick/perbase/issues/31

brainstorm commented 3 years ago

Woah, thanks for the details, looking forward to those benches ;)
