Open snair09 opened 1 year ago
Random Q - is there a way to push the benchmarking down to the Rust level? (py-spy can't do that, I don't think, but maybe other programs can.)
Run py-spy with `--native` to go down into the Rust level: https://github.com/benfred/py-spy#can-py-spy-profile-native-extensions
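For reference, the full invocation might look something like this (adapted from the py-spy README; the output path and the target command/files are placeholders, not from this thread):

```shell
# sample both Python and native (Rust/C) frames and write a flamegraph SVG
# (requires py-spy: pip install py-spy)
py-spy record --native -o profile.svg -- python -m sourmash gather query.sig db.zip
```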
Some thoughts on the larger question:
First, there are a lot of other bits and bobs of sourmash outside of `sourmash prefetch` and `sourmash gather` that don't often get a lot of attention. I'm thinking particularly of the stuff under `sourmash sig`, where we already have one well-known performance problem - https://github.com/sourmash-bio/sourmash/issues/1617. Perhaps there are more?
Second, the Python code has not always received a lot of optimization love. This has bitten us in a few places, e.g. https://github.com/sourmash-bio/sourmash/pull/2132.
So my hot take is that it might be good to set up some baseline benchmarks for parts of the code other than the usual suspects (gather, prefetch, search, compare) to see if there are any surprises. I'm not quite sure how to go about this, though I do keep coming back to the `sourmash sig` submodule...
> Random Q - is there a way to push the benchmarking down to the Rust level? (py-spy can't do that, I don't think, but maybe other programs can.)
>
> run py-spy with `--native` to go into the Rust level: https://github.com/benfred/py-spy#can-py-spy-profile-native-extensions
Thanks for the tip @luizirber! It was easy to implement. Below is a comparison without `--native` and with `--native`. If you click an image, it will bring you to the HTML-rendered SVG and let you see the details of the flamegraph layers.
[flamegraph without `--native`]
[flamegraph with `--native`]
@ctb Thanks for the direction! We'll start with the signature sub-cmds.
https://github.com/sourmash-bio/resources has some benchmarks too, feel free to take whatever is useful :sweat_smile:
(the py-spy `--native` graphs are super cool!)
We should add `repeat("benchmarks/somecommand/{sample}.tsv", 3)` to easily get replicates!
https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#benchmark-rules
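For concreteness, a minimal sketch of how that could look in a Snakefile (the rule name, input/output paths, and shell command here are hypothetical; `repeat()` comes from the snakemake benchmark-rules docs linked above):

```snakemake
rule gather:
    input: "sigs/{sample}.sig"
    output: "results/{sample}.csv"
    # repeat(..., 3) reruns the job 3 times, appending one row per
    # repetition to the benchmark TSV, giving us replicate measurements
    benchmark: repeat("benchmarks/gather/{sample}.tsv", 3)
    shell: "sourmash gather {input} db.zip -o {output}"
```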
While perf and flamegraphs sample at sub-second timescales, it should be noted that snakemake only polls resource usage every 30 seconds. Something to keep in mind. There's a suggested workaround here: https://github.com/snakemake/snakemake/issues/851
@ccbaumler and I have developed a benchmarking workflow to analyze various metrics of sourmash commands. This will help us identify parts of the program that can be further optimized. We have tested the workflow using the gather command, but it can easily be applied to other commands.
(Metadata found here and Project information here)
Metrics are measured two different ways:
Flamegraphs constructed using the Python sampling profiler py-spy. An example of a flamegraph for the gather command:
Furthermore, the workflow measures computational metrics via snakemake's benchmark directive, which outputs a TSV file with the following metrics (ref):
We can compare the metrics among different samples with line graphs (example below) to check for inconsistencies.
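As a sketch of that comparison step, the replicate TSVs produced by snakemake's benchmark directive can be summarized with the standard library alone before plotting. The column names below follow snakemake's documented benchmark output; the numbers are made up purely for illustration:

```python
import csv
import io
from statistics import mean

# Hypothetical benchmark TSV in snakemake's format; with repeat(..., 3),
# each file holds one row per repetition of the benchmarked rule.
benchmark_tsv = """\
s\th:m:s\tmax_rss\tmax_vms\tmax_uss\tmax_pss\tio_in\tio_out\tmean_load\tcpu_time
12.31\t0:00:12\t355.1\t870.2\t350.0\t352.0\t0.0\t1.2\t98.5\t12.1
11.87\t0:00:11\t349.8\t869.9\t344.7\t346.1\t0.0\t1.2\t97.9\t11.7
12.05\t0:00:12\t352.4\t870.0\t347.2\t349.0\t0.0\t1.2\t98.2\t11.9
"""

def summarize(tsv_text):
    """Mean wall-clock time (s) and mean peak RSS (MB) across replicates."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    return {
        "mean_s": mean(float(r["s"]) for r in rows),
        "mean_max_rss": mean(float(r["max_rss"]) for r in rows),
    }

summary = summarize(benchmark_tsv)
print(summary)  # per-sample summaries like this feed the line graphs
```

Per-sample summaries computed this way can then be plotted against each other to spot outlier replicates or samples.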
Possible additions to the benchmarking workflow: