sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

What should we benchmark? #2410

Open snair09 opened 1 year ago

snair09 commented 1 year ago

@ccbaumler and I have developed a benchmarking workflow to analyze various metrics of sourmash commands. This will help us identify parts of the program that can be further optimized. We have tested the workflow using the gather command, but it can easily be applied to other commands.

The input sig files vary in sampling location, instrumentation, base quantity, and file size. They are listed below:

| Sample | Sampling Location | Instrument | Base Quantity (gigabases) | File Size (gigabytes) |
| --- | --- | --- | --- | --- |
| SRR1976948 | USA: Alaska, North Slope, Schrader Bluff formation | Illumina MiSeq | 8.65 | 4.96 |
| SRR1977249 | USA: Alaska, North Slope, Schrader Bluff formation | Illumina MiSeq | 9.61 | 5.65 |
| SRR1977296 | USA: Alaska, North Slope, Ivishak formation | Illumina HiSeq 2500 | 15.49 | 8.47 |
| SRR1977304 | USA: Alaska, North Slope, Ivishak formation | Illumina HiSeq 2500 | 14.50 | 8.45 |
| SRR1977357 | USA: Alaska, North Slope, Kuparuk formation | Illumina HiSeq 2500 | 19.00 | 11.10 |
| SRR1977365 | USA: Alaska, North Slope, Kuparuk formation | Illumina HiSeq 2500 | 18.48 | 10.82 |

(Metadata found here and Project information here)

Metrics are measured two different ways:

  1. Flamegraphs constructed using the Python sampling profiler py-spy. An example flamegraph for the gather command: (flamegraph image)

  2. Computational metrics measured via Snakemake's benchmark directive, which outputs a TSV file with the following metrics (ref); a sketch of such a rule follows the table:

| colname | type (unit) | description |
| --- | --- | --- |
| s | float (seconds) | Running time in seconds |
| h:m:s | string (-) | Running time in hours:minutes:seconds format |
| max_rss | float (MB) | Maximum "Resident Set Size": the non-swapped physical memory a process has used |
| max_vms | float (MB) | Maximum "Virtual Memory Size": the total amount of virtual memory used by the process |
| max_uss | float (MB) | "Unique Set Size": the memory which is unique to a process and which would be freed if the process were terminated right now |
| max_pss | float (MB) | "Proportional Set Size": the amount of memory shared with other processes, accounted so that the amount is divided evenly between the processes that share it (Linux only) |
| io_in | float (-) | the number of read operations performed (cumulative) |
| io_out | float (-) | the number of write operations performed (cumulative) |
| mean_load | float (-) | CPU usage over time, divided by the total running time (first row) |
| cpu_time | float (-) | CPU time summed for user and system |
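
For concreteness, here is a minimal sketch of the kind of rule we use; the rule name, paths, and database file are illustrative, not the actual workflow:

```python
# Sketch of a benchmarked gather rule (names and paths are placeholders).
# Snakemake's benchmark directive writes the TSV described above
# once per run of the rule.
rule gather_benchmark:
    input:
        sig="sigs/{sample}.sig",
        db="databases/gtdb-rs207.genomic.k31.zip",  # placeholder database
    output:
        csv="outputs/gather/{sample}.csv",
    benchmark:
        "benchmarks/gather/{sample}.tsv"
    shell:
        "sourmash gather {input.sig} {input.db} -o {output.csv}"
```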

We can compare the metrics among different samples with line graphs (example below) to check for inconsistencies.

(line graph image)
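
A minimal sketch of how such a comparison might be produced from the benchmark TSVs (paths, metric, and column handling are assumptions, not the actual plotting code):

```python
# Sketch: aggregate per-sample benchmark TSVs and plot one metric.
# The directory layout and chosen metric are illustrative.
import glob
import pandas as pd
import matplotlib.pyplot as plt

frames = []
for path in sorted(glob.glob("benchmarks/gather/*.tsv")):
    df = pd.read_csv(path, sep="\t")
    df["sample"] = path.split("/")[-1].removesuffix(".tsv")
    frames.append(df)

bench = pd.concat(frames, ignore_index=True)
bench.plot(x="sample", y="max_rss", kind="line", marker="o", legend=False)
plt.ylabel("max_rss (MB)")
plt.tight_layout()
plt.savefig("gather-max_rss.png")
```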

Possible additions to the benchmarking workflow:

  1. Running the command on each sample a fixed number of times (say, 3) and aggregating the results. This will prevent one-time events and outliers from influencing our analysis.
  2. Using the perf command to gather more robust metrics; see the flamegraph offshoots here. A sketch of what this could look like follows this list.
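
A hedged sketch of a perf-based flamegraph, using Brendan Gregg's FlameGraph scripts (the sourmash arguments and script locations are illustrative):

```sh
# Sample on-CPU stacks at 99 Hz with call graphs, then fold them
# into a flamegraph SVG using the FlameGraph repo's scripts.
perf record -F 99 -g -o perf.data -- \
    sourmash gather SRR1976948.sig database.zip -o gather.csv
perf script -i perf.data | ./stackcollapse-perf.pl | ./flamegraph.pl > gather-perf.svg
```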
ctb commented 1 year ago

Random Q - is there a way to push the benchmarking down to the Rust level? (py-spy can't do that, I don't think, but maybe other programs can.)

luizirber commented 1 year ago

> Random Q - is there a way to push the benchmarking down to the Rust level? (py-spy can't do that, I don't think, but maybe other programs can.)

Run `py-spy` with `--native` to go into the Rust level: https://github.com/benfred/py-spy#can-py-spy-profile-native-extensions
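
For reference, a minimal sketch of such an invocation (the output path and gather arguments are illustrative):

```sh
# Record a flamegraph that includes native (Rust) frames.
py-spy record --native -o gather-native.svg -- \
    sourmash gather SRR1976948.sig database.zip -o gather.csv
```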

ctb commented 1 year ago

Some thoughts on the larger question:

First, there are a lot of other bits and bobs of sourmash outside of `sourmash prefetch` and `sourmash gather` that don't often get a lot of attention. I'm thinking particularly of the stuff under `sourmash sig`, where we already have one well-known performance problem - https://github.com/sourmash-bio/sourmash/issues/1617. Perhaps there are more?

Second, the Python code has not always received a lot of optimization love. This has bitten us in a few places e.g. https://github.com/sourmash-bio/sourmash/pull/2132.

So my hot take is it might be good to set up some baseline benchmarks for parts of the code other than the usual suspects (gather, prefetch, search, compare) to see if there are any surprises. I'm not quite sure how to go about this, though. I do keep on coming back to the `sourmash sig` submodule tho...

ccbaumler commented 1 year ago

> > Random Q - is there a way to push the benchmarking down to the Rust level? (py-spy can't do that, I don't think, but maybe other programs can.)
>
> Run `py-spy` with `--native` to go into the Rust level: https://github.com/benfred/py-spy#can-py-spy-profile-native-extensions

Thanks for the tip @luizirber! It was easy to implement. Below is a comparison without `--native` and with `--native`. If you click an image, it will take you to the HTML-rendered SVG, where you can see the details of the flamegraph layers.

Without `--native`:

(flamegraph image)

With `--native`:

(flamegraph image)

@ctb Thanks for the direction! We'll start with the signature sub-cmds.

luizirber commented 1 year ago

https://github.com/sourmash-bio/resources has some benchmarks too, feel free to take whatever is useful :sweat_smile:

ctb commented 1 year ago

(the py-spy --native graphs are super cool!)

ccbaumler commented 1 year ago

We should add `repeat("benchmarks/somecommand/{sample}.tsv", 3)` to generate replicates! A sketch follows the link below.

https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#benchmark-rules
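
A minimal sketch of the change, mirroring the earlier rule (names and paths are still illustrative):

```python
# Same benchmarked rule as before, but the benchmark is repeated
# 3 times; Snakemake appends one TSV row per repetition.
rule gather_benchmark:
    input:
        sig="sigs/{sample}.sig",
        db="databases/gtdb-rs207.genomic.k31.zip",
    output:
        csv="outputs/gather/{sample}.csv",
    benchmark:
        repeat("benchmarks/gather/{sample}.tsv", 3)
    shell:
        "sourmash gather {input.sig} {input.db} -o {output.csv}"
```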

While perf and flamegraphs sample on a sub-second timescale, it should be noted that Snakemake's benchmarking samples every 30 seconds. Something to keep in mind. There's a suggested workaround here: https://github.com/snakemake/snakemake/issues/851