varfish-org / hgvs-rs

A port of biocommons/hgvs to the Rust programming language
Apache License 2.0

Further tune translate_cds code #83

Closed holtgrewe closed 1 year ago

holtgrewe commented 1 year ago

Apparently, a lot of time is spent accessing the lazy_static data structures through their lock.
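One common way to cut this kind of overhead is to resolve the lazily initialized value once per call instead of once per base. The following is only a minimal sketch of that idea, not the actual change made in hgvs-rs: `ascii_to_2bit` and `encode` are hypothetical names, and `std::sync::OnceLock` stands in for `lazy_static!` so that the example needs no external crates.

```rust
use std::sync::OnceLock;

/// Hypothetical lazily initialized table standing in for a `lazy_static!` item.
/// Maps ASCII nucleotides to 2-bit codes (T=0, C=1, A=2, G=3), 255 otherwise.
fn ascii_to_2bit() -> &'static [u8; 256] {
    static TABLE: OnceLock<[u8; 256]> = OnceLock::new();
    TABLE.get_or_init(|| {
        let mut table = [255u8; 256];
        for (code, base) in b"TCAG".iter().enumerate() {
            table[*base as usize] = code as u8;
            table[base.to_ascii_lowercase() as usize] = code as u8;
        }
        table
    })
}

/// Encodes a sequence, resolving the lazy table once instead of once per base.
fn encode(seq: &[u8]) -> Vec<u8> {
    // Single initialization check, hoisted out of the hot loop.
    let table = ascii_to_2bit();
    seq.iter().map(|&b| table[b as usize]).collect()
}
```

Inside the loop only a plain array index remains; the synchronization check of the lazy value is paid once per `encode` call.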

holtgrewe commented 1 year ago

This reduces running time by another ~2/3.

image

holtgrewe commented 1 year ago

The flamegraph tells us that the time is now mostly spent in the table lookup. Looking at the annotated source code via perf report (see below) shows that nothing stands out. The 5.54% marked in red corresponds to the u8 as usize conversion, which cannot really be helped, I guess.

image
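For readers who have not seen this style of lookup, here is a rough sketch of a table-driven codon translation in which each base is widened with `u8 as usize` to index an array. It is an illustration under assumptions, not the hgvs-rs implementation: the names `CODON_TO_AA`, `ASCII_TO_2BIT`, `build_ascii_to_2bit` and `translate_codon` are made up, only upper-case T, C, A and G are handled, and unknown bases yield `X`.

```rust
/// Standard genetic code (NCBI translation table 1), indexed by
/// 16 * b1 + 4 * b2 + b3 with T=0, C=1, A=2, G=3.
static CODON_TO_AA: &[u8; 64] =
    b"FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/// Builds the ASCII-to-2-bit table at compile time; 255 marks other bytes.
const fn build_ascii_to_2bit() -> [u8; 256] {
    let mut table = [255u8; 256];
    table[b'T' as usize] = 0;
    table[b'C' as usize] = 1;
    table[b'A' as usize] = 2;
    table[b'G' as usize] = 3;
    table
}

/// Plain `static`, so reads involve no lazy initialization at all.
static ASCII_TO_2BIT: [u8; 256] = build_ascii_to_2bit();

/// Translates one upper-case DNA codon; returns b'X' if a base is not T, C, A or G.
fn translate_codon(codon: [u8; 3]) -> u8 {
    let mut index = 0usize;
    for base in codon {
        // The `u8 as usize` conversion needed to index the table.
        let code = ASCII_TO_2BIT[base as usize];
        if code == 255 {
            return b'X';
        }
        index = index * 4 + code as usize;
    }
    CODON_TO_AA[index]
}
```

For example, `translate_codon(*b"ATG")` computes index 2 * 16 + 0 * 4 + 3 = 35 and returns `b'M'`. The per-base work is one array read plus the index arithmetic, which matches the observation that little is left to optimize beyond the widening conversion.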

For the record, here is how to use pprof-rs ad hoc to generate flamegraphs. This is the most reliable way of getting flamegraphs (cargo flamegraph somehow did not work very well), and it also allows creating flamegraphs for selected parts of the program only.

diff --git a/Cargo.toml b/Cargo.toml
index 0035412..26196fa 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -39,7 +39,11 @@ pretty_assertions = "1.3.0"
 rstest = "0.17.0"
 test-log = "0.2.11"
 criterion = "0.3"
+pprof = { version = "0.11.1", features = ["flamegraph", "cpp"] }

 [[bench]]
 name = "translate_cds"
 harness = false
+
+[profile.release]
+debug = true
diff --git a/benches/translate_cds.rs b/benches/translate_cds.rs
index b1c9175..8899687 100644
--- a/benches/translate_cds.rs
+++ b/benches/translate_cds.rs
@@ -18,9 +18,14 @@ lazy_static::lazy_static! {
 }

 fn criterion_benchmark(c: &mut Criterion) {
+    let guard = pprof::ProfilerGuardBuilder::default().frequency(1000).blocklist(&["libc", "libgcc", "pthread", "vdso"]).build().unwrap();
     c.bench_function("translate_cds TTN", |b| {
         b.iter(|| translate_cds(&SEQ_TTN, true, "*", TranslationTable::Standard).unwrap())
     });
+    if let Ok(report) = guard.report().build() {
+        let file = std::fs::File::create("flamegraph.svg").unwrap();
+        report.flamegraph(file).unwrap();
+    };
 }

criterion_group!(benches, criterion_benchmark);

holtgrewe commented 1 year ago

The end-to-end running time for annotating 100k variants in mehari goes down from 24.15s to 15.5s (about 36% less).