holtgrewe closed this 1 year ago
This reduces running time by another ~2/3.
The flamegraph tells us that the code now spends most of its time in the table lookup. Looking at the annotated source code via `perf report` (see below) shows that nothing stands out. The red 5.54% corresponds to the `u8 as usize` conversion, which cannot really be helped, I guess.
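For illustration, the hot path of such a byte-indexed table is just the `u8 as usize` cast followed by an array index. Here is a minimal, self-contained sketch (the table contents are made up for illustration and are not the actual translation table):

```rust
// Hypothetical byte-indexed lookup table, similar in shape to a codon
// translation table. Built at compile time, so no locking is involved.
const COMPLEMENT: [u8; 256] = {
    let mut table = [0u8; 256];
    let mut i = 0;
    while i < 256 {
        table[i] = i as u8; // identity for bytes we do not map
        i += 1;
    }
    table[b'A' as usize] = b'T';
    table[b'T' as usize] = b'A';
    table[b'C' as usize] = b'G';
    table[b'G' as usize] = b'C';
    table
};

fn complement(base: u8) -> u8 {
    // `base as usize` is a zero-extension; the bounds check is elided
    // because a u8 can never index past 255 in a 256-entry array.
    COMPLEMENT[base as usize]
}

fn main() {
    assert_eq!(complement(b'A'), b'T');
    assert_eq!(complement(b'G'), b'C');
    println!("ok");
}
```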
For the record, here is how to use pprof-rs ad hoc for generating flamegraphs. This is the most reliable way of getting flamegraphs (`cargo flamegraph` somehow did not work very well), and it also allows creating flamegraphs for selected parts of the program only.
```diff
diff --git a/Cargo.toml b/Cargo.toml
index 0035412..26196fa 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -39,7 +39,11 @@ pretty_assertions = "1.3.0"
 rstest = "0.17.0"
 test-log = "0.2.11"
 criterion = "0.3"
+pprof = { version = "0.11.1", features = ["flamegraph", "cpp"] }
 
 [[bench]]
 name = "translate_cds"
 harness = false
+
+[profile.release]
+debug = true
diff --git a/benches/translate_cds.rs b/benches/translate_cds.rs
index b1c9175..8899687 100644
--- a/benches/translate_cds.rs
+++ b/benches/translate_cds.rs
@@ -18,9 +18,14 @@ lazy_static::lazy_static! {
 }
 
 fn criterion_benchmark(c: &mut Criterion) {
+    let guard = pprof::ProfilerGuardBuilder::default().frequency(1000).blocklist(&["libc", "libgcc", "pthread", "vdso"]).build().unwrap();
     c.bench_function("translate_cds TTN", |b| {
         b.iter(|| translate_cds(&SEQ_TTN, true, "*", TranslationTable::Standard).unwrap())
     });
+    if let Ok(report) = guard.report().build() {
+        let file = std::fs::File::create("flamegraph.svg").unwrap();
+        report.flamegraph(file).unwrap();
+    };
 }
 
 criterion_group!(benches, criterion_benchmark);
```
Annotating 100k variants in mehari goes down from 24.15s to 15.5s (end-to-end running time). A lot of time is apparently spent in accessing the `lazy_static` data structures through the lock.
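One common way to cheapen that access is `std::sync::OnceLock` (in std since Rust 1.70): after the first initialization, each access is a single atomic load and no lock is held while readers use the data. This is only a sketch of the pattern, not mehari's actual code; the table name and contents are placeholders:

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical stand-in for a lazy_static table; names are illustrative.
static AA_TABLE: OnceLock<HashMap<&'static str, char>> = OnceLock::new();

fn aa_table() -> &'static HashMap<&'static str, char> {
    // get_or_init runs the closure at most once; afterwards this is
    // just an atomic load returning a plain &'static reference.
    AA_TABLE.get_or_init(|| {
        let mut m = HashMap::new();
        m.insert("ATG", 'M');
        m.insert("TAA", '*');
        m
    })
}

fn main() {
    assert_eq!(aa_table()["ATG"], 'M');
    assert_eq!(aa_table()["TAA"], '*');
    println!("ok");
}
```

When the data is known at compile time, a `const` or plain `static` table removes even that atomic load from the hot path.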