fix: Performance improvements

tedil commented 1 month ago

use noodles async vcf reader & writer
cache some calls in hgvs-rs (see yet-to-be-created pull request in hgvs-rs)
add dependencies and do setup for DHAT, criterion, flamegraph, coz, pprof
add profiling and perf target profiles

I've also done a little bit of refactoring: the run_with_writer function now mostly has the DB setup and loops over the VCF records, while the actual annotation is done in a different function etc. Makes for easier to read profiling results.

tedil commented 1 month ago

~~Currently, hvgs-rs and seqrepo-rs point to local project instances with some local changes.~~ Now references specific git commits. The main issue with enabling caching for assembly::Mapper::new in hgvs-rs was that the error type was not clone-able -- deriving Clone was easy for most error variants, apart from postgres, whose errors I simply wrapped in Arc, which is very annoying, but it works :shrug:

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 84.07258% with 158 lines in your changes missing coverage. Please review.

Project coverage is 75.59%. Comparing base (af8b11e) to head (36f1689).

Files	Patch %	Lines
src/annotate/seqvars/mod.rs	79.30%	89 Missing :warning:
src/common/noodles.rs	68.91%	23 Missing :warning:
src/common/mod.rs	75.75%	16 Missing :warning:
src/annotate/strucvars/mod.rs	95.85%	14 Missing :warning:
src/common/io/tokio.rs	60.00%	8 Missing :warning:
src/main.rs	0.00%	4 Missing :warning:
src/annotate/mod.rs	91.66%	2 Missing :warning:
src/verify/seqvars.rs	0.00%	2 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #444 +/- ## ========================================== + Coverage 75.55% 75.59% +0.04% ========================================== Files 23 24 +1 Lines 8166 8561 +395 ========================================== + Hits 6170 6472 +302 - Misses 1996 2089 +93 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

xiamaz commented 1 month ago

Thanks! Could you fix the docker build? Somehow annotations seems to fail to build:

#27 0.097   can't find `annotation` bench at `benches/annotation.rs` or `benches/annotation/main.rs`. Please specify bench.path if you want to use a non-default path.

tedil commented 1 month ago

I've replaced the noodles-$project dependencies with the corresponding noodles group dependency, setting its features to the list of noodles-$projects that were used. That way, there's only one noodles version to bump and, more importantly, the projects are compatible. This is especially relevant for vcf and bcf, as this ensures that the vcf::Header is the same across both, which allows us to create unified vcf/bcf readers/writers (i.e. mehari can read BCF now). We can move the relevant commits to a different branch, but for now it was more convenient (to me) to have them here.

tedil commented 1 month ago

Perhaps we can also take a look at https://github.com/apps/codspeed-hq (and https://github.com/CodSpeedHQ/codspeed-rust) to ensure performance doesn't degrade

tedil commented 1 month ago

[…] which allows us to create unified vcf/bcf readers/writers (i.e. mehari can read BCF now).

Once https://github.com/zaeleus/noodles/issues/266 is resolved, we can also write BCF (correctly). I don't think it will have much impact on performance, though, but it's nice to have, anyways.

tedil commented 1 month ago

Should we enable lto = "fat" and codegen-units = 1 (or some other small value) by default for the release profile (increases build times quite a bit; on the other hand, most of the build time is spent on the rocksdb anyways)? Currently, these are only set for the perf profile (such that the effect the settings have can be compared more easily)

varfish-org / mehari

fix: Performance improvements #444

Codecov Report