varfish-org / mehari

VEP-like tool for sequence ontology and HGVS annotation of VCF files
MIT License
14 stars 1 forks source link

fix: Performance improvements #444

Closed tedil closed 3 weeks ago

tedil commented 1 month ago

I've also done a little bit of refactoring: the run_with_writer function now mostly has the DB setup and loops over the VCF records, while the actual annotation is done in a different function etc. Makes for easier to read profiling results.

tedil commented 1 month ago

Currently, hvgs-rs and seqrepo-rs point to local project instances with some local changes. Now references specific git commits. The main issue with enabling caching for assembly::Mapper::new in hgvs-rs was that the error type was not clone-able -- deriving Clone was easy for most error variants, apart from postgres, whose errors I simply wrapped in Arc, which is very annoying, but it works :shrug:

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 84.07258% with 158 lines in your changes missing coverage. Please review.

Project coverage is 75.59%. Comparing base (af8b11e) to head (36f1689).

Files Patch % Lines
src/annotate/seqvars/mod.rs 79.30% 89 Missing :warning:
src/common/noodles.rs 68.91% 23 Missing :warning:
src/common/mod.rs 75.75% 16 Missing :warning:
src/annotate/strucvars/mod.rs 95.85% 14 Missing :warning:
src/common/io/tokio.rs 60.00% 8 Missing :warning:
src/main.rs 0.00% 4 Missing :warning:
src/annotate/mod.rs 91.66% 2 Missing :warning:
src/verify/seqvars.rs 0.00% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #444 +/- ## ========================================== + Coverage 75.55% 75.59% +0.04% ========================================== Files 23 24 +1 Lines 8166 8561 +395 ========================================== + Hits 6170 6472 +302 - Misses 1996 2089 +93 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

xiamaz commented 1 month ago

Thanks! Could you fix the docker build? Somehow annotations seems to fail to build:

#27 0.097   can't find `annotation` bench at `benches/annotation.rs` or `benches/annotation/main.rs`. Please specify bench.path if you want to use a non-default path.
tedil commented 1 month ago

I've replaced the noodles-$project dependencies with the corresponding noodles group dependency, setting its features to the list of noodles-$projects that were used. That way, there's only one noodles version to bump and, more importantly, the projects are compatible. This is especially relevant for vcf and bcf, as this ensures that the vcf::Header is the same across both, which allows us to create unified vcf/bcf readers/writers (i.e. mehari can read BCF now). We can move the relevant commits to a different branch, but for now it was more convenient (to me) to have them here.

tedil commented 1 month ago

Perhaps we can also take a look at https://github.com/apps/codspeed-hq (and https://github.com/CodSpeedHQ/codspeed-rust) to ensure performance doesn't degrade

tedil commented 1 month ago

[…] which allows us to create unified vcf/bcf readers/writers (i.e. mehari can read BCF now).

Once https://github.com/zaeleus/noodles/issues/266 is resolved, we can also write BCF (correctly). I don't think it will have much impact on performance, though, but it's nice to have, anyways.

tedil commented 1 month ago

Should we enable lto = "fat" and codegen-units = 1 (or some other small value) by default for the release profile (increases build times quite a bit; on the other hand, most of the build time is spent on the rocksdb anyways)? Currently, these are only set for the perf profile (such that the effect the settings have can be compared more easily)