malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

Improve G123 and H12 performance #493

Closed alimanfoo closed 7 months ago

alimanfoo commented 7 months ago

Here we speed up G123 and H12 computations by using a faster approach to hashing.
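For readers unfamiliar with why hashing matters here: H12 needs the frequency of each distinct haplotype in a window, so every haplotype has to be reduced to something cheap to count. The following is only a minimal sketch of that idea, not the implementation in this PR. It assumes a 2D array `ht` of shape (variants, haplotypes), and the function names `haplotype_frequencies` and `h12` are hypothetical. Each haplotype's raw bytes are used as a dictionary key, so Python's built-in bytes hashing does the work.

```python
from collections import Counter

import numpy as np


def haplotype_frequencies(ht):
    """Return distinct-haplotype frequencies, sorted in descending order.

    `ht` is assumed to be a 2D integer array of shape
    (n_variants, n_haplotypes); this layout is an assumption of the sketch.
    """
    # Transpose so each row is one haplotype, stored contiguously in memory.
    ht_t = np.ascontiguousarray(np.asarray(ht).T)
    # Use the raw bytes of each haplotype as the key being counted.
    counts = Counter(row.tobytes() for row in ht_t)
    n_haplotypes = ht_t.shape[0]
    freqs = np.array(sorted(counts.values(), reverse=True), dtype=float)
    return freqs / n_haplotypes


def h12(freqs):
    """H12: pool the two most frequent haplotypes, then sum squared frequencies."""
    return float(np.sum(freqs[:2]) ** 2 + np.sum(freqs[2:] ** 2))


# Toy window: 2 variants x 4 haplotypes, two of which are identical.
ht = np.array([[0, 0, 1, 1], [1, 1, 0, 1]], dtype=np.int8)
freqs = haplotype_frequencies(ht)
print(freqs, h12(freqs))
```

Counting byte strings avoids building a Python tuple per haplotype, which is where a naive approach tends to spend its time.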

Also a very minor broadening of the API `chunks` parameter, to allow exploring whether different approaches to reading chunks of zarr data improve performance. In practice, that doesn't seem to offer any performance benefit after all.

codecov[bot] commented 7 months ago

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (7995a0f) 98.49% compared to head (9967d13) 98.59%. Report is 1 commit behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| malariagen_data/anoph/snp_frq.py | 98.42% | 7 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #493      +/-   ##
==========================================
+ Coverage   98.49%   98.59%   +0.10%
==========================================
  Files          30       31       +1
  Lines        2520     2996     +476
==========================================
+ Hits         2482     2954     +472
- Misses         38       42       +4
```


alimanfoo commented 7 months ago

N.b. G123 is still relatively slow, as most of the time is spent retrieving genotype data, but the faster hashing does still make a difference.