malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

Slow performance calling snp_calls() with site_mask parameter #463

Closed alimanfoo closed 9 months ago

alimanfoo commented 9 months ago

On Colab, calling this:

ag3.snp_calls(region="3L", site_mask="gamb_colu_arab")

...spends more than 1m 30s applying site filters.

I suspect this is because the site mask data is being read once for each variable in the dataset, rather than once overall.

This could be optimised to read the mask only once and reuse it for every variable.
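
A rough sketch of the idea, in plain NumPy rather than the actual malariagen_data internals (which I have not reproduced here). `CountingMaskSource`, `filter_per_variable` and `filter_read_once` are hypothetical names made up for illustration, and the arrays are toy data. The point is only that both approaches give the same filtered output, but one touches the mask storage once per variable and the other only once.

```python
import numpy as np


class CountingMaskSource:
    """Hypothetical stand-in for slow site mask storage; counts reads."""

    def __init__(self, mask):
        self._mask = np.asarray(mask, dtype=bool)
        self.reads = 0

    def load(self):
        # Each call represents one (potentially slow) read of the mask.
        self.reads += 1
        return self._mask.copy()


def filter_per_variable(variables, source):
    # Suspected current behaviour: the mask is re-read for every variable.
    return {name: values[source.load()] for name, values in variables.items()}


def filter_read_once(variables, source):
    # Proposed behaviour: read the mask once, then reuse it for all variables.
    mask = source.load()
    return {name: values[mask] for name, values in variables.items()}


# Toy dataset: every variable has the variants dimension as its first axis.
variables = {
    "variant_position": np.array([10, 20, 30, 40]),
    "variant_allele": np.array([["A", "C"], ["G", "T"], ["C", "A"], ["T", "G"]]),
    "call_genotype": np.arange(4 * 3 * 2).reshape(4, 3, 2),
}

slow = CountingMaskSource([True, False, True, True])
fast = CountingMaskSource([True, False, True, True])

out_slow = filter_per_variable(variables, slow)
out_fast = filter_read_once(variables, fast)

print("reads per variable:", slow.reads)
print("reads when shared:", fast.reads)
```

With a real dataset the saving would scale with the number of variables, assuming the mask read is what dominates the time.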