legend-exp / pygama

Python package for data processing and analysis
https://pygama.readthedocs.io
GNU General Public License v3.0
18 stars 56 forks

Updated pargen.utils.load_data to use LH5Iterator and field_mask to be more memory efficient #589

Open iguinn opened 2 months ago

iguinn commented 2 months ago

load_data was using too much memory (at least for P08, which is a larger dataset) and crashing my attempts to process it on NERSC. Switched it to use the field_mask when reading from the input files, and the LH5Iterator to limit the number of entries in memory at once. Also improved the comments and docstring.

Note this should not be merged until this is also merged: https://github.com/legend-exp/legend-pydataobj/pull/100
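For readers unfamiliar with the pattern, here is a minimal sketch of the idea: read only the columns that are needed (the field mask) and walk the data in fixed-size chunks, keeping just the rows that pass a cut. This is not the code in this PR and does not use the lgdo API. `Table`, `iter_chunks`, `load_selected`, `cut_field`, and `threshold` are hypothetical names, and a plain dict of lists stands in for an LH5 file.

```python
from typing import Dict, Iterator, List, Sequence

# Hypothetical stand-in for a table on disk: column name -> list of values.
Table = Dict[str, List[float]]


def iter_chunks(
    table: Table, field_mask: Sequence[str], buffer_len: int
) -> Iterator[Table]:
    """Yield successive chunks that hold only the columns in field_mask."""
    n_rows = len(table[field_mask[0]]) if field_mask else 0
    for start in range(0, n_rows, buffer_len):
        stop = min(start + buffer_len, n_rows)
        # Only the masked columns are sliced; other columns are never touched.
        yield {name: table[name][start:stop] for name in field_mask}


def load_selected(
    table: Table,
    params: Sequence[str],
    cut_field: str,
    threshold: float,
    buffer_len: int = 1000,
) -> Table:
    """Collect params for rows where cut_field > threshold, chunk by chunk."""
    # The field mask is the requested params plus the column used by the cut.
    field_mask = list(dict.fromkeys([*params, cut_field]))
    out: Table = {name: [] for name in params}
    for chunk in iter_chunks(table, field_mask, buffer_len):
        keep = [value > threshold for value in chunk[cut_field]]
        for name in params:
            # Append only the surviving rows; the rest of the chunk is dropped.
            out[name].extend(v for v, k in zip(chunk[name], keep) if k)
    return out


# Toy data: "unused" is never read because it is not in the field mask.
table = {
    "energy": [1.0, 5.0, 3.0, 8.0, 2.0],
    "timestamp": [10.0, 11.0, 12.0, 13.0, 14.0],
    "unused": [0.0, 0.0, 0.0, 0.0, 0.0],
}
selected = load_selected(table, ["timestamp"], "energy", 2.5, buffer_len=2)
```

With this approach, peak memory scales with `buffer_len` times the number of masked columns plus the rows that survive the cut, rather than with the full size of the input.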

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 0% with 29 lines in your changes missing coverage. Please review.

Project coverage is 48.91%. Comparing base (981877e) to head (ee73d2a).

| Files | Patch % | Lines |
|---|---|---|
| src/pygama/pargen/utils.py | 0.00% | 29 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #589      +/-   ##
==========================================
+ Coverage   48.80%   48.91%   +0.11%
==========================================
  Files          59       59
  Lines        7846     7821      -25
==========================================
- Hits         3829     3826       -3
+ Misses       4017     3995      -22
```


gipert commented 1 week ago

Is this tested @ggmarshall? @iguinn can you bump the pydataobj version in pyproject.toml, if this is not backward compatible?

ggmarshall commented 1 week ago

Not on my end but I can have a look later this week