Closed heikoklein closed 3 months ago
Attention: Patch coverage is 73.33333% with 4 lines in your changes missing coverage. Please review.
Project coverage is 79.30%. Comparing base (9f4b8dc) to head (4bf3e26). Report is 486 commits behind head on main-dev.
Files with missing lines | Patch % | Lines |
---|---|---|
pyaerocom/io/ebas_file_index.py | 62.50% | 3 Missing :warning: |
pyaerocom/io/cams2_83/reader.py | 0.00% | 1 Missing :warning: |
@charlienegri : I have added the time-chunking with chunks={"time": 24} here again. We tried that with CAMS2_83 and time=1, and while it gave good improvements in memory consumption, it took slightly longer. With time=24, I couldn't see any performance degradation, and the memory reduction from holding one day of hourly data rather than a full year of hourly data is still huge. So, you might want to try that approach again?
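To make the trade-off between time=1 and time=24 concrete, here is a rough back-of-the-envelope sketch. The grid size (400 x 700) and the float32 item size are assumptions chosen purely for illustration; they are not taken from the CAMS2_83 files, and `chunk_stats` is a hypothetical helper, not part of pyaerocom.

```python
import math


def chunk_stats(n_time, n_lat, n_lon, time_chunk, itemsize=4):
    """Return (number of chunks, bytes per chunk, bytes for the full array).

    Assumes chunking along the time axis only, with itemsize bytes per value
    (4 for float32).
    """
    n_chunks = math.ceil(n_time / time_chunk)
    chunk_bytes = min(time_chunk, n_time) * n_lat * n_lon * itemsize
    total_bytes = n_time * n_lat * n_lon * itemsize
    return n_chunks, chunk_bytes, total_bytes


# Hypothetical example: one non-leap year of hourly data on a 400 x 700 grid.
HOURS_PER_YEAR = 365 * 24

for time_chunk in (1, 24):
    n_chunks, chunk_bytes, total_bytes = chunk_stats(HOURS_PER_YEAR, 400, 700, time_chunk)
    print(
        f"time={time_chunk}: {n_chunks} chunks of {chunk_bytes / 1e6:.1f} MB "
        f"(full array: {total_bytes / 1e9:.1f} GB)"
    )
```

Under these assumptions, time=24 still keeps each chunk a factor of 365 smaller than the full year, while producing 24 times fewer chunks than time=1. Fewer, larger chunks generally mean less per-chunk scheduling overhead, which would be consistent with the slightly longer run time seen with time=1, though the thread does not confirm that as the cause.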
I have installed the branch in the test module and will try a long run with it.
If it gets merged, it will in any case be deployed to production in July as part of a new version of the module.
test run crashed immediately with
Loading cams2_83-evaluation/test
Loading requirement: proj/9.1.0
Traceback (most recent call last):
File "/modules/rhel8/user-apps/fou-modules/cams2_83-evaluation/test/venv/bin/cams2_83", line 5, in <module>
from pyaerocom.scripts.cams2_83.cli import app
File "/modules/rhel8/user-apps/fou-modules/cams2_83-evaluation/test/venv/lib/python3.10/site-packages/pyaerocom/__init__.py", line 9, in <module>
from .config import Config
File "/modules/rhel8/user-apps/fou-modules/cams2_83-evaluation/test/venv/lib/python3.10/site-packages/pyaerocom/config.py", line 19, in <module>
from pyaerocom.grid_io import GridIO
File "/modules/rhel8/user-apps/fou-modules/cams2_83-evaluation/test/venv/lib/python3.10/site-packages/pyaerocom/grid_io.py", line 2, in <module>
from pyaerocom.time_config import TS_TYPES
File "/modules/rhel8/user-apps/fou-modules/cams2_83-evaluation/test/venv/lib/python3.10/site-packages/pyaerocom/time_config.py", line 7, in <module>
from iris import coord_categorisation
File "/modules/rhel8/user-apps/fou-modules/cams2_83-evaluation/test/venv/lib/python3.10/site-packages/iris/coord_categorisation.py", line 23, in <module>
import iris.coords
File "/modules/rhel8/user-apps/fou-modules/cams2_83-evaluation/test/venv/lib/python3.10/site-packages/iris/coords.py", line 23, in <module>
from iris.common import (
File "/modules/rhel8/user-apps/fou-modules/cams2_83-evaluation/test/venv/lib/python3.10/site-packages/iris/common/__init__.py", line 9, in <module>
from .mixin import *
File "/modules/rhel8/user-apps/fou-modules/cams2_83-evaluation/test/venv/lib/python3.10/site-packages/iris/common/mixin.py", line 10, in <module>
import cf_units
File "/modules/rhel8/user-apps/fou-modules/cams2_83-evaluation/test/venv/lib/python3.10/site-packages/cf_units/__init__.py", line 23, in <module>
from cf_units import _udunits2 as _ud
File "cf_units/_udunits2.pyx", line 1, in init cf_units._udunits2
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
I will try a fresh venv
same result in a fresh venv
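A "numpy.dtype size changed" error of this kind is commonly reported when a compiled extension was built against a different NumPy major version than the one installed; whether that is the cause here is not confirmed in this thread. A minimal sketch for checking what the venv actually contains, without importing the failing packages, might look like the following. The distribution names `cf-units` and `scitools-iris` are assumptions about how the packages are registered, and both helpers are hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("numpy", "cf-units", "scitools-iris")):
    """Map each distribution name to its installed version, or None if absent.

    Reads package metadata only, so it works even when importing the
    package itself crashes.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def numpy_major(version):
    """Return the major version as an int, or None if no version is given."""
    return int(version.split(".")[0]) if version else None


if __name__ == "__main__":
    found = installed_versions()
    for name, version in found.items():
        print(f"{name}: {version}")
    print("numpy major version:", numpy_major(found["numpy"]))
```

Running this inside the affected venv shows which NumPy major version the environment resolved to, which can then be compared against what the installed cf_units build expects.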
The numpy/environment error might be related to #1206
Rather than asking you to run this branch (which improves emep-reporting mem-usage), I wanted to ask you about testing the same idea in the cams2_83 reader, i.e. add chunks={"time": 24} to https://github.com/metno/pyaerocom/blob/26e8cdb79e7e0a99c13e370da51336cbbea44431/pyaerocom/io/cams2_83/reader.py#L193
I see, I will try that instead.
The issue I had is the same one that you mentioned.
The test with chunks={"time": 24} in the cams2_83 reader's read_dataset used memory comparable to the production code.
The running time was significantly shorter, to a degree that I am not sure we can attribute to this change alone at this stage... or maybe it's the perfect fit.
Anyway, I think it can be safely implemented.
@charlienegri Thanks, I have then added the "chunk" line to cams2_83/reader.py, too.
Change Summary
This PR fixes some memory issues discovered during work with emep-reporting.
This PR also adds a new requirement: psutil
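The description does not say how psutil is used in the PR. One plausible use, shown here only as an assumption, is reporting the resident memory of the running process while chasing memory issues. `rss_bytes` is a hypothetical helper and not part of pyaerocom.

```python
import os


def rss_bytes():
    """Return the resident set size of this process in bytes.

    Returns None when psutil is not installed, so callers can skip
    memory reporting instead of failing.
    """
    try:
        import psutil
    except ImportError:
        return None
    return psutil.Process(os.getpid()).memory_info().rss


if __name__ == "__main__":
    rss = rss_bytes()
    if rss is None:
        print("psutil not available, memory usage not reported")
    else:
        print(f"current RSS: {rss / 1e6:.1f} MB")
```

Calling such a helper before and after a read step gives a rough view of how much memory that step retains, although RSS also includes memory the allocator has not yet returned to the operating system.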
Related issue number
The remaining issue is now reading eea-data, which keeps all existing data per variable in memory, leading to ~55 GB peak memory usage, even though only 1-5 GB of data are needed during processing. This will be addressed in https://github.com/metno/pyaro-readers/issues/43