Java heap space is exhausted when scenarios are loaded sequentially. For each scenario, the scenario is loaded, data is retrieved from it (no checkout/commit is performed), and the data is then processed (details below). At most 12 scenarios can be processed before memory errors occur: each loaded scenario requires additional memory, and the memory used by a completed scenario is not freed up for the next one.
Code sample or context
I have made a small program (module_x) that uses some functions from the "old" reporting in message_data.
Module_x process (Python):
1. import message_ix/ixmp
2. load ixmp.Platform() -> assigned to variable "mp"
3. load a scenario (without cache) -> assigned to variable "scen"
4. retrieve 3 dataframes using "old" reporting functions
5. load 1 variable stored as a timeseries with the Scenario object
6. perform a simple data manipulation
7. close mp
8. set variable "scen" = None
9. set variable "mp" = None
Module_x is called from an ipynb (>160 scenarios). The Jupyter notebook does nothing except loop over the scenario names; for each scenario it:
1. imports module_x
2. calls module_x, passing some variables (all strings)
3. deletes module_x (del module_x)
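The pattern above can be sketched as follows. This is a minimal, self-contained reconstruction using stand-in Platform and Scenario classes (the real ixmp objects are JVM-backed via JPype, so actual memory behavior differs); the name process_scenario and the weakref check are illustrative additions, not part of module_x.

```python
import gc
import weakref

class Platform:
    """Stand-in for ixmp.Platform (the real one wraps a JVM backend)."""
    def close_db(self):
        pass

class Scenario:
    """Stand-in for a message_ix/ixmp Scenario."""
    def __init__(self, mp, name):
        self.mp = mp
        self.name = name
        self.data = list(range(100_000))  # placeholder for scenario data

def process_scenario(name):
    """Hypothetical module_x entry point: load, retrieve, process, release."""
    mp = Platform()
    scen = Scenario(mp, name)
    # ... retrieve dataframes / timeseries and process them here ...
    mp.close_db()
    # Drop the only strong references so the objects become collectable
    ref = weakref.ref(scen)
    scen = None
    mp = None
    return ref

# Outer loop analogous to the notebook: nothing else retains the scenario,
# so each one should be garbage-collected before the next iteration.
refs = [process_scenario(f"scenario_{i}") for i in range(5)]
gc.collect()
assert all(r() is None for r in refs)  # every scenario was freed
```

In plain CPython this is exactly the behavior one would expect; the issue is why the equivalent ixmp objects are apparently not freed when driven from a notebook.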
Problem description
Memory issues occur after approx. 10-12 scenarios.
Versions
ixmp: 3.2.1.dev80+g40fc589
40fc589 (HEAD -> master, origin/master, origin/HEAD) Merge branch 'master' of https://github.com/iiasa/ixmp
message_ix: 3.2.1.dev67+ga20ffb0
a20ffb0 (HEAD -> master, origin/master, origin/HEAD) Merge branch 'master' of https://github.com/iiasa/message_ix
message_data: installed
3bfb76b (HEAD -> RES_add_5_year_timesteps2, origin/RES_add_5_year_timesteps2) added configuration files for ENGAGE submission 20210331 (ENGAGE 4.1.7)
click: 7.1.2
dask: 2020.12.0
graphviz: 0.13.2
jpype: 1.2.1
… JVM path: C:\Program Files\Java\jre1.8.0_231\bin\server\jvm.dll
openpyxl: 3.0.5
pandas: 1.1.3
pint: 0.11
xarray: 0.15.1
yaml: 5.3.1
iam_units: installed
jupyter: installed
matplotlib: 3.3.2
plotnine: 0.7.0
pyam: 0.7.0+4.gc1ed1f8
c1ed1f8 (HEAD -> master, origin/master, origin/HEAD) Add a tutorial how to read data from GAMS gdx to pyam (#424)
GAMS: 33.1.0
python: 3.7.9 (default, Aug 31 2020, 17:10:11) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
This may be related to iiasa/ixmp#227, which was closed via iiasa/ixmp#298. See the latter for extensive discussion & records of various tests and diagnostics. See also iiasa/ixmp#215, still open re: performance tests.
Jupyter might be one culprit (the description says “Module_x is called from an ipynb”). To wit:
ixmp aggressively garbage-collects objects when they are deleted, i.e. no longer referenced by any other Python object/variable. This in turn allows the Java objects to be GC'd.
Jupyter/IPython retain references to output of previous commands/cells, e.g. _23 is a special variable referring to the output of a notebook cell numbered [23].
Even if a cell is re-run, producing output, these references are not automatically cleared.
These would prevent Python/ixmp from GC'ing objects.
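The retention mechanism can be demonstrated without Jupyter itself. In this sketch a plain dict stands in for IPython's output cache (the Out mapping behind the _N variables); the weakref shows that the object stays alive until the cache entry is cleared.

```python
import gc
import weakref

class Scenario:
    """Stand-in for an ixmp Scenario holding a large JVM-side object."""
    pass

# Simulate IPython's output cache: Out[n] keeps a reference to the
# result of each cell that produced output.
out_cache = {}

scen = Scenario()
out_cache[23] = scen           # e.g. the cell that displayed `scen`

ref = weakref.ref(scen)
scen = None                    # the user "deletes" the scenario ...
gc.collect()
assert ref() is not None       # ... but Out[23] still holds it

out_cache.clear()              # analogous to clearing IPython's cache
gc.collect()
assert ref() is None           # now the object is collectable
```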
As usual, in order to isolate the issue in message_ix or ixmp, diagnose it, and check that a fix works, it would be necessary to reproduce the issue. One way would be to construct a minimal test case with "Module_x" and the notebook in question. In the process, it should be checked if the issue only occurs under Jupyter (per the above).