replicahq / doppelganger

A Python package of tools to support population synthesizers
Apache License 2.0

Memory pressure in generation #10

Closed katbusch closed 7 years ago

katbusch commented 7 years ago

Generation can fail because of memory pressure, e.g. on PUMA 05302:

  File "scripts/generate_all_pumas.py", line 158, in generate_population
    population = Population.generate(allocator, person_model, household_model)
  File "/home/kat/.local/lib/python2.7/site-packages/doppelganger/populationgen.py", line 93, in generate
    person_model, [inputs.AGE.name, inputs.SEX.name], Population._extract_person_evidence
  File "/home/kat/.local/lib/python2.7/site-packages/doppelganger/populationgen.py", line 76, in _generate_from_model
    results_dataframe = pandas.DataFrame(results, columns=column_names)
  File "/home/kat/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 314, in __init__
    arrays, columns = _to_arrays(data, columns, dtype=dtype)
  File "/home/kat/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 5715, in _to_arrays
    dtype=dtype)
  File "/home/kat/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 5789, in _list_to_arrays
    content = list(lib.to_object_array_tuples(data).T)
  File "pandas/_libs/src/inference.pyx", line 1660, in pandas._libs.lib.to_object_array_tuples (pandas/_libs/lib.c:67515)
MemoryError

katbusch commented 7 years ago

Seems like this issue arises during the generation stage when populations are large. I don't think there's a leak, but we can probably be smarter about what we keep in memory during generation.
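
One possible direction, as a rough sketch rather than what doppelganger actually does: the traceback fails while pandas converts the whole list of result tuples into a single object array, so building the frame in fixed-size chunks would keep that intermediate array small. The helper name `frame_from_tuples_in_chunks` and the `chunk_size` default are hypothetical, and this assumes the results can be produced lazily (for example from a generator) instead of being collected into one big list first.

```python
import itertools

import pandas


def frame_from_tuples_in_chunks(rows, column_names, chunk_size=100000):
    """Build a DataFrame from an iterable of tuples, one chunk at a time.

    Hypothetical helper: only `chunk_size` tuples are converted per
    pandas.DataFrame call, so the temporary object array pandas builds
    during conversion never covers the whole population at once.
    """
    rows = iter(rows)
    frames = []
    while True:
        # Pull at most chunk_size tuples from the iterator.
        chunk = list(itertools.islice(rows, chunk_size))
        if not chunk:
            break
        frames.append(pandas.DataFrame(chunk, columns=column_names))
    if not frames:
        # No rows at all: return an empty frame with the expected columns.
        return pandas.DataFrame(columns=column_names)
    # The final concat still copies the data once, but the chunk frames
    # already hold inferred column dtypes rather than one 2-D object array.
    return pandas.concat(frames, ignore_index=True)
```

If `results` is already a fully built list, this only shrinks the conversion overhead; the bigger saving would come from yielding rows from the generation loop so the full list never exists.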