prio-data / prediction_competition_2023

Code for generating benchmark models and evaluation scripts for the 2023 VIEWS prediction competition

out of memory when using plotting.collect_plotting_data() at pgm level #30

Closed kvelleby closed 1 year ago

kvelleby commented 1 year ago

Describe the bug

Calling pd.concat() across prediction windows without first aggregating or subsetting uses up my 64 GB of RAM and then crashes.

To Reproduce

Steps to reproduce the behavior:

from pathlib import Path

from plotting import collect_plotting_data

base = Path("/path/to/submissions")
actuals = Path("/path/to/actuals")
pgm_submission = base / "submission_with_pgm_level_predictions"

# Collecting everything at pgm level is what exhausts memory.
collect_plotting_data(models=[pgm_submission],
                      actual_folder=actuals,
                      target="pgm")

Expected behavior

It should ideally collect the data I want to plot. Simply collecting all the data works at the country level, but it does not work at the pgm level.

kvelleby commented 1 year ago

So, this is my fault, but just tossing it out there. When writing functions to work with the pgm-level data, we will have to aggregate and subset before concatenating.
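A rough sketch of what "aggregate and subset before concatenating" could look like with pandas. This is not the repository's code: the helper names reduce_window and collect_reduced, and the column names month_id, priogrid_gid and outcome, are assumptions made up for illustration. The idea is just that each prediction window gets reduced to a few summary rows per unit-month before pd.concat() ever sees it.

```python
import pandas as pd


def reduce_window(df: pd.DataFrame, keep_ids=None) -> pd.DataFrame:
    """Subset and aggregate one prediction window before concatenation.

    The column names ("priogrid_gid", "month_id", "outcome") are
    assumptions for illustration, not necessarily the real schema.
    """
    if keep_ids is not None:
        # Subset: keep only the grid cells we actually want to plot.
        df = df[df["priogrid_gid"].isin(keep_ids)]
    # Aggregate: collapse the per-draw rows to summary statistics
    # for each (month, grid cell) pair.
    return (
        df.groupby(["month_id", "priogrid_gid"])["outcome"]
        .agg(["mean", "min", "max"])
        .reset_index()
    )


def collect_reduced(frames, keep_ids=None) -> pd.DataFrame:
    """Concatenate the reduced windows rather than the raw frames."""
    reduced = [reduce_window(f, keep_ids) for f in frames]
    return pd.concat(reduced, ignore_index=True)
```

If frames is a generator that reads one window from disk at a time, only one raw window plus the small reduced frames would need to be in memory at once.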

kvelleby commented 1 year ago

The plotting functionality is not yet in the main branch, so I'll close this now. This still needs to be addressed when the plotting functionality is merged in, however.