prio-data / prediction_competition_2023

Code for generating benchmark models and evaluation scripts for the 2023 VIEWS prediction competition

Refactor eval #38

Closed · kvelleby closed this 1 year ago

kvelleby commented 1 year ago

A larger refactor of the whole project, including how we store feature data, actuals (test) data, and submissions with predictions and evaluation data. The overarching idea is to use a pyarrow.dataset.Dataset in the Apache Hive flavor. In particular, all data is structured as root_folder/{target}/{Apache Hive}, where target can be "cm" or "pgm". The Apache Hive layout uses folder-name conventions like "variable_name=variable_value", and we can point at the root/{target} folder to read the whole dataset (i.e., without targeting any .parquet file directly). The benefit is greatest at the pgm level, as it becomes easy to read only subsets of the data and avoid running out of memory. Aggregation of data also becomes easier to handle. The refactor also includes a utilities module with helper functions (e.g., getting calendar years and months from month_id).
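
For illustration, here is a minimal sketch of how such a Hive-partitioned dataset can be read with pyarrow, filtering to a subset before loading anything into pandas. The root folder path and the month_id filter value are placeholders, not part of the repo:

import pyarrow.dataset as ds

# Point at the pgm-level root; pyarrow discovers the variable_name=variable_value folders
dataset = ds.dataset("root_folder/pgm", format="parquet", partitioning="hive")

# Only materialize the rows we need, which keeps memory use down at the pgm level
table = dataset.to_table(filter=ds.field("month_id") >= 445)
df = table.to_pandas()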

kvelleby commented 1 year ago

Example code

Read data (e.g., predictions or actuals, or any data structured as folder/{target}/{Apache Hive}):

from pathlib import Path
from utilities import views_month_id_to_year, views_month_id_to_month, views_month_id_to_date, get_target_data, list_submissions

submissions = Path("path/to/submissions/folder")

# Read predictions
df = get_target_data(list_submissions(submissions)[0], "cm")
df = df.reset_index()

# Get date info from month_id
df["year"] = views_month_id_to_year(df["month_id"])
df["month"] = views_month_id_to_month(df["month_id"])
df["date"] = views_month_id_to_date(df["month_id"])
df
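
For intuition about what those helpers do: assuming the VIEWS convention that month_id 1 corresponds to January 1980 (worth double-checking against utilities.py), the conversion is essentially the following. This is an illustrative sketch, not the repo's implementation, and it works elementwise on a pandas Series as used above:

# Illustrative only; the real implementations live in utilities.py
def month_id_to_year(month_id):
    # assumes month_id 1 == January 1980
    return 1980 + (month_id - 1) // 12

def month_id_to_month(month_id):
    return (month_id - 1) % 12 + 1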

Evaluate submissions using CRPS, IGN, and MIS

actuals = "path/to/actuals/folder"
bins = [0, 0.5, 2.5, 5.5, 10.5, 25.5, 50.5, 100.5, 250.5, 500.5, 1000.5]

# Get paths to submissions in a folder
submission_paths = list_submissions(submissions)

# Evaluate a single submission (evaluate_submission and evaluate_all_submissions
# come from the repo's evaluation code; see evaluate_submissions.py)
evaluate_submission(submission_paths[0], actuals, targets = "cm", windows = ["Y2018", "Y2019", "Y2020", "Y2021"], expected = 1000, bins = bins)

# Evaluate a folder with many submissions
evaluate_all_submissions(submission_paths, actuals, targets = "cm", windows = ["Y2018", "Y2019", "Y2020", "Y2021"], expected = 1000, bins = bins)

Create evaluation tables

tables = "path/to/where/to/save/tables"

# Collect summary data for all submissions and aggregate across dimensions
evaluation_table(submissions, target = "cm", groupby = ["window"], across_submissions=False)

# Grouping by unit id, month_id, month, year, and window is allowed. Windows are pivoted to wide format (we might want to change this behavior)
evaluation_table(submissions, target = "cm", groupby = ["country_id"], across_submissions=False)

# It is also possible to aggregate across submissions
evaluation_table(submissions, target = "cm", groupby = ["month", "window"], across_submissions=True)

# Tables can be written to LaTeX, HTML, and Excel formats.
evaluation_table(submissions, target = "cm", groupby = ["month", "window"], across_submissions=True, save_to=tables)

kvelleby commented 1 year ago

There is also a command-line interface for both evaluate_submissions.py (calling evaluate_all_submissions) and collect_performance.py (which calls evaluation_table).

kvelleby commented 1 year ago

You might notice that a lot of asserts have been removed from the code. I want those checks to live in test_compliance.py instead. test_compliance.py has not yet been updated to match the refactor. Once it has, we could optionally run that check in evaluate_all_submissions() before evaluation.
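
A rough sketch of how that could look, with check_compliance() as a hypothetical stand-in for whatever test_compliance.py ends up exposing:

# Hypothetical sketch only; test_compliance.py does not expose such a helper yet
def evaluate_all_submissions_checked(submission_paths, actuals, **kwargs):
    for submission in submission_paths:
        check_compliance(submission)  # placeholder: raise or warn on non-compliant submissions
    return evaluate_all_submissions(submission_paths, actuals, **kwargs)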