prio-data / prediction_competition_2023

Code for generating benchmark models and evaluation scripts for the 2023 VIEWS prediction competition
What to do about samples with negative values? #18

Open kvelleby opened 1 year ago

kvelleby commented 1 year ago

The competition is about predicting counts of fatalities. Counts are non-negative integers.

Yet many prediction models yield continuous floats, possibly including negative numbers.

None of the metrics intrinsically fail when given negative numbers, but the current binning scheme of the Ignorance Score does not expect them and raises an error.
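To make the binning problem concrete, here is a minimal sketch of a binned Ignorance Score that reserves an extra bin for negative values. It is not the repository's implementation: the bin edges in `EDGES`, the add-one smoothing, and the function names `bin_index` and `ignorance_score` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical bin edges; the competition's actual binning scheme may differ.
EDGES = np.array([0, 1, 3, 6, 11, 26, 51, 101, 251, 501, 1001])


def bin_index(values, edges=EDGES):
    """Map values to bin indices.

    Index 0 collects everything below edges[0] (the negative bin),
    index i collects edges[i-1] <= x < edges[i], and the last index
    collects everything at or above edges[-1].
    """
    return np.searchsorted(edges, values, side="right")


def ignorance_score(samples, observed, edges=EDGES):
    """Binned ignorance score: -log2 of the probability mass the
    samples put in the bin that contains the observed value.

    Add-one smoothing (an assumption of this sketch) keeps the score
    finite when no sample falls in the observed bin.
    """
    samples = np.asarray(samples, dtype=float)
    n_bins = len(edges) + 1  # +1 for the negative bin
    counts = np.bincount(bin_index(samples, edges), minlength=n_bins)
    probs = (counts + 1) / (counts.sum() + n_bins)
    return -np.log2(probs[bin_index(observed, edges)])
```

With this layout, negative samples no longer raise an error. Since the observed count is never negative, any probability mass a model places in the negative bin only dilutes the mass available to the bins that can actually occur.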

Alternatives:

Happy to get input here. I think the most important thing is to settle on one approach and make it clear to everyone. Personally, I am sceptical of us doing upsampling with anything other than scipy.signal.resample. Finding a suitable positive-only distribution is, I think, the responsibility of the contestants. If they give us negative values (or fewer than 1000 values), we have to use our pre-described, simple approach to adjust. I am also leaning towards allowing negative predictions and adding a bin for negative numbers in the Ignorance Score (but still truncating at 0 when we have to do upsampling because we received fewer than 1000 samples).
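A rough sketch of the adjustment described above, assuming the samples are sorted before being passed to scipy.signal.resample and that truncation at 0 is applied only when upsampling actually happens. The function name `upsample_samples`, the sorting step, and the default target of 1000 are illustrative assumptions, not the agreed procedure.

```python
import numpy as np
from scipy.signal import resample


def upsample_samples(samples, n_target=1000):
    """Return at least n_target samples for one prediction.

    If enough samples were submitted they are returned unchanged
    (negative values included). Otherwise the sorted samples are
    upsampled with scipy.signal.resample and truncated at 0.
    """
    samples = np.asarray(samples, dtype=float)
    if samples.size >= n_target:
        return samples

    # Sorting first (an assumption of this sketch) makes the input a
    # smooth, monotone curve rather than an arbitrary sequence.
    upsampled = resample(np.sort(samples), n_target)

    # Fourier-based resampling can overshoot and produce negative
    # values even from non-negative input, so truncate at 0.
    return np.clip(upsampled, 0, None)
```

For example, `upsample_samples([0, 1, 5, 20])` returns 1000 non-negative floats, while a submission that already has 1000 samples passes through untouched and relies on the negative bin in the score.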