reichlab / forecast-repository

Codebase for Zoltar forecast repository
https://zoltardata.com/
GNU General Public License v3.0
6 stars, 3 forks

deprecate `point` prediction type #377

Closed matthewcornell closed 5 months ago

matthewcornell commented 5 months ago

As phase 2/2 of #367, this issue is to deprecate the point prediction type in favor of the new mean, median, and mode ones, while leaving the point prediction type valid for legacy projects. Note that this issue does not include retiring the point prediction type, i.e., completely removing it from the system and (presumably) "relabeling" prior data. Here's @nickreich's justification for not retiring it:

> I am not yet convinced that we want to do phase three, as it would require "relabeling" prior data in places where we cannot be certain what the original intent or data type should be.

Question: What does deprecation mean in this case? Do we reject all new point forecasts (but keep existing ones) for all projects, or only for "new" projects? Or do we allow all point forecasts but simply issue a warning?
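To make the three options in the question concrete, here is a minimal Python sketch of how each one would treat an incoming upload. Everything in it is hypothetical: `PointPolicy`, `check_point_upload`, and the `project_has_point_data` flag (standing in for "legacy project") are illustration names only, not part of the Zoltar codebase. In all three cases, forecasts that are already stored are left untouched.

```python
from enum import Enum


class PointPolicy(Enum):
    """Hypothetical policies, one per option raised in the question."""
    REJECT_ALL = "reject_all"            # reject new point forecasts in every project
    REJECT_NEW_PROJECTS = "reject_new"   # reject only in projects without prior point data
    WARN_ONLY = "warn_only"              # accept everything, but attach a warning


def check_point_upload(policy, project_has_point_data, prediction_classes):
    """Return (accepted, messages) for one upload under the given policy.

    prediction_classes: the prediction type names found in the upload,
    e.g. {"point", "mean"}. Uploads without any point data always pass.
    """
    if "point" not in prediction_classes:
        return True, []
    msg = "the 'point' prediction type is deprecated; prefer 'mean', 'median', or 'mode'"
    if policy is PointPolicy.WARN_ONLY:
        return True, [msg]
    if policy is PointPolicy.REJECT_NEW_PROJECTS and project_has_point_data:
        # Legacy project: keep accepting point forecasts.
        return True, [msg]
    return False, [msg]
```

For example, under `REJECT_NEW_PROJECTS` a project that already holds point data can keep uploading them, while the same upload to a project with no point data is refused.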

matthewcornell commented 5 months ago

@nickreich : What's your thinking about the above?

nickreich commented 5 months ago

@matthewcornell could you compile a list that indicates which of the current Zoltar projects have "point" predictions currently? That would help me answer the above questions. My only clear response to the above is that we do NOT want to do this:

> Or do we allow all point forecasts but simply issue a warning?

I could see allowing existing projects to keep submitting point forecasts. But that may depend on which ones currently have them...

matthewcornell commented 5 months ago

@nickreich I wrote a SQL query [1] to do this. Results (see table): every project has point data. `point_count` is the number of forecasts that have point data.

| name                                    | project_id | point_count  |
|-----------------------------------------|------------|--------------|
| Impetus Province Forecasts              | 4          | 37           |
| CDC Retrospective Forecasts             | 6          | 5647         |
| CDC Real-time Forecasts                 | 9          | 2167         |
| Docs Example Project                    | 41         | 1            |
| COVID-19 Forecasts                      | 44         | 8695         |
| Election Forecasts                      | 218        | 697          |
| ECDC European COVID-19 Forecast Hub     | 238        | 2514         |
| CDC Influenza Hospitalization Forecasts | 299        | 897          |
| COVID-19 Forecasts Viz Test             | 316        | 10           |
| NBA predictions                         | 328        | 1153         |

[1]

```sql
-- Per project, count the forecasts that contain at least one point prediction.
SELECT p.name, fm.project_id, count(*) AS point_count
FROM forecast_app_forecastmetaprediction AS fmp
     JOIN public.forecast_app_forecast f ON f.id = fmp.forecast_id
     JOIN public.forecast_app_forecastmodel fm ON f.forecast_model_id = fm.id
     JOIN public.forecast_app_project p ON fm.project_id = p.id
WHERE fmp.point_count <> 0
GROUP BY p.name, fm.project_id
ORDER BY fm.project_id;
```
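The counting logic of query [1] can be checked on toy data. The sketch below runs it against an in-memory SQLite database from Python. It rests on several assumptions: the table layouts contain only the columns the query itself mentions (the real tables presumably have more), the `public.` schema prefix is dropped because SQLite has no such schema, and the rows are made-up toy values, not Zoltar data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed minimal schema: only the columns referenced by the query.
CREATE TABLE forecast_app_project (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE forecast_app_forecastmodel (id INTEGER PRIMARY KEY, project_id INTEGER);
CREATE TABLE forecast_app_forecast (id INTEGER PRIMARY KEY, forecast_model_id INTEGER);
CREATE TABLE forecast_app_forecastmetaprediction (forecast_id INTEGER, point_count INTEGER);

-- Toy rows: project 1 has three forecasts, one of them without point data.
INSERT INTO forecast_app_project VALUES (1, 'Toy project A'), (2, 'Toy project B');
INSERT INTO forecast_app_forecastmodel VALUES (10, 1), (20, 2);
INSERT INTO forecast_app_forecast VALUES (100, 10), (101, 10), (102, 10), (200, 20);
INSERT INTO forecast_app_forecastmetaprediction VALUES (100, 5), (101, 2), (102, 0), (200, 3);
""")

QUERY = """
SELECT p.name, fm.project_id, count(*) AS point_count
FROM forecast_app_forecastmetaprediction AS fmp
     JOIN forecast_app_forecast f ON f.id = fmp.forecast_id
     JOIN forecast_app_forecastmodel fm ON f.forecast_model_id = fm.id
     JOIN forecast_app_project p ON fm.project_id = p.id
WHERE fmp.point_count <> 0
GROUP BY p.name, fm.project_id
ORDER BY fm.project_id;
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('Toy project A', 1, 2), ('Toy project B', 2, 1)]
```

Forecast 102 has `point_count = 0` and is filtered out by the `WHERE` clause, so the result counts forecasts that contain point data, not individual point predictions.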
matthewcornell commented 5 months ago

Per meeting w/ @nickreich we decided to keep the point type b/c it could be useful in situations where the more specific type (e.g., mean, median, or mode) is unknown. ( @nickreich please check me on that rationale.) At the same time, we decided to document that the point type should be avoided when possible in favor of one of the more specific types. @matthewcornell will write this up in a suitable place in https://docs.zoltardata.com/ .