reichlab / 2017-2018-cdc-flu-contest

Code and submissions for 2017-2018 CDC flu prediction contest

pull "Real-time" data from DELPHI API #10

Status: Open. elray1 opened this issue 7 years ago.

elray1 commented 7 years ago

At least for prospective-prediction years, predictions should use only data that had actually been observed by the prediction date.
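To make "only data actually observed by the prediction date" concrete, here is a minimal Python sketch of an as-of snapshot. It assumes each published value is stored as a record with the epiweek it describes and the issue (publication week) in which it was reported. This record layout, the `Report` type, and the `as_of_snapshot` helper are hypothetical illustrations and do not come from the project's code. For each epiweek, the snapshot keeps the latest value published on or before the prediction date.

```python
from typing import Dict, List, NamedTuple


class Report(NamedTuple):
    """One published value: the value for `epiweek` as reported in `issue`."""
    epiweek: int  # week the value describes, e.g. 201740
    issue: int    # week the value was published, e.g. 201742
    value: float


def as_of_snapshot(reports: List[Report], as_of: int) -> Dict[int, float]:
    """For each epiweek, return the latest value published on or before `as_of`."""
    latest: Dict[int, Report] = {}
    for r in reports:
        if r.issue > as_of:
            continue  # not yet published at prediction time
        best = latest.get(r.epiweek)
        if best is None or r.issue > best.issue:
            latest[r.epiweek] = r
    return {week: r.value for week, r in sorted(latest.items())}


reports = [
    Report(201740, 201740, 1.10),
    Report(201740, 201741, 1.25),  # backfill revision of week 201740
    Report(201741, 201741, 1.30),
    Report(201741, 201742, 1.45),  # backfill revision of week 201741
    Report(201742, 201742, 1.60),
]
print(as_of_snapshot(reports, as_of=201741))  # {201740: 1.25, 201741: 1.3}
```

Integer YYYYWW epiweeks are used because they sort in calendar order, so a plain `<=` comparison is enough to express "published by".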

nickreich commented 7 years ago

Proposal: we continue to ignore backfill for the ensemble comparison (flusight-test), but for all CDC-related projects we use unrevised data to make predictions and revised data to fit the models.
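The split in this proposal can be sketched in Python. Models are fit on the final (most revised) values from earlier seasons, and predictions are made from the unrevised values that were available at the prediction week. The record layout, the `season_start` cutoff, and both helper functions are hypothetical. For simplicity, the sketch also restricts prediction inputs to the current season.

```python
from typing import Dict, List, NamedTuple, Tuple


class Report(NamedTuple):
    epiweek: int  # week the value describes
    issue: int    # week the value was published
    value: float


def latest_by_week(reports: List[Report], max_issue: int) -> Dict[int, float]:
    """Latest value per epiweek among reports published no later than max_issue."""
    best: Dict[int, Report] = {}
    for r in reports:
        if r.issue <= max_issue and (r.epiweek not in best or r.issue > best[r.epiweek].issue):
            best[r.epiweek] = r
    return {w: r.value for w, r in sorted(best.items())}


def split_fit_and_predict(
    reports: List[Report], season_start: int, prediction_week: int
) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Revised data for fitting (earlier seasons); unrevised data for predicting."""
    final = latest_by_week(reports, max_issue=max(r.issue for r in reports))
    fit_data = {w: v for w, v in final.items() if w < season_start}
    as_of = latest_by_week(reports, max_issue=prediction_week)
    predict_data = {w: v for w, v in as_of.items() if w >= season_start}
    return fit_data, predict_data


reports = [
    Report(201640, 201640, 0.9),
    Report(201640, 201650, 1.0),  # earlier season, later revised
    Report(201740, 201740, 1.1),
    Report(201740, 201742, 1.3),  # revision published after the prediction week
    Report(201741, 201741, 1.4),
]
fit, pred = split_fit_and_predict(reports, season_start=201740, prediction_week=201741)
print(fit)   # {201640: 1.0}
print(pred)  # {201740: 1.1, 201741: 1.4}
```

Week 201740 enters the prediction inputs with its first-reported value (1.1), not its later revision (1.3), because that revision was not yet available at week 201741.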

nickreich commented 7 years ago

Some possibly useful code:
https://github.com/reichlab/flu-eda/blob/master/backfill/flu-backfill-exploration.Rnw
https://github.com/reichlab/flu-eda/blob/master/backfill/download-flu-data-with-backfill.R

nickreich commented 7 years ago

My sense is that the key place to substitute the unrevised data is where we first load the data in the get_log_scores_via_trajectory_simulation function.

nickreich commented 7 years ago

However, according to the code comments, the data are only loaded that early so that we "know what the dimensions of the results data frame should be." So it might make sense to leave that initial load intact and then read the data in within each iteration of the loop over analysis_time_season_week.
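A rough Python analogue of this restructuring follows. It is not the project's R code. The function name, the `load_data_as_of` and `score` callables, and the toy loader and score are all hypothetical. The early load is kept only to fix the shape of the results, and the data actually used for each prediction are re-read inside the loop over analysis weeks.

```python
import math
from typing import Callable, Dict

Data = Dict[int, float]  # epiweek -> observed value


def log_scores_by_analysis_week(
    full_data: Data,
    load_data_as_of: Callable[[int], Data],
    score: Callable[[Data, int], float],
) -> Dict[int, float]:
    """Hypothetical stand-in for the loop in get_log_scores_via_trajectory_simulation."""
    # The up-front load is used only to size the results: one slot per analysis week.
    analysis_weeks = sorted(full_data)
    results = {week: math.nan for week in analysis_weeks}
    for week in analysis_weeks:
        # Re-read inside the loop so each prediction sees only the data
        # that load_data_as_of returns for this analysis week.
        observed = load_data_as_of(week)
        results[week] = score(observed, week)
    return results


full = {201740: 1.30, 201741: 1.45, 201742: 1.60}


def toy_loader(week: int) -> Data:
    # Toy loader: truncates the final data at the analysis week (ignores revisions).
    return {w: v for w, v in full.items() if w <= week}


def toy_score(observed: Data, week: int) -> float:
    # Placeholder standing in for a real log score: number of visible weeks.
    return float(len(observed))


print(log_scores_by_analysis_week(full, toy_loader, toy_score))
# {201740: 1.0, 201741: 2.0, 201742: 3.0}
```

Because the loader is passed in, the same loop works whether it returns truncated final data or a true as-of snapshot.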

elray1 commented 7 years ago

I think your second suggestion, reading the data within each iteration of the loop over analysis_time_season_week, makes sense. When we make a prediction at each analysis_time_season_week, we need to use a data set containing "everything that was observed up to analysis_time_season_week". Going forward, I think it would be good to have functionality that can handle either prediction using the final observed data (as we have done in the past) or prediction using only the data that were available by analysis_time_season_week (as we're doing for this project). I guess that could be handled by adding an argument to the get_log_scores_via_trajectory_simulation function that specifies the data set (rather than hard-coding a path), and another argument specifying what type of data set we're giving it?
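One way to sketch the two proposed arguments in Python: the caller passes the data directly instead of a hard-coded path, plus a flag saying whether the data should be treated as final or as-of. `DataType`, `data_for_analysis_week`, and `get_log_scores_sketch` are hypothetical names, and the (epiweek, issue, value) record layout is also an assumption. In both modes, weeks after the analysis week are excluded. The modes differ only in which revision of each earlier week is used.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List, NamedTuple


class Report(NamedTuple):
    epiweek: int  # week the value describes
    issue: int    # week the value was published
    value: float


class DataType(Enum):
    FINAL = "final"  # latest revision of every week (past practice)
    AS_OF = "as_of"  # only what had been published by the analysis week


def data_for_analysis_week(
    reports: List[Report], data_type: DataType, analysis_week: int
) -> Dict[int, float]:
    """Weeks up to analysis_week, using revisions allowed by data_type."""
    if data_type is DataType.AS_OF:
        max_issue = analysis_week
    else:
        max_issue = max(r.issue for r in reports)
    best: Dict[int, Report] = {}
    for r in reports:
        if r.epiweek <= analysis_week and r.issue <= max_issue:
            if r.epiweek not in best or r.issue > best[r.epiweek].issue:
                best[r.epiweek] = r
    return {w: r.value for w, r in sorted(best.items())}


def get_log_scores_sketch(
    reports: List[Report],
    data_type: DataType,
    analysis_weeks: Iterable[int],
    score: Callable[[Dict[int, float], int], float],
) -> Dict[int, float]:
    """Data and data type are arguments rather than a hard-coded path."""
    return {
        week: score(data_for_analysis_week(reports, data_type, week), week)
        for week in analysis_weeks
    }


def latest_value(data: Dict[int, float], week: int) -> float:
    # Placeholder score: the value observed for the analysis week itself.
    return data[week]


reports = [
    Report(201740, 201740, 1.10),
    Report(201740, 201741, 1.25),
    Report(201741, 201741, 1.30),
    Report(201741, 201742, 1.45),
]
weeks = [201740, 201741]
print(get_log_scores_sketch(reports, DataType.AS_OF, weeks, latest_value))  # {201740: 1.1, 201741: 1.3}
print(get_log_scores_sketch(reports, DataType.FINAL, weeks, latest_value))  # {201740: 1.25, 201741: 1.45}
```

An enum rather than a free-form string means a misspelled data type fails at attribute lookup instead of silently falling through to one of the modes.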