Discrepancy between incidence and cumulative death forecasts for RobertWalraven

reichlab / covid19-forecast-hub

Projections of COVID-19, in standardized format

https://covid19forecasthub.org

Discrepancy between incidence and cumulative death forecasts for RobertWalraven #1924

Closed: eycramer closed this issue 3 years ago

eycramer commented 3 years ago

Hi @RobertWalraven,

We are filing this issue because we noticed that your one-week-ahead forecasts of incident and cumulative deaths didn't agree with each other this week. We would expect the one-week-ahead predictive median for cumulative deaths to be approximately equal to the last reported cumulative deaths plus the one-week-ahead predictive median for incident deaths, but this was not the case for the forecast you submitted this week. The reported cumulative deaths for the US as of Saturday, October 31st, were 230,548, according to the data in the JHU CSSE repository at the end of the day on Sunday, November 1st. The median of your submitted one-week-ahead forecast for cumulative deaths was 233,126, which implies a forecast of 2,578 incident deaths. However, this does not agree with the median of your submitted one-week-ahead forecast for incident deaths, which was 4,994.
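The check described in this comment amounts to two subtractions. A minimal Python sketch follows; `check_consistency` is a hypothetical helper written for illustration, not the hub's actual validation code, and the inputs are the figures quoted in the comment.

```python
def check_consistency(last_observed_cum: float,
                      cum_median_1wk: float,
                      inc_median_1wk: float) -> tuple[float, float]:
    """Return (implied incident deaths, discrepancy).

    implied incident = 1-week-ahead cumulative median - last observed cumulative
    discrepancy      = submitted incident median - implied incident
    """
    implied_inc = cum_median_1wk - last_observed_cum
    discrepancy = inc_median_1wk - implied_inc
    return implied_inc, discrepancy


# Figures as quoted in the comment: last observed cumulative, cumulative
# median, incident median.
implied, gap = check_consistency(230_548, 233_126, 4_994)
print(implied, gap)  # 2578 2416
```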

We can imagine that this might happen for any of several reasons: for example, you may be using two different models to forecast incident and cumulative deaths, or you may be calculating cumulative deaths using a different reference data set or a different procedure for aggregating to the national level than we are (for reference, we describe our approach to calculating aggregated cumulative deaths based on the JHU CSSE data here). Regardless, we wanted to bring this discrepancy to your attention in case you were not aware of it.

Our goal is to make the ensemble forecasts of incident and cumulative deaths consistent with each other. This is only possible if the submitted forecasts of incident and cumulative deaths from each model are consistent with each other. Can you please respond to this issue with a description of why there is this discrepancy and also whether you have a plan to fix it? Thanks!

RobertWalraven commented 3 years ago

I am not sure where you got the numbers above. I looked at my submission for 2020-11-02 in the Data Processed folder, and it shows that the 1-week-ahead forecast for cumulative deaths was 237044.73 and the 1-week-ahead forecast for incremental deaths was 7140.99. 230548 + 7141 = 237689, so the discrepancy is 237689 - 237045 = 644. I don't know whether that is still considered too big to be "approximately equal", but it is considerably smaller than the 4994 - 2578 = 2416 discrepancy suggested above.

RobertWalraven commented 3 years ago

I believe the actual discrepancy of 644 was the result of my modeling the US data independently from the state data rather than rolling up the state data into the US data. I'll check that possibility carefully, and if it is true, I'll go back to rolling up the state data into the US totals from now on.
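Rolling up can be sketched as follows, assuming the model produces joint sample paths of incident deaths for each state (an assumption; the thread does not say how the forecasts are generated). All names and numbers are hypothetical. Because the national cumulative path is the sum of the state cumulative paths, each national sample equals the observed national total plus that sample's increment, so the national incident and cumulative medians agree by construction.

```python
import statistics

# Hypothetical data: last observed cumulative deaths per state, plus
# 1-week-ahead incident-death sample paths. Sample i in every state is
# assumed to come from the same joint simulation run.
last_cum = {"A": 10_000, "B": 5_000, "C": 2_000}
inc_samples = {
    "A": [100, 120, 90, 110, 105],
    "B": [40, 60, 55, 45, 50],
    "C": [10, 30, 20, 15, 25],
}


def roll_up(last_cum, inc_samples):
    """Build national incident and cumulative samples by summing states."""
    states = list(last_cum)
    n = len(inc_samples[states[0]])
    us_last_cum = sum(last_cum.values())
    # National incident sample i = sum of every state's sample i.
    us_inc = [sum(inc_samples[s][i] for s in states) for i in range(n)]
    # National cumulative sample i = sum of every state's cumulative sample i.
    us_cum = [sum(last_cum[s] + inc_samples[s][i] for s in states)
              for i in range(n)]
    return us_last_cum, us_inc, us_cum


us_last_cum, us_inc, us_cum = roll_up(last_cum, inc_samples)
print(us_last_cum, statistics.median(us_inc), statistics.median(us_cum))
# 17000 170 17170
```

Summing the state medians instead (105 + 50 + 20 = 175) would not reproduce the national median of 170, which is why a roll-up like this sums samples rather than quantiles.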

eycramer commented 3 years ago

Ah, sorry for my mistake. Yes, the discrepancy is 644.

And thank you for looking into this!

RobertWalraven commented 3 years ago

Is 644 a reasonable discrepancy? What kinds of discrepancies are other teams getting?

eycramer commented 3 years ago

Theoretically, we expect the forecasted incident deaths and the incident deaths implied by the cumulative forecast to be exactly the same: the 1-week-ahead cumulative death forecast should equal the observed cumulative deaths plus the 1-week-ahead incident death forecast.
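Written as a formula, with notation introduced here only for illustration: let $C_t$ be the last observed cumulative count (Saturday, October 31st), and $\hat{I}_{t+1}$ and $\hat{C}_{t+1}$ the 1-week-ahead incident and cumulative medians. Because $C_t$ is a fixed observed number, adding it shifts every quantile by the same amount, so the identity should hold for each submitted quantile, not only the median. The discrepancy $D$ discussed in this thread, using the figures from the 2020-11-02 submission, is:

$$
\hat{C}_{t+1} = C_t + \hat{I}_{t+1},
\qquad
D = C_t + \hat{I}_{t+1} - \hat{C}_{t+1}
$$

$$
D = 230{,}548 + 7{,}140.99 - 237{,}044.73 = 644.26
$$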

However, we understand that this is not always the case due to different data sources, etc. I tagged you in this issue to make you aware of it.

There are a number of other issues (both open and closed) that I have filed in prior weeks for teams with this same issue. There is a range in the size of the discrepancies across teams and weeks.

RobertWalraven commented 3 years ago

I don't use the raw truth data directly in my model because it is noisy and can contain spikes, both positive and negative, caused by one-day corrections that can pull the 7-day incident change artificially high or low. In other words, I fit where the truth data should be, not where a simple sum of the 7 daily incident values says it should be. This is why my model does not quite satisfy JHU raw cumulative deaths + 7-day sum of forecast incident deaths = 1-week-ahead cumulative death forecast. Normally that formula holds fairly closely, but this week the data for a lot of states is going crazy, so the discrepancy for my model will be a little larger than last time.
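The effect described in this comment can be illustrated with a short sketch. The thread does not describe the actual fitting procedure, so a trailing 7-day median of daily counts is used here purely as a hypothetical stand-in for "where the truth data should be"; the data, the `trailing_median` helper, and the naive incident forecast are all invented for illustration. A single +600 correction inflates the raw cumulative total but not the fitted level, so the hub-style check differs from the model's own cumulative forecast by exactly that amount.

```python
import statistics


def trailing_median(values, window=7):
    """Median of each value and up to window-1 preceding values."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(statistics.median(values[start:i + 1]))
    return out


# Hypothetical data: cumulative deaths before the window, then 14 days of
# reported daily deaths. Day 12 (index 11) includes a +600 one-day backlog
# correction on top of an underlying level of 100 deaths/day.
start_cum = 10_000
daily = [100] * 11 + [700] + [100, 100]

raw_cum = start_cum + sum(daily)           # what the raw data says
smoothed = trailing_median(daily)          # spike-resistant daily level
fitted_cum = start_cum + sum(smoothed)     # stand-in for the fitted level
inc_forecast = 7 * smoothed[-1]            # naive 1-week incident forecast

model_cum_forecast = fitted_cum + inc_forecast  # anchored on the fit
hub_check = raw_cum + inc_forecast              # anchored on raw data
print(raw_cum, fitted_cum, model_cum_forecast, hub_check - model_cum_forecast)
```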

RobertWalraven commented 3 years ago

You can close this issue now.