neherlab / covid19_scenarios

Models of COVID-19 outbreak trajectories and hospital demand
https://covid19-scenarios.org
MIT License
1.36k stars, 352 forks

Please add model cumulative cases #517

Closed (coderb closed this issue 4 years ago)

coderb commented 4 years ago

Please add data for the model cumulative cases.

This would be useful when trying to manually fit parameters to observed data especially around interventions.

/edit by noleti: Rephrasing as "Please add a line to the results plot which represents the cumulative number of cases in the simulation (i.e. cumulative cases (simulation)). The number would essentially be total population minus current susceptible".

noleti commented 4 years ago

Is this the same as https://github.com/neherlab/covid19_scenarios/issues/504 ?

coderb commented 4 years ago

no, it's a request for a new data series which is not currently displayed

noleti commented 4 years ago

Can you be more specific? Do you mean the sum of people who were infected, in hospital, in ICU, or dead? I.e. total population minus susceptible?

coderb commented 4 years ago

the cumulative number of people that have been infected over time given by the model, ie the model version of the data series "cumulative cases (data)".

ccpf commented 4 years ago

Isn't that: cumulative deaths + icu + severely ill + infectious + recovered? At least this is how I have been interpreting the model output.

noleti commented 4 years ago

AFAIK "Cumulative cases (data)" is directly taken from our data/case-counts/*/*.tsv files (which are, in turn, taken from third party APIs). So you could directly take the values from our repo. Does that help?

@ccpf: For the simulation data, you need to be careful not to double-count people. The same person can count towards 'Cumulative hospitalized' and 'Cumulative recovered', for example
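The double-counting concern can be illustrated with a toy example. This is only a sketch: the per-person histories and category names below are hypothetical and are not taken from the covid19_scenarios code. It shows that adding up cumulative category totals counts a person once for every category they have passed through.

```python
# Hypothetical per-person histories: the set of cumulative categories
# each simulated person has ever entered (illustrative names only).
histories = {
    "person_a": {"infectious", "recovered"},
    "person_b": {"infectious", "hospitalized", "recovered"},
    "person_c": {"infectious", "hospitalized", "icu", "dead"},
}


def category_totals(histories):
    """Count how many people have ever entered each category."""
    totals = {}
    for categories in histories.values():
        for category in categories:
            totals[category] = totals.get(category, 0) + 1
    return totals


def naive_sum(histories, categories):
    """Add up cumulative totals across categories (double-counts people)."""
    totals = category_totals(histories)
    return sum(totals.get(category, 0) for category in categories)


def distinct_cases(histories):
    """Count each person with any history exactly once."""
    return sum(1 for categories in histories.values() if categories)


naive = naive_sum(histories, ["hospitalized", "icu", "recovered", "dead"])
print(naive, distinct_cases(histories))  # 6 vs. 3 distinct people
```

In this toy data, three people produce a naive sum of six, because person_b and person_c each appear in several cumulative categories.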

ccpf commented 4 years ago

@noleti: true! Thanks for pointing that out. So then coderb is right in that we cannot know the (modelled) cumulative cases? Regarding your answer to him/her, I think the question was about the modelled cumulative cases and not the actual data. Cheers.

coderb commented 4 years ago

@noleti i think you misunderstand what i am asking for. simply to plot (in the graph section) what the model gives for the cumulative number of cases. i believe the formula would simply be N - S where N is the total population and S is the number of susceptible (assuming there is no change in total population over time).
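A minimal sketch of the N - S formula proposed above, assuming the simulation exposes a time series of susceptible counts and that the total population stays constant. The function name and the sample numbers are hypothetical and not part of the covid19_scenarios codebase.

```python
from typing import List, Sequence


def cumulative_cases(total_population: float,
                     susceptible: Sequence[float]) -> List[float]:
    """Return N - S(t) for every time step.

    Assumes a constant total population N, so everyone who is no longer
    susceptible has been infected at some point.
    """
    return [total_population - s for s in susceptible]


# Hypothetical susceptible counts over four time steps, with N = 1000.
susceptible_series = [1000, 990, 950, 800]
print(cumulative_cases(1000, susceptible_series))  # [0, 10, 50, 200]
```

Because S can only decrease in a model without births, migration, or loss of immunity, the resulting series is non-decreasing, which is what a cumulative count should look like.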

noleti commented 4 years ago

Your N-S is the number I suggested in https://github.com/neherlab/covid19_scenarios/issues/517#issuecomment-612770785. I'm not sure how this will help with manual fitting, as usually there is a big problem with undercounting of infected (but not severe) cases. Fitting is most reliably done based on deaths.

In any case, I think it is now clear what you are asking for, I will update your initial question to describe it more clearly from my perspective.

coderb commented 4 years ago

thank you

nnoll commented 4 years ago

Hey @coderb. My worry about including this is, as @noleti states, that it's probably not wise to fit our cumulative infectious categories directly to the cumulative case counts, as there's an unknown underreporting fraction for each scenario. Furthermore, I worry about our UX here: our results plot is already so dense with information that I fear adding another plot will compound our problem there.

One possibility is that we could break up the plot into "current" and "cumulative" reports. This would allow us to plot the data and is maybe better from an interpretability standpoint. @rneher and @ivan-aksamentov, thoughts?

coderb commented 4 years ago

@nnoll regarding the information overload, perhaps just defaulting the data series to unselected would work.

i understand you should not directly fit the cumulative cases, however, i'm interested in the overall shape. additionally, that unknown underreporting fraction seems to be a pretty important factor in the model's prediction of the future fatality curve.

i've been working on improving the age-specific parameter matrix to make more sense for my particular location, which i think i have done. however, without the model cumulative cases it's hard to get a good sense for how the other parameters affect the model. thanks.

coderb commented 4 years ago

@nnoll the actual number of deaths is not as reliable as you would think: New York to start reporting 'probable' coronavirus deaths to CDC

ccpf commented 4 years ago

@coderb yes, and in Italy also, according to this large-scale analysis of "excess deaths" in over 1000 communities: http://www.deplazio.net/images/stories/SISMG/SISMG_COVID19.pdf. Here is a summary/analysis in English: https://towardsdatascience.com/covid-19-excess-mortality-figures-in-italy-d9640f411691, suggesting that official deaths should be multiplied by a factor of about 2.5.

shijurodhaz commented 4 years ago

[Attached screenshot: Screen Shot 2020-04-21 at 4:11:00 AM]

I submitted a pull request for these changes. I basically did population - current susceptible as mentioned in the issue title. Let me know if anything needs to be modified. Thanks!

amindelpazir commented 4 years ago

695

ivan-aksamentov commented 4 years ago

Wontfix, see #620