neherlab / covid19_scenarios

Models of COVID-19 outbreak trajectories and hospital demand
https://covid19-scenarios.org
MIT License

Spanish numbers on hospitalization and ICU are wrong #595

Closed ccpf closed 4 years ago

ccpf commented 4 years ago

🐛 Bug Report

Please note that this is not a bug in your software but a bug in the data from Spain that you use and display alongside the model results. The numbers for hospitalizations and ICU admissions are wrong because only some regions (at the moment only Madrid) report the number of people who are CURRENTLY in hospital/ICU, while all other regions report cumulative numbers. I checked the source you give (https://raw.githubusercontent.com/datadista/datasets/master/COVID%2019/nacional_covid19.csv): those numbers are taken from the ministry reports, but there the CURRENT and CUMULATIVE numbers were simply added together to arrive at a meaningless total. Only about a week ago did the ministry recognize that this is not a sound approach, and since then it no longer provides any totals, which is more reasonable.

How to reproduce

Check the ministry's PDF reports at https://www.mscbs.gob.es/en/profesionales/saludPublica/ccayes/alertasActual/nCov-China/documentos/Actualizacion_79_COVID-19.pdf (the latest) and look at the footnote to Table 2 marked with the yen symbol.

😯 Current Behavior

Considering that your model requires CURRENT data but the points you plot are mostly CUMULATIVE, your hospital/ICU capacities and overflow effect on mortality will all be off for Spain (at least if we try to fit the model to the data).

🤔 Expected Behavior

Considering that it is impossible to recreate a time series of CURRENT hospitalizations/ICUs from the published ministry data, it may be better to not show that data at all for Spain.

💁 Possible Solution

What you need is a consistent data set of CURRENT data but I am not sure where you could get that from since almost all regions only report cumulative numbers. BTW, if you find the current data somewhere I would be happy to curate that for Spain.

🔦 Context

Anyone trying to fit the model to either the ICU or hospitalization data will fail.

💻 Code Sample

🌍 Your Environment

not relevant

Related

-

noleti commented 4 years ago

Thanks for highlighting this. As you noted, we require cumulative data in our .tsvs for cases, recovered, and deaths, and current values for hospitalized and ICU. As we already have Spain parsers for all the regions, we should be able to compute the totals ourselves from the regions, correct? But it would be hacky, and it would require data on when the reporting switched.

I did some quick analysis, and it seems that reporting was switched from current to cumulative for hospitalizations and ICU for ESP-Castilla-La Mancha on 2020-04-12; for ESP-Castilla y León on 2020-04-07 for hospitalized and on 2020-04-17 for ICU; and for ESP-C. Valenciana on 2020-04-09. ESP-Galicia seems to use current for ICU but cumulative for hospitalization, and ESP-Madrid seems to use current for hosp and ICU throughout. Can you confirm this?

ccpf commented 4 years ago

Yes, those switchover dates and regions seem correct. And yes, you would be able to compute the "totals" for deaths, cases and cured, but how would you get the "current" values for hospitalized and ICU from the mostly cumulative data? You could do some analysis on the Madrid data, see how the current hosp/ICU data evolved in relation to fatalities, for instance, and then extrapolate that onto all other regions based on their fatalities. But this would really be quite hacky, especially since Madrid was by far the worst-hit region and so would have had some serious overflow issues in ICU. Not sure it can be done ...

noleti commented 4 years ago

Yes, it is likely too hacky to implement. I was thinking about computing daily deltas from the cumulative sums, and then having sliding windows of length defined by our ICU and hospital stay parameters. But that would turn "real" data into estimations.
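For reference, the delta-plus-sliding-window idea could be sketched roughly as below. This is only an illustration, not code from the repository: the function name and the `stay_days` argument are hypothetical, and it assumes every patient stays exactly `stay_days` days and that the first cumulative value counts as admissions on the first day. The output is an estimate, not reported data.

```python
from typing import List


def estimate_current_from_cumulative(cumulative: List[int], stay_days: int) -> List[int]:
    """Estimate current occupancy as the admissions of the last `stay_days` days.

    Hypothetical helper: `cumulative` is a daily cumulative admissions series,
    `stay_days` stands in for the model's hospital or ICU stay parameter.
    """
    if stay_days <= 0:
        raise ValueError("stay_days must be positive")

    # Daily deltas of the cumulative series. Negative deltas (downward
    # corrections in the reports) are clipped to zero.
    daily: List[int] = []
    previous = 0
    for value in cumulative:
        daily.append(max(value - previous, 0))
        previous = value

    # Sliding-window sum: patients admitted within the last `stay_days` days
    # (including today) are assumed to still occupy a bed.
    current: List[int] = []
    for i in range(len(daily)):
        start = max(0, i - stay_days + 1)
        current.append(sum(daily[start:i + 1]))
    return current


# Made-up numbers, for illustration only.
estimate = estimate_current_from_cumulative([10, 15, 25, 40, 45], stay_days=2)
```

With a two-day stay, the made-up series above gives `[10, 15, 15, 25, 20]`: each value is the sum of that day's and the previous day's admissions.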

We might need to disable hospitals and ICU for Spain as a country instead. The fitting is done based on deaths (so it should not be influenced by these errors), but I agree it is confusing and wrong to present these numbers as real values.

ccpf commented 4 years ago

Yep I agree. I have been fitting the model to deaths also, although I was using deaths + 50% to account for the dark figure estimated from excess deaths. See here for instance: https://www.euromomo.eu/

noleti commented 4 years ago

I realized that the individual Spain parsers might need to be updated to reflect these issues as well, or just have hospital and ICU data removed if no appropriate current data is available. For the most recent data, that would mean that only ESP-Galicia has current data for ICU (no current data for hospitalization), and ESP-Madrid has current data for hosp and ICU throughout. All other Spain parsers should have their hospital and ICU data removed. We might need to check for similar issues in other parsers, as the requirement for current data for hosp and ICU was not very clear at the beginning, when many parsers were written (and the difference is not easy to spot as long as case numbers rise quickly).
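A minimal sketch of that clean-up, assuming parsed rows are dictionaries with `hospitalized` and `icu` keys (an assumption about the parser output, as are the names `CURRENT_FIELDS` and `drop_non_current`). The region list reflects only what was established in this thread for the most recent data:

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, Optional[object]]

# Fields for which a region still reports CURRENT values, per this thread.
# Any Spanish region not listed is assumed to report cumulative values only.
CURRENT_FIELDS: Dict[str, Set[str]] = {
    "ESP-Madrid": {"hospitalized", "icu"},
    "ESP-Galicia": {"icu"},
}


def drop_non_current(region: str, rows: List[Row]) -> List[Row]:
    """Return copies of `rows` with unreliable hospital/ICU values blanked out."""
    keep = CURRENT_FIELDS.get(region, set())
    cleaned: List[Row] = []
    for row in rows:
        new_row = dict(row)  # copy, so the caller's data is not modified
        for field in ("hospitalized", "icu"):
            if field in new_row and field not in keep:
                new_row[field] = None
        cleaned.append(new_row)
    return cleaned


# Made-up row, for illustration only.
sample = [{"time": "2020-04-20", "cases": 100, "hospitalized": 50, "icu": 5}]
galicia = drop_non_current("ESP-Galicia", sample)
other = drop_non_current("ESP-Castilla-La Mancha", sample)
```

Blanking values rather than dropping rows keeps the cumulative columns (cases, deaths, recovered) intact for fitting.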

nnoll commented 4 years ago

I think we should stop reporting hospitalizations in the interim while we sort out what to do here.

ccpf commented 4 years ago

Great, thanks. BTW, and this is kind of a separate issue, but the number of "current" ICU cases sometimes exceeds your model's ICU bed capacity. E.g., in the ESP-Madrid scenario, where current ICU cases reached values of around 1500 at some point, your ICU bed capacity is 634. I guess these are the official numbers from a couple of years back and the discrepancy is due to hospitals everywhere having increased their ICU capacity at short notice. So you might want to implement something like an automatic adjustment to ensure that your model's ICU bed count can never be below the officially reported number of current ICU cases. An ICU bed number in the model that is too low may affect your mortality, since you penalize the overflow with a higher mortality through your "Severity of overflow" parameter, which is set to "2" by default. You can of course compensate for it by reducing the severity to 1, but I thought I'd mention it.

noleti commented 4 years ago

That's a good point. The current ICU beds for ESP-Madrid are only estimated, and unfortunately not based on real data (https://github.com/neherlab/covid19_scenarios/blob/master/data/populationData.tsv#L117). Taking a max from reported ICU numbers could be done I guess. Can you file a separate issue on this? Changes would likely need to be made in data/scripts/getPopulationData.py, where most of the data sources are pulled together and processed.
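Roughly, the max-based adjustment could look like the sketch below. It is not the actual code in data/scripts/getPopulationData.py: the function name and arguments are hypothetical, and it assumes the reported series really holds current (not cumulative) ICU counts, with missing days given as `None`.

```python
from typing import Iterable, Optional


def adjusted_icu_beds(estimated_beds: int,
                      reported_current_icu: Iterable[Optional[int]]) -> int:
    """Raise the ICU bed estimate to the observed peak of current ICU cases.

    Hypothetical helper: the estimate is never lowered, only raised when
    the reported current ICU occupancy exceeded it at some point.
    """
    observed = [value for value in reported_current_icu if value is not None]
    peak = max(observed, default=0)  # 0 when there is no usable data
    return max(estimated_beds, peak)


# 634 and the peak of 1500 come from the ESP-Madrid example in this thread;
# the other series values are made up for illustration.
beds = adjusted_icu_beds(634, [900, None, 1500, 1200])
```

This only makes sense for regions whose ICU series is known to be current. Applied to a cumulative series, the peak would overstate capacity.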

ccpf commented 4 years ago

OK, submitted as issue #621