jgehrcke / covid-19-germany-gae

COVID-19 statistics for Germany. For states and counties. With time series data. Daily updates. Official RKI numbers.
MIT License

Any reason why RKI data has not been updated since 2020-04-08? #93

Closed: avila closed this issue 4 years ago

avila commented 4 years ago

It seems that the RKI data has not been updated since 2020-04-08T17:00:00+0000. Is there any reason for that? I haven't found an issue on this topic, sorry if I missed it. And thanks for the effort, by the way!

jgehrcke commented 4 years ago

Hey @avila, thanks for asking.

First of all, the data are still being updated regularly :-).

But I think there is a deeper aspect to your question which certainly deserves attention.

You asked on April 11 and you saw the last data point from April 8.

I updated RKI data today (April 14) and yet the last data point is from April 12.

This is intended and good!

Sometimes the RKI's ArcGIS system doesn't yield any data at all for the last 1-2 days. When it does, these data points are known to significantly underestimate the actual count ("actual count" being the official RKI count when queried a couple of days later).

Looking at the RKI time series data in their ArcGIS system today, it is reasonable to assume that only the data points up to two days before today reflect the actual count reasonably well.
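A minimal sketch of this rule of thumb in Python, assuming the time series is held as a plain dict mapping dates to counts. The function name `trim_incomplete_days` and the fixed 2-day lag are hypothetical illustrations, not part of this project's code or an RKI-specified value:

```python
from datetime import date, timedelta


def trim_incomplete_days(series, today, lag_days=2):
    """Keep only data points dated on or before `today - lag_days`.

    `series` maps a datetime.date to a case count. The default lag of 2 days
    is an assumed rule of thumb; newer points are likely still incomplete.
    """
    cutoff = today - timedelta(days=lag_days)
    return {day: count for day, count in series.items() if day <= cutoff}


# Placeholder counts (not real RKI numbers), one per day from April 8 to 14.
series = {date(2020, 4, d): 0 for d in range(8, 15)}
trimmed = trim_incomplete_days(series, today=date(2020, 4, 14))
print(max(trimmed))  # → 2020-04-12
```

With `today` set to April 14, the newest point kept is April 12, matching the behavior described above.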

When you look at case count data, you should decide which of two motivations you have: i) I want a somewhat sensationalistic impression of the current state, or ii) I want to understand as well as possible how the case counts evolved up until 1-2 days ago.

A bit of a guideline:

jgehrcke commented 4 years ago

A plot I created on March 31 to visualize this effect:

[Plot: data-sources-comparison-2020-03-31]

See how the red line (RKI data) is "above" the other lines (other data sources) for most of the past, but not for the most recent days. The flattening of the red line towards April 1 in that plot is due to processing delays and gets corrected later on, as you can see in the same plot for today:

[Plot: data-sources-comparison-2020-04-14]

The low slope of the red curve in the first plot around March 31 is not visible anymore in the second plot. Notably, the red line stays "above" the other lines.

In the second plot, the same effect is still present towards the right end, though it is a little harder to see because of the plot's scaling.

jgehrcke commented 4 years ago

Some quotes from RKI:

To display the newly transmitted cases per day, the reporting date (Meldedatum) is used: the date on which the local health office (Gesundheitsamt) learned about the case and recorded it electronically.

Several days can pass between the notification of the health office by doctors and laboratories and the transmission of the cases to the responsible state authorities and the RKI (reporting and transmission delay). Every day, the RKI receives new cases that were reported to the health office on the same day or on earlier days. These cases are then added to the respective date in the chart "New COVID-19 cases/day".

Because of the reporting delay, the data for the last few days in the chart are still incomplete and fill up with the data transmitted over the following days. Therefore, no trend in current new infections can be read from the course of the transmitted data alone.
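To illustrate the backfilling described in these quotes, here is a minimal Python sketch. It assumes a hypothetical, simplified record layout of (Meldedatum, transmission date) pairs, which is not the actual RKI data schema. Counting per Meldedatum while only including records already transmitted by a given query date shows how the most recent days "fill up" over time:

```python
from collections import Counter
from datetime import date


def counts_by_meldedatum(records, as_of):
    """Count cases per reporting date (Meldedatum), including only records
    that had already been transmitted to the RKI on or before `as_of`.

    `records` is an iterable of (meldedatum, transmission_date) pairs;
    this layout is a hypothetical simplification for illustration.
    """
    return Counter(melde for melde, sent in records if sent <= as_of)


# Toy records: three cases reported to the health office on April 10,
# transmitted to the RKI on April 10, 12 and 13 respectively.
records = [
    (date(2020, 4, 10), date(2020, 4, 10)),
    (date(2020, 4, 10), date(2020, 4, 12)),
    (date(2020, 4, 10), date(2020, 4, 13)),
]
print(counts_by_meldedatum(records, as_of=date(2020, 4, 11))[date(2020, 4, 10)])  # → 1
print(counts_by_meldedatum(records, as_of=date(2020, 4, 14))[date(2020, 4, 10)])  # → 3
```

The count for April 10 depends on when it is queried: the same date shows 1 case on April 11 but 3 cases on April 14.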

avila commented 4 years ago

Wow! Thanks for the great response (and the work!). I believe I can close this issue now. Very well answered! :)