jgehrcke / covid-19-germany-gae

COVID-19 statistics for Germany. For states and counties. With time series data. Daily updates. Official RKI numbers.
MIT License

COVID-19 case numbers for Germany 😷

Update April 2023: no more daily updates.

Some counties in Germany stopped reporting data in January 2023, and it is probably fair to say that by now there is little demand left for a project like this.

I would like to say a huge Thank You for your tremendous interest, for contributing, for critical discussion, and for helping reveal the most brittle kind of edge cases. All that helped keep the data flowing while rarely compromising on quality.

In the future, we will hopefully not need an underground software engineering effort like this anymore, and the German state will be able to expose relevant data quickly via useful and robust interfaces.

For me personally, this was a rather significant engineering effort, and I learned a whole lot. I feel both pain and happiness when I go through the almost 300 patches that I have worked on since March 20, 2020.

Literature referencing this project

The following list is based on a non-exhaustive web search:

Other projects that are or were using this repository

🇩🇪 Übersicht

(see below for an English version)

🇺🇸 Overview

Contact, questions, contributions

You probably have a number of questions. Just as I had (and still have). Your feedback, your contributions, and your questions are highly appreciated! Please use the GitHub issue tracker (preferred) or contact me via mail. For updates, you can also follow me on Twitter: @gehrcke.

Plots

Note that these plots are updated multiple times per day. Feel free to hotlink them.

Note: there is a systematic difference between the RKI data-based death rate curve and the Risklayer-based death rate curve. Both curves are wrong, and yet both curves are legit. The deaths that we learn about today may have happened days or weeks in the past. Neither curve attempts to show the exact time of death (sadly! :-)). The RKI curve is, in fact, based on the point in time when each COVID-19 case that led to death was first registered (the "Meldedatum" of the corresponding case). To my knowledge, the Risklayer data set treats the deaths we learn about today as if they had happened yesterday. While this is not true, the resulting curve is a little more intuitive. Despite its limitations, the Risklayer data set is the best view on the "current" evolution of deaths that we have.
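The difference between the two curves can be illustrated with a toy example. The sketch below uses made-up records (not real data), and the field names `reported_on` and `published_on` are hypothetical. It only shows how the same deaths land on different days depending on which date is used for binning; it is not the code that produces the plots.

```python
from collections import Counter
from datetime import date, timedelta

# Made-up records for illustration only (not real data). Each death carries
# the date its underlying case was first registered ("Meldedatum") and the
# date on which the death became publicly known. Field names are hypothetical.
deaths = [
    {"reported_on": date(2020, 4, 1), "published_on": date(2020, 4, 15)},
    {"reported_on": date(2020, 4, 3), "published_on": date(2020, 4, 15)},
    {"reported_on": date(2020, 4, 3), "published_on": date(2020, 4, 16)},
]

# RKI-style binning: attribute each death to the Meldedatum of its case.
by_meldedatum = Counter(d["reported_on"] for d in deaths)

# Risklayer-style binning (as described above, to the author's knowledge):
# attribute each death to the day before it became known.
by_day_before_publication = Counter(
    d["published_on"] - timedelta(days=1) for d in deaths
)

for label, curve in [
    ("Meldedatum", by_meldedatum),
    ("day before publication", by_day_before_publication),
]:
    print(label, {day.isoformat(): n for day, n in sorted(curve.items())})
```

Both curves count the same three deaths, but they place them on different days. This is why the two curves can disagree systematically without either one being in error about the totals.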

The individual data files

How is this data set different from others?

CSV file details

Focus: predictable/robust machine readability. Backwards-compatibility (columns get added, but none has been removed so far).
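Because columns may be added over time, a consumer is probably best off looking up columns by header name rather than by position. The following is a minimal sketch of that idea. The CSV snippets, the column names (`time_iso8601`, `cases_example`, `deaths_example`) and the values are made up for illustration and do not describe the actual files.

```python
import csv
import io

# Hypothetical CSV content before and after a column was added.
# Column names and values are placeholders, not real data.
OLD = "time_iso8601,cases_example\n2020-03-10T12:00:00+01:00,100\n"
NEW = (
    "time_iso8601,deaths_example,cases_example\n"
    "2020-03-10T12:00:00+01:00,0,100\n"
)


def read_column(csv_text, column):
    """Return all values of one column, looked up by its header name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]


# The lookup by name yields the same result for both file versions,
# even though the column position changed.
print(read_column(OLD, "cases_example"))
print(read_column(NEW, "cases_example"))
```

A positional lookup (for example `row[1]`) would silently return the wrong column for the second snippet.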

Note that the numbers for "today" as presented in media often actually refer to the last known state of data on the evening before. To address this ambiguity, the sample timestamps in the CSV files presented in this repository contain the time of day (and not just the day). With that, consumers get a rough impression of whether a sample represents the state in the morning or in the evening, a common ambiguity in other data sets.
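As a minimal sketch of how a consumer might use the time of day, the snippet below parses a timestamp and reports roughly which part of the day it falls into. It assumes that the timestamps are ISO 8601 strings with a UTC offset; the example value and the function name `describe_sample_time` are made up for illustration.

```python
from datetime import datetime


def describe_sample_time(timestamp):
    """Parse an ISO 8601 timestamp; return (date, rough part of the day)."""
    t = datetime.fromisoformat(timestamp)
    # Split the day at noon, in the local time given by the timestamp itself.
    part = "morning" if t.hour < 12 else "afternoon/evening"
    return t.date().isoformat(), part


# Made-up example timestamp (assumed format, not taken from the data files).
print(describe_sample_time("2020-03-26T19:00:00+01:00"))
```

The split at noon is an arbitrary choice; any threshold that suits the consumer works the same way.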

The recovered metric is not presented because it is rather blurry. Feel free to consume it from other sources!

Quality data sources published by Bundesländer

I tried to discover these step by step; they are possibly underrated (April 2020, minor updates towards the end of 2020):

Further resources

Changelog

This is a very high-level changelog. Technical details of reporting changed all the time; most of them can be inferred from GitHub issues.

What you should know before reading these numbers

Please question the conclusiveness of these numbers. Some directions along which you may want to think:

If you keep these (and more) ambiguities and questions in mind, then I think you are ready to look at these numbers and their time evolution :-) 😷.

Thoughts about reporting delays

In Germany, every step along the chain of reporting (Meldekette) introduces a noticeable delay. This is not necessary, but it is sadly the current state of affairs. The Robert Koch-Institut (RKI) seems to be working on a more modern reporting system that might mitigate some of these delays along the Meldekette in the future. Until then, it is fair to assume that case numbers published by RKI lag 1-2 days behind the case numbers published by Landkreise, which themselves have an unknown lag relative to the physical tests. In some cases, the Meldekette might even be entirely disrupted, as discussed in this SPIEGEL article (German). Also see this discussion.

Wishlist: every case should be tracked with its own time line, and transparently change state over time. The individual cases (and their time lines) should be aggregated on a country-wide level, anonymously, and get published in almost real time, through an official, structured data source, free to consume for everyone.

Attributions

Beginning of March 2020: shout-out to ZEIT ONLINE for continuously collecting and publishing the state-level data with little delay.

Edit March 21, 2020: Notably, by now the Berliner Morgenpost seems to do an equally good job of quickly aggregating the state-level data. We are using that here, too. Thanks!

Edit March 26, 2020: Risklayer is coordinating a crowd-sourcing effort to process verified Landkreis data as quickly as possible. Tagesspiegel is verifying this effort and using it in their overview page. As far as I can tell this is so far the most transparent data flow, and also the fastest, getting us the freshest case count numbers. Great work!

Edit December 13, 2020: for the *-rl-crowdsource*.csv files proper legal attribution goes to

Risklayer GmbH (www.risklayer.com) and Center for Disaster Management and Risk Reduction Technology (CEDIM) at Karlsruhe Institute of Technology (KIT) and the Risklayer-CEDIM-Tagesspiegel SARS-CoV-2 Crowdsourcing Contributors