mathiasflick opened 3 years ago
Thank you @mathiasflick for the report.
I had a quick look into logs and found
Traceback (most recent call last):
  File "tools/build-rki-csvs.py", line 499, in <module>
    main()
  File "tools/build-rki-csvs.py", line 52, in main
    df_by_lk, df_berlin_cases_sum, df_berlin_deaths_sum = fetch_and_clean_data()
  File "tools/build-rki-csvs.py", line 176, in fetch_and_clean_data
    assert lacking_wrt_ref == set([11000, 3152])
AssertionError
Looks like, once again, the set of Amtliche Gemeindeschlüssel (AGS, the official municipality keys) in the RKI data set changed. In the past, that has always been due to a human error somewhere in the pipeline. The code might be overly strict. I might be able to precisely understand and fix this tomorrow. Hopefully.
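The failing assertion compares the set of AGS that are missing from the RKI data (relative to a reference set) against a hard-coded expectation. A minimal sketch of that kind of check, assuming the reference set is something like the keys of ags.json; `check_lacking_ags` is a hypothetical helper, not code from tools/build-rki-csvs.py. Unlike a bare `assert`, it reports which AGS changed:

```python
def check_lacking_ags(reference_ags, observed_ags, expected_lacking):
    """Return the AGS present in the reference set but absent from the data.

    Hypothetical helper: raises ValueError with a readable diff if the set of
    lacking AGS differs from the expected one.
    """
    expected = set(expected_lacking)
    lacking = set(reference_ags) - set(observed_ags)
    if lacking != expected:
        newly_lacking = sorted(lacking - expected)
        no_longer_lacking = sorted(expected - lacking)
        raise ValueError(
            f"unexpected AGS set change: newly lacking {newly_lacking}, "
            f"no longer lacking {no_longer_lacking}"
        )
    return lacking
```

If 16056 disappears from the data, the error message names it directly instead of producing a bare AssertionError.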
Data for this Landkreis has been missing recently:
"16056": {
"name": "SK Eisenach",
"state": "Thüringen",
"lat": 50.9833,
"lon": 10.3167,
"population": 42250
},
I may want to remove the lacking_wrt_ref check and update csv-epsilon-merge.py to allow the base set to contain more columns than the extension set, and then to forward-fill those columns.
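A minimal sketch of the forward-fill idea, assuming rows are plain dicts keyed by column name and that extension rows are simply appended after the base rows (the real csv-epsilon-merge.py operates on CSV files and may handle overlaps differently); `merge_with_ffill` is a hypothetical helper:

```python
def merge_with_ffill(base_rows, extension_rows):
    """Append extension rows to base rows.

    Columns that exist in the base but are missing from an extension row are
    forward-filled with the most recent value. Columns that only exist in the
    extension are kept as-is.
    """
    merged = [dict(row) for row in base_rows]
    if not merged:
        return [dict(row) for row in extension_rows]
    columns = list(merged[-1].keys())
    last = merged[-1]
    for row in extension_rows:
        # Take the extension value if present, otherwise carry the last one.
        new_row = {col: row.get(col, last[col]) for col in columns}
        # Preserve any extra columns introduced by the extension.
        new_row.update({col: val for col, val in row.items() if col not in new_row})
        merged.append(new_row)
        last = new_row
    return merged
```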
On vacation. Haven't gotten to this yet. Sorry about that :/
I have addressed this in #1827.
I have looked at the data more closely to better understand what happened. The fact that 16056 disappeared from the RKI data set made me 'hope' that reporting for this Landkreis was merged with another Landkreis.
Indeed, there is a pretty suspicious jump in the case count for Landkreis 16063 at the time when the case count for Landkreis 16056 stopped changing.
That jump is specifically from 8579 to 10572:
>>> 10572 - 8579
1993
The last reported case count value for Landkreis 16056 was 1975.
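The reasoning here (one district's count freezes while a neighbour's count jumps by roughly the frozen value) can be turned into a simple heuristic. A sketch, assuming both series are cumulative daily case counts aligned by date; `find_merge_day` and its 5% default tolerance are hypothetical choices, not something used in the repository:

```python
def find_merge_day(absorber, absorbed, rel_tolerance=0.05):
    """Return the first index i where a reporting merge looks likely, else None.

    Condition: `absorbed` is constant from index i-1 onwards, and the
    day-over-day increase of `absorber` at i matches that frozen value
    within `rel_tolerance` (relative to the frozen value).
    """
    if not absorbed or len(absorber) != len(absorbed):
        raise ValueError("series must be non-empty and of equal length")
    frozen_value = absorbed[-1]
    if frozen_value <= 0:
        return None
    for i in range(1, len(absorber)):
        # Skip days on which the absorbed series was still changing.
        if any(v != frozen_value for v in absorbed[i - 1:]):
            continue
        jump = absorber[i] - absorber[i - 1]
        if abs(jump - frozen_value) <= rel_tolerance * frozen_value:
            return i
    return None
```

With the values from this thread, `find_merge_day([8579, 10572], [1975, 1975])` returns 1: the jump of 1993 differs from 1975 by only 18 cases, well within 5%.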
I think we can safely conclude that on September 12, reporting for Landkreise 16056 and 16063 was merged, and reported together under AGS 16063.
With the solution from #1827, I have now retained Landkreis 16056 in the CSV files, simply forward-filling the last known value (1975). That's incorrect: since the 16063 count now also contains those cases, the value for 16056 should drop to 0 so that the sum over the Landkreise evolves more correctly. Given the relatively small number, though, I think I will just leave this as-is. Feedback appreciated.
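To illustrate the effect on the sum: on the merge day, the 16063 value (10572) already contains Eisenach's cases, so adding the forward-filled 1975 for 16056 gives 12547 and counts those cases twice. A minimal sketch of the alternative (setting the absorbed column to 0 from the merge day on), assuming rows are dicts keyed by AGS; both helper names are hypothetical:

```python
def retire_absorbed_column(rows, column, from_index):
    """Return copies of `rows` with `column` set to 0 from `from_index` on."""
    result = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        if i >= from_index:
            new_row[column] = 0
        result.append(new_row)
    return result


def total_cases(row, columns):
    """Sum the case counts of the given columns in one row."""
    return sum(row[col] for col in columns)
```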
I have just looked at the columns 16056 and 16063 in the RL data set. They seem to have been synced a while ago: they contain the same values for the entire time range of interest (that is, the sum is also wrong).
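Duplicated columns like these can be spotted mechanically. A minimal sketch, assuming the data has been loaded into a plain dict mapping each column name (e.g. an AGS) to its list of values; `find_identical_columns` is a hypothetical helper, not part of the repository:

```python
from itertools import combinations


def find_identical_columns(columns):
    """Return sorted pairs of column names whose value sequences are identical."""
    return [
        (a, b)
        for a, b in combinations(sorted(columns), 2)
        if columns[a] == columns[b]
    ]
```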
The two Landkreise in question:
"16056": {
"name": "SK Eisenach",
"state": "Thüringen",
"16063": {
"name": "LK Wartburgkreis",
"state": "Thüringen",
Map of the two Landkreise (from https://www.bik-gmbh.de/download/Gebietsreform_Thueringen_zum_GS1906.pdf).
So, I think it's fair to say that the case numbers for Eisenach (a kreisfreie Stadt, i.e. an independent city) are reported as part of the Wartburgkreis, which geographically and organizationally might make sense.
Some research on local reporting of corona-related indicators (e.g. for Eisenach and the Wartburgkreis) clearly supports your assumption, although I was not able to find any kind of official confirmation. Probably it is a politically motivated move to get "better" (i.e. lower) numbers by averaging out the high ones ... But that is just my personal opinion! Anyway, this kind of aggregation does create problems when processing the data in dependent systems, leaving zero values and/or grey areas, e.g. in the RKI dashboard.
By the way, the zero for Luckenwalde/Parchim is caused by a hacking incident: they are currently not able to deliver data. Source: https://www.kreis-lup.de/corona/
Greetings from Cologne
Mathias
Thank you, Mathias, for the additional insight! Huh. :)
RL did drop the data columns for Landkreis 16056, and that required further patches -- done in https://github.com/jgehrcke/covid-19-germany-gae/pull/1842.
The RL and RKI heatmaps now both show 16056 and 16063 using the data from 16063.
Perfect! Thank you so much for your work! Now I need to start my own upstream patching ...
Greetings from Cologne
Mathias
After a bit of research, I think I have found the reason for the unexpected change:
According to information provided by the state of Thüringen, Eisenach was officially made part of the Wartburgkreis (effective as of 2021-07-01).
Source: https://statistik.thueringen.de/datenbank/gemauswahl.asp
One open question for me (I just do not remember ...) is where we get the population data from (ags.json), whether the change is already incorporated there (important for the 7di computation), and when officially updated maps (shapefiles) will be available.
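Why the population matters for the 7di: if the 16063 series already includes Eisenach's cases but the population used for 16063 does not include Eisenach's inhabitants, the denominator is too small and the incidence comes out too high. A minimal sketch, assuming the 7di is computed as the difference of cumulative case counts over seven days per 100,000 inhabitants, and that populations come from a dict keyed by AGS string as in ags.json; both function names are hypothetical:

```python
def seven_day_incidence(cumulative_cases, population):
    """7-day incidence per 100,000 inhabitants.

    `cumulative_cases` is a daily cumulative series ending today; at least
    8 values are needed to form a 7-day difference.
    """
    if len(cumulative_cases) < 8:
        raise ValueError("need at least 8 daily cumulative values")
    if population <= 0:
        raise ValueError("population must be positive")
    new_cases_7d = cumulative_cases[-1] - cumulative_cases[-8]
    return new_cases_7d / population * 100_000


def merged_population(populations, ags_list):
    """Population of a merged reporting area, e.g. ["16056", "16063"]."""
    return sum(populations[ags] for ags in ags_list)
```

For the merged area, the denominator would be `merged_population(populations, ["16056", "16063"])`, i.e. the 16063 population plus the 42250 listed for 16056 in ags.json.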
Thank you again and greetings from Cologne
Mathias
A problem remaining for me (I just do not remember ...) is, where we get the population from (ags.json) and whether the change is already incorporated there
Hey Mathias. Ouch. Thank you for that reminder. I will have to double-check, but it's likely that the 7di numbers for 16063 have been a little off because I didn't think this through before. Thank you!
Keeping track of this topic here: https://github.com/opstrace/opstrace/issues/1472
There have been no updates to the RKI files for four days now (as of 2021-10-04, 20:45 local time). Is there a problem caused by changes to the input data provided by RKI? If so, how can I help?
Greetings from Cologne
Mathias