Open psteinb opened 4 years ago
People pointed me to this variant of the same dataset https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0/data this contains updated numbers by age group (which is nice). But I am unclear how to aggregate the number of cases today?
IdBundesland | Bundesland | Landkreis | Altersgruppe | Geschlecht | AnzahlFall | AnzahlTodesfall | ObjectId | Meldedatum | IdLandkreis | Datenstand |
---|---|---|---|---|---|---|---|---|---|---|
14 | Sachsen | SK Dresden | A00-A04 | M | 1 | False | 182,113 | 2020-03-13 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A00-A04 | M | 2 | False | 182,114 | 2020-03-19 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A05-A14 | M | 1 | False | 182,115 | 2020-03-17 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | M | 1 | False | 182,116 | 2020-03-09 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | M | 1 | False | 182,117 | 2020-03-10 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | M | 1 | False | 182,118 | 2020-03-12 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | M | 1 | False | 182,119 | 2020-03-14 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | M | 2 | False | 182,120 | 2020-03-16 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | M | 3 | False | 182,121 | 2020-03-17 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | M | 3 | False | 182,122 | 2020-03-18 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | M | 3 | False | 182,123 | 2020-03-19 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | M | 4 | False | 182,124 | 2020-03-20 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | M | 1 | False | 182,125 | 2020-03-21 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | W | 1 | False | 182,126 | 2020-03-11 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | W | 1 | False | 182,127 | 2020-03-12 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | W | 1 | False | 182,128 | 2020-03-16 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | W | 2 | False | 182,129 | 2020-03-17 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | W | 5 | False | 182,130 | 2020-03-19 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | W | 2 | False | 182,131 | 2020-03-20 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A15-A34 | W | 3 | False | 182,132 | 2020-03-21 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | M | 1 | False | 182,133 | 2020-03-11 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | M | 3 | False | 182,134 | 2020-03-12 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | M | 4 | False | 182,135 | 2020-03-13 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | M | 1 | False | 182,136 | 2020-03-14 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | M | 1 | False | 182,137 | 2020-03-16 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | M | 2 | False | 182,138 | 2020-03-17 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | M | 1 | False | 182,139 | 2020-03-18 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | M | 9 | False | 182,140 | 2020-03-19 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | M | 5 | False | 182,141 | 2020-03-20 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | M | 5 | False | 182,142 | 2020-03-21 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | W | 1 | False | 182,143 | 2020-03-13 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | W | 4 | False | 182,144 | 2020-03-14 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | W | 1 | False | 182,145 | 2020-03-16 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | W | 2 | False | 182,146 | 2020-03-17 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | W | 1 | False | 182,147 | 2020-03-18 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | W | 3 | False | 182,148 | 2020-03-19 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | W | 1 | False | 182,149 | 2020-03-20 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A35-A59 | W | 5 | False | 182,150 | 2020-03-21 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A60-A79 | M | 1 | False | 182,151 | 2020-03-07 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A60-A79 | M | 1 | False | 182,152 | 2020-03-13 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A60-A79 | M | 1 | False | 182,153 | 2020-03-18 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A60-A79 | M | 2 | False | 182,154 | 2020-03-20 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A60-A79 | W | 1 | False | 182,155 | 2020-03-07 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A60-A79 | W | 1 | False | 182,156 | 2020-03-21 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A80+ | M | 1 | False | 182,157 | 2020-03-17 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A80+ | W | 1 | False | 182,158 | 2020-03-19 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
14 | Sachsen | SK Dresden | A80+ | W | 1 | False | 182,159 | 2020-03-20 00:00:00+00:00 | 14,612 | 23.03.2020 00:00 |
At least this one yields more recent numbers with today's date. Ideas?
how to filter this data https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0/data
There are some projects started in the wirvsvirus hackathon dealing with data. Thanks for the hint @frzb!
The ones I checked use the RKI data you mention. For example this provides Python and R code (github) or that one in R (github).
However the data and APIs are weird. Here you can configure a custom query URL, which is nice, but the returned data doesn't contain all federal states (e.g. not Sachsen, but Hamburg etc). If instead we use this and either download a csv or json file, the data contains all states. See update in this comment below.
The second R project from above looks decent, so you can check out how they do it. I don't speak R, but in Python I'd do something like this to aggregate the data:
>>> import pandas as pd
>>> df=pd.read_csv('https://opendata.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0.csv')
IdBundesland Bundesland Landkreis Altersgruppe Geschlecht AnzahlFall AnzahlTodesfall ObjectId Meldedatum IdLandkreis Datenstand NeuerFall NeuerTodesfall
0 1 Schleswig-Holstein SK Flensburg A15-A34 M 1 0 207841 2020-03-14T00:00:00.000Z 1001 24.03.2020 00:00 0 0
1 1 Schleswig-Holstein SK Flensburg A15-A34 M 2 0 207842 2020-03-19T00:00:00.000Z 1001 24.03.2020 00:00 0 0
2 1 Schleswig-Holstein SK Flensburg A15-A34 M 1 0 207843 2020-03-21T00:00:00.000Z 1001 24.03.2020 00:00 0 0
3 1 Schleswig-Holstein SK Flensburg A15-A34 W 1 0 207844 2020-03-14T00:00:00.000Z 1001 24.03.2020 00:00 0 0
4 1 Schleswig-Holstein SK Flensburg A15-A34 W 2 0 207845 2020-03-18T00:00:00.000Z 1001 24.03.2020 00:00 0 0
>>> pd.to_datetime(df.Datenstand).unique()
array(['2020-03-24T00:00:00.000000000'], dtype='datetime64[ns]')
>>> df[df.Landkreis.str.contains('Dresden')].head()
IdBundesland Bundesland Landkreis Altersgruppe Geschlecht AnzahlFall AnzahlTodesfall ObjectId Meldedatum IdLandkreis Datenstand NeuerFall NeuerTodesfall
7041 14 Sachsen SK Dresden A00-A04 M 1 0 219882 2020-03-13T00:00:00.000Z 14612 24.03.2020 00:00 0 0
7042 14 Sachsen SK Dresden A00-A04 M 2 0 219883 2020-03-19T00:00:00.000Z 14612 24.03.2020 00:00 0 0
7043 14 Sachsen SK Dresden A05-A14 M 1 0 219884 2020-03-17T00:00:00.000Z 14612 24.03.2020 00:00 0 0
7044 14 Sachsen SK Dresden A15-A34 M 1 0 219885 2020-03-09T00:00:00.000Z 14612 24.03.2020 00:00 0 0
7045 14 Sachsen SK Dresden A15-A34 M 1 0 219886 2020-03-10T00:00:00.000Z 14612 24.03.2020 00:00 0 0
>>> cols=['Meldedatum', 'AnzahlFall']
>>> df=df[df.Landkreis.str.contains('Dresden')][cols]
>>> df.Meldedatum = pd.to_datetime(df.Meldedatum)
>>> df = df.sort_values('Meldedatum')
>>> df.head()
Meldedatum AnzahlFall
7083 2020-03-07 00:00:00+00:00 1
7079 2020-03-07 00:00:00+00:00 1
7044 2020-03-09 00:00:00+00:00 1
7045 2020-03-10 00:00:00+00:00 1
7061 2020-03-11 00:00:00+00:00 1
>>> ts=df.groupby('Meldedatum')['AnzahlFall'].sum()
>>> ts
Meldedatum
2020-03-07 00:00:00+00:00 2
2020-03-09 00:00:00+00:00 1
2020-03-10 00:00:00+00:00 1
2020-03-11 00:00:00+00:00 2
2020-03-12 00:00:00+00:00 5
2020-03-13 00:00:00+00:00 7
2020-03-14 00:00:00+00:00 6
2020-03-16 00:00:00+00:00 5
2020-03-17 00:00:00+00:00 11
2020-03-18 00:00:00+00:00 6
2020-03-19 00:00:00+00:00 23
2020-03-20 00:00:00+00:00 15
2020-03-21 00:00:00+00:00 15
Name: AnzahlFall, dtype: int64
>>> ts.cumsum()
Meldedatum
2020-03-07 00:00:00+00:00 2
2020-03-09 00:00:00+00:00 3
2020-03-10 00:00:00+00:00 4
2020-03-11 00:00:00+00:00 6
2020-03-12 00:00:00+00:00 11
2020-03-13 00:00:00+00:00 18
2020-03-14 00:00:00+00:00 24
2020-03-16 00:00:00+00:00 29
2020-03-17 00:00:00+00:00 40
2020-03-18 00:00:00+00:00 46
2020-03-19 00:00:00+00:00 69
2020-03-20 00:00:00+00:00 84
2020-03-21 00:00:00+00:00 99
Name: AnzahlFall, dtype: int64
>>> ts.cumsum().plot()
That gives your plot. However, the last data point is from 2020-03-21, as in the data you show above, even though the column 'Datenstand' says 2020-03-24. Huh? Tried it just now, they updated the data, now including 2020-03-24. Seems like numbers might be dropping slowly .. fingers crossed!
Thanks for reproducing this. I just pushed the code to do the same in R :-D I'll have a look at these wonderful repos tomorrow.
Indeed https://github.com/nevrome/covid19germany looks very nice. I'll stick to this one for now in order to obtain the RKI data.
Update: Custom query URLs seem to work correctly now. No need to download a 2 MB CSV file when the data we need is only 2 kB of json :)
# query only AnzahlFall,Meldedatum
url = "https://services7.arcgis.com/mOBPykOjAyBO2ZKk/arcgis/rest/services/RKI_COVID19/FeatureServer/0/query?where=UPPER(Landkreis)%20like%20'%25DRESDEN%25'&outFields=AnzahlFall,Meldedatum&outSR=4326&f=json"
d = requests.get(url).json()
df = pd.DataFrame([x['attributes'] for x in d['features']])
time_col = 'Meldedatum'
df[time_col] = pd.to_datetime(df[time_col], unit='ms')
ts = df.sort_values(time_col).groupby(time_col)['AnzahlFall'].sum()
ts_cs = ts.cumsum().resample('1d').ffill()
Thanksf for this code snippet. If you want, feel free to send a PR with script that takes the name of the Landkreis and dumps a csv. We could use this to validate the R code or vice versa.
I've been pointed to this webseite multiple times: https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0/data
I found that the numbers here don't match the numbers as described in #6 which I so far use by default. For example, going to the display by county https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/917fc37a709542548cc3be077a786c17_0 Downloading the spreadsheet gives a csv file with 36 columns!
The metadata for this table is a bit lengthy and hard to digest: https://www.arcgis.com/sharing/rest/content/items/917fc37a709542548cc3be077a786c17/info/metadata/metadata.xml?format=default&output=html
Digging through this I could extract:
So today (Mar 23, 2020) this would be 99 cases which is the number between Mar 20
97
and Mar 21115
. So for me this gives:dresden.de
#6