nevrome / covid19germany

R package - Load, visualise and analyse daily updated data on the COVID-19 outbreak in Germany

Data download is unreliable and sometimes (!) yields incomplete data #32

Open arne1921KF opened 3 years ago

arne1921KF commented 3 years ago

Today (2020-01-11), the timeseries data downloaded via the usual get_RKI_timeseries() call with the default url parameter contains only some data from Hamburg, Schleswig-Holstein and Niedersachsen.

The page says they are currently changing the download options, and that an alternative should be used for now.

The download link there is currently hidden behind the links/buttons on the page.
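For anyone who wants to detect this kind of partial download automatically, here is a minimal sketch of a completeness check. It assumes only that the timeseries has a `Bundesland` column with the state names spelled as in the package output; `expected_states` and `missing_states()` are hypothetical names and not part of covid19germany.

```r
# All 16 German federal states, as they appear in the Bundesland column.
expected_states <- c(
  "Baden-Württemberg", "Bayern", "Berlin", "Brandenburg", "Bremen",
  "Hamburg", "Hessen", "Mecklenburg-Vorpommern", "Niedersachsen",
  "Nordrhein-Westfalen", "Rheinland-Pfalz", "Saarland", "Sachsen",
  "Sachsen-Anhalt", "Schleswig-Holstein", "Thüringen"
)

# Hypothetical helper (not part of covid19germany): returns the states
# that are absent from a downloaded timeseries.
missing_states <- function(timeseries) {
  setdiff(expected_states, unique(timeseries$Bundesland))
}

# Example with a deliberately incomplete data frame, similar to the
# situation described above.
partial <- data.frame(
  Bundesland = c("Hamburg", "Schleswig-Holstein", "Niedersachsen", "Hamburg")
)
missing_states(partial)
```

A scheduled job could call `missing_states()` on the result of `get_RKI_timeseries()` and skip the update whenever the returned vector is not empty.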

nevrome commented 3 years ago

@stschiff already observed a similar issue last week. It resolved itself overnight. Maybe we have to switch to the alternative download option eventually, but for now I suggest waiting once more.

nevrome commented 3 years ago

So right now it seems to work again:

> rki_timeseries <- get_RKI_timeseries()
> unique(rki_timeseries$Bundesland)
 [1] "Brandenburg"            "Bayern"                
 [3] "Niedersachsen"          "Nordrhein-Westfalen"   
 [5] "Baden-Württemberg"      "Saarland"              
 [7] "Rheinland-Pfalz"        "Schleswig-Holstein"    
 [9] "Hessen"                 "Hamburg"               
[11] "Bremen"                 "Sachsen"               
[13] "Thüringen"              "Berlin"                
[15] "Mecklenburg-Vorpommern" "Sachsen-Anhalt"

arne1921KF commented 3 years ago

....and gone again. Now they changed something in the data itself, it seems. I get parsing failures. Looks like the date columns changed. That breaks your code.

I hate it when data providers do this.
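One way to make downstream code less sensitive to such changes is to try several date formats instead of assuming a single one. The sketch below is only an illustration: `parse_date_flexibly()` is a hypothetical helper, and the candidate formats are assumptions, not the formats the RKI actually used.

```r
# Hypothetical helper: try several candidate date formats in turn and keep
# the first one that parses every non-missing value.
parse_date_flexibly <- function(x,
                                formats = c("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")) {
  x <- as.character(x)
  for (fmt in formats) {
    parsed <- as.Date(x, format = fmt)
    # Accept this format only if no originally present value failed to parse.
    if (!anyNA(parsed[!is.na(x)])) {
      return(parsed)
    }
  }
  stop("None of the candidate date formats matched: ",
       paste(formats, collapse = ", "))
}

# Both spellings end up as the same Date value.
parse_date_flexibly(c("2020/11/01", "2020/11/02"))
parse_date_flexibly("01.11.2020")
```

Failing loudly when no format matches is deliberate: a parsing error is easier to notice than a column silently filled with NA.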

nevrome commented 3 years ago

Hm - can't confirm right now. Seems to work again.

But I get the feeling this download feature breaks multiple times a day. Maybe it's because the file has grown to more than 55 MB and the way we download it is just not suitable any more.

Maybe we should copy it automatically to an extra branch here on GitHub once a day and point the default path of get_RKI_timeseries() to our mirror.
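Independent of where the file is hosted, downloading it to disk with a longer timeout and a few retries might already help with a file of this size. This is a rough sketch using base R only; `download_with_retry()` and its parameters are hypothetical, and this is not how the package currently fetches the data.

```r
# Hypothetical helper: download a file to disk, retrying on failure.
# `url` stands for whichever source is used; no real URL is assumed here.
download_with_retry <- function(url, destfile, attempts = 3,
                                wait_seconds = 5, timeout_seconds = 300) {
  # Raise the download timeout (R's default is 60 seconds) and restore it on exit.
  old <- options(timeout = timeout_seconds)
  on.exit(options(old), add = TRUE)

  for (i in seq_len(attempts)) {
    # download.file() returns 0 on success; treat errors and warnings as failure.
    status <- tryCatch(
      download.file(url, destfile, mode = "wb", quiet = TRUE),
      error = function(e) 1L,
      warning = function(w) 1L
    )
    if (isTRUE(status == 0) && file.exists(destfile) && file.size(destfile) > 0) {
      return(invisible(destfile))
    }
    if (i < attempts) Sys.sleep(wait_seconds)
  }
  stop("Download failed after ", attempts, " attempts: ", url)
}
```

A successful transfer does not guarantee complete content, so the downloaded file would still need a sanity check before it replaces the previous copy.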

arne1921KF commented 3 years ago

Aaaaand dead again. Only Schleswig-Holstein is present in the timeseries. It was already like this at 5 am, when my bot tried to pull the current data, and it is still the case at 9 am.

A git repository of the data would be rad. I seriously would like to know why the RKI isn't doing this themselves: just pushing the data to GitHub as soon as it is in. That way, changes to the dataset could even be monitored transparently through the version history.

nevrome commented 3 years ago

I merged #34 now to permanently enable the download from the alternative source. This seems to be more reliable.