micb25 / RKI_Monkeypox

Web scraper and visualization for monkeypox cases in Germany
https://micb25.github.io/RKI_Monkeypox
GNU General Public License v3.0

BUG: Something seems wrong with the dates, Saturday rather than Monday #7

Closed corneliusroemer closed 2 years ago

corneliusroemer commented 2 years ago

Somehow today's counts get attributed to two days ago (Saturday) rather than to today, Monday.

image

https://github.com/micb25/RKI_Monkeypox/blob/main/data/RKI_Monkeypox_processed.csv#L60

corneliusroemer commented 2 years ago

Why does the data stop at the end of June? Could this be due to the switch from the website scraper to SurvStat?

I think it would still be good to keep the website scraper for continuity reasons. Otherwise there will be a discontinuity/discrepancy due to the sudden change in data collection.

corneliusroemer commented 2 years ago

There should be a scraper plot and a SurvStat plot. Right now the scraper is the better source because it is continuous with the long history. Later one can switch to SurvStat, but mixing the two isn't a good idea as it will cause a discontinuity.

Thanks for the new plot types though, great you're visualizing SurvStat :)

corneliusroemer commented 2 years ago

See that bump followed by a downward slope? It's an artefact of the switch to SurvStat.

image

micb25 commented 2 years ago

I agree. I reactivated and fixed the old website scraper. Unfortunately, the RKI publishes new data very inconsistently: for example, today's new cases have been reported on their website, but RKI SurvStat still shows yesterday's data (I even checked manually). I guess the RKI SurvStat data would be more accurate in the long run, and using it doesn't depend on regular-expression-based parsing of their website.

> Why does the data stop end of June? Could this be due to switch to survstat from website scraper?

There was an issue with the website scraper (case counts were presented as '1.636' instead of '1636'). Fixed.
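The separator problem mentioned above can be sketched roughly as follows. This is only a minimal illustration, not the repository's actual scraper code: the function names `parse_german_int` and `extract_case_count` and the sample strings are assumptions made up for this example.

```python
import re

# Either digit groups separated by dots (German thousands separator), e.g. "1.636",
# or a plain run of digits, e.g. "1636". Hypothetical pattern for illustration only.
_COUNT_PATTERN = r"\d{1,3}(?:\.\d{3})+|\d+"


def parse_german_int(text: str) -> int:
    """Convert '1.636' or '1636' to the integer 1636; reject anything else."""
    cleaned = text.strip()
    if not re.fullmatch(_COUNT_PATTERN, cleaned):
        raise ValueError(f"not a recognised case count: {text!r}")
    # Dropping the dots turns the grouped form into plain digits.
    return int(cleaned.replace(".", ""))


def extract_case_count(sentence: str) -> int:
    """Find the first number in a sentence and parse it as a case count."""
    match = re.search(_COUNT_PATTERN, sentence)
    if match is None:
        raise ValueError(f"no number found in: {sentence!r}")
    return parse_german_int(match.group(0))


if __name__ == "__main__":
    print(extract_case_count("1.636 cases"))
    print(extract_case_count("1636 cases"))
```

A pattern like this treats both spellings the same way, so the parsed series does not silently stop once the counts pass 999. It would still pick up other numbers (such as dates) if they come first in the text, so a real scraper needs a more specific anchor around the count.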

corneliusroemer commented 2 years ago

I agree - SurvStat is probably closer to the source and will be better maintained in the long run.

But for consistency reasons we should stick to surfacing the scraper counts for now, until we have enough SurvStat data that the discontinuity is sufficiently far in the past. I would maybe only show data from the point at which SurvStat was available? Unless one can create a SurvStat nowcast...
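To illustrate the discontinuity concern raised in this thread, here is a small sketch of one way such an artefact can arise when two cumulative series are spliced. All numbers are made up for the example and are not RKI data; `daily_increments` is a hypothetical helper.

```python
def daily_increments(cumulative: list[int]) -> list[int]:
    """Day-over-day differences of a cumulative case series."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Made-up cumulative counts for the same five days from two sources.
scraper = [100, 110, 120, 130, 140]
survstat = [105, 116, 128, 139, 150]

# Splicing: scraper for the first three days, SurvStat afterwards.
mixed = scraper[:3] + survstat[3:]

print(daily_increments(scraper))   # increments within the scraper series
print(daily_increments(survstat))  # increments within the SurvStat series
print(daily_increments(mixed))     # increments across the splice point
```

Each source on its own gives smooth increments, while the spliced series shows a one-day jump at the switch that reflects only the difference in level between the two sources. Plotting the two series separately avoids this.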