nevrome / covid19germany

R package - Load, visualise and analyse daily updated data on the COVID-19 outbreak in Germany

Feature Request: Age instead of RKI-Age-Groups? #49

Open Ohlsen opened 2 years ago

Ohlsen commented 2 years ago

Thanks for the great and easy-to-use package. I am currently looking for data on COVID-19 cases (tested positive) in Bremen by age (especially children). So far I have only managed to find the RKI age groups, which I find too broad (especially the 5-11 years category, because 5-year-olds are typically still in kindergarten, while children aged 6 and up are in school).

Are cases per single year of age perhaps already available in the package? The dashboard https://coronavis.dbvis.de/de/overview/dashboard/04011 has a heatmap titled "Positiv getestete nach Altersgruppen für Kreisfreie Stadt Bremen innerhalb 7 Tage pro 100k Einwohner" ("Positive tests by age group for the independent city of Bremen within 7 days per 100k inhabitants") that lets you switch from RKI age groups to single years of age. The data source for that seems to be https://survstat.rki.de/.

nevrome commented 2 years ago

I fear we don't have this information in the broad datasets available via this package. I'm honestly surprised that age-by-year information is publicly available. I thought these details were omitted for privacy reasons.

But anyway: https://survstat.rki.de seems like a very interesting platform way beyond Corona. I'm sure it would be brilliant to have an R interface for that -- although I guess it could and should be way more general to make proper use of the API. How does that work? Is it even accessible from Unix operating systems?

A quick Google search didn't turn up any R package, but there's a Python project by @rgieseke. Maybe he knows more and is willing to chime in?

rgieseke commented 2 years ago

Yes, I also thought about turning my scripts into a proper API, but it's a lot of work and probably needs someone who really understands, or has access to, the underlying data structure and the thinking behind it. There are quite a few tricky things, like incidence rates using recent population numbers only if a year is selected, etc.

Data for Bremen in 5-year intervals is available here: https://github.com/rgieseke/opencoviddata/blob/main/data/counties/survstat-covid19-cases-sk-bremen.csv There is also a file with cases per 100,000 per calendar week. (It's updated daily with a GitHub Action.)
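A minimal sketch of reading that file from Python, assuming the usual GitHub convention that a `blob` URL has a matching `raw.githubusercontent.com` URL. The helper names `blob_to_raw`, `parse_rows` and `fetch_rows` are made up for illustration, and nothing is assumed about the column layout: rows come back as dictionaries keyed by whatever header the file has.

```python
import csv
import io
import urllib.request
from urllib.parse import urlsplit

BLOB_URL = (
    "https://github.com/rgieseke/opencoviddata/blob/main/"
    "data/counties/survstat-covid19-cases-sk-bremen.csv"
)


def blob_to_raw(blob_url):
    """Map github.com/<owner>/<repo>/blob/<ref>/<path> to its raw URL.

    Assumption: the ref (branch or tag) contains no "/" characters.
    """
    parts = urlsplit(blob_url).path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _, ref, *path = parts
    return (
        f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/"
        + "/".join(path)
    )


def parse_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_rows(blob_url=BLOB_URL):
    """Download the CSV and parse it (needs network access)."""
    with urllib.request.urlopen(blob_to_raw(blob_url)) as response:
        # Assumption: the file is UTF-8; "utf-8-sig" also tolerates a BOM.
        return parse_rows(response.read().decode("utf-8-sig"))
```

Calling `fetch_rows()` would then give one dictionary per line of the file; inspect `rows[0].keys()` first to see which columns it actually has.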

As for R, there is a Shiny project which I believe has the code to fetch data from the SurvStat API: https://github.com/evolutionv2/shiny-webservice/blob/master/shiny-webservice/app.R

As for the original question, another way to get the data (without a nice API) could be to create the respective query and then post-process the result with an R script (something like this, depending on your query and question: https://gist.github.com/hoehleatsu/f8d08bc7ad04c0c144a11589f41ca921).

This Twitter thread has a walkthrough on how to fetch data from SurvStat: https://twitter.com/AscotBlack/status/1315678941659660288

Ohlsen commented 2 years ago

Thank you both so much for your kind and very insightful replies! I will have a look at your suggestions! Have a great weekend.

PS: I also just found this on Twitter: https://twitter.com/BunterLotentony/status/1484477380018130951 It contains a link to a Google Sheet with data per single year of age: https://docs.google.com/spreadsheets/d/e/2PACX-1vR9gYiVeUw7l7bIlOjkfkiyLlgwYmQTgEeS_0lXrBwyrWtN1W7ewvPa8JeflJVQmYiajgwFZvr_o3xq/pubhtml#
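Once counts per single year of age are available, the kindergarten/school split from the original question comes down to regrouping. A minimal Python sketch, assuming the data has already been brought into a mapping from age in years to case count; the function name, the bin edges and the numbers in the example are made up for illustration and are not taken from the sheet.

```python
def regroup_by_age(cases_by_age, bins):
    """Sum single-year case counts into custom age bins.

    cases_by_age: dict mapping age in years (int) to a case count.
    bins: dict mapping a label to an inclusive (low, high) age range.
    Ages that fall outside every bin are ignored.
    """
    totals = {label: 0 for label in bins}
    for age, count in cases_by_age.items():
        for label, (low, high) in bins.items():
            if low <= age <= high:
                totals[label] += count
                break  # each age is counted in at most one bin
    return totals


# Hypothetical bins: 5-year-olds count as kindergarten age, 6+ as school age.
BINS = {"kindergarten (0-5)": (0, 5), "school (6-11)": (6, 11)}

# Made-up counts, purely to show the call.
example = {4: 3, 5: 2, 6: 7, 11: 1, 12: 9}
print(regroup_by_age(example, BINS))
```

Keeping the bins as data rather than hard-coding them makes it easy to try other cut-offs, for example a separate group for ages 0-2.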