rs-delve / covid19_datasets

Interfacing several COVID-19 related datasets
MIT License

Added excess mortality to combined #12

Closed apaleyes closed 4 years ago

apaleyes commented 4 years ago

This PR is for discussion as much as for actual code review. Is The Economist data enough? Do we want it in this or any other format?

apaleyes commented 4 years ago

"as is" column makes sense, but it's a bit tricky. so the economist dataset has "start date" and "end date", obviously the start and end of the week the number is reported for. shall we associate it with "end" in the combined dataset then?

as for the smoothed - i feel like this is non-trivial processing. i'd be in favor of providing raw data as much as possible; tricky things like rolling means can be computed from it later by dataset users. to that end, a daily average just seems simpler.
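
To make the two options concrete, here is a minimal pandas sketch of keying each weekly figure to the week's end date and deriving a simple daily average from it. The column names (`start_date`, `end_date`, `excess_deaths`, `date`) and the numbers are hypothetical, chosen only for illustration; they are not taken from The Economist's files or from this repository.

```python
import pandas as pd

# Hypothetical weekly records; names and values are illustrative only.
weekly = pd.DataFrame({
    "start_date": pd.to_datetime(["2020-03-30", "2020-04-06"]),
    "end_date": pd.to_datetime(["2020-04-05", "2020-04-12"]),
    "excess_deaths": [700.0, 1400.0],
})


def weekly_to_combined(weekly: pd.DataFrame) -> pd.DataFrame:
    """Index weekly totals by the week's end date and add a daily average."""
    out = weekly.copy()
    # Number of days covered by each reporting period, both ends inclusive.
    n_days = (out["end_date"] - out["start_date"]).dt.days + 1
    out["excess_deaths_daily_avg"] = out["excess_deaths"] / n_days
    # Assumption: the weekly value is attached to the "end" date.
    out = out.rename(columns={"end_date": "date",
                              "excess_deaths": "excess_deaths_weekly"})
    columns = ["date", "excess_deaths_weekly", "excess_deaths_daily_avg"]
    return out[columns].set_index("date")


combined_rows = weekly_to_combined(weekly)
print(combined_rows)
```

Days without a reported weekly value would simply have no entry in these columns after a join on the date, which keeps the raw number intact and leaves any interpolation to the dataset user.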

avishkar58 commented 4 years ago

Ah ok, yeah that makes sense! While I do agree with the principle of favouring raw data as much as possible - we do point out in the analysis notebook that there are known artifacts in the data like the difference in reporting due to days of the week, and we provide a potential solution to that in the form of the rolling smoothing. So I thought why don't we provide that smoothed data as part of the dataset instead of everyone having to compute it themselves. Agreed that it is non-trivial though and the smoothing we used is one out of many plausible solutions. Maybe we just confine that to the analysis notebooks then and provide a code snippet that people can re-use if they agree with our choice(s).
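
A reusable snippet of that kind could look roughly like the sketch below. It assumes a trailing 7-day rolling mean, which is only one of the plausible smoothing choices and not necessarily the one used in the analysis notebook; the function name `smooth_daily` and the sample values are made up for illustration.

```python
import pandas as pd


def smooth_daily(series: pd.Series, window: int = 7) -> pd.Series:
    """Trailing rolling mean over `window` days.

    A 7-day window averages over one full week, which evens out
    day-of-week reporting differences. The first `window - 1` entries
    are NaN because they do not have a full window behind them.
    """
    return series.rolling(window=window, min_periods=window).mean()


# Illustrative daily counts, not real data.
daily = pd.Series(
    [10, 20, 30, 40, 50, 60, 70, 80],
    index=pd.date_range("2020-04-01", periods=8, freq="D"),
)
smoothed = smooth_daily(daily)
print(smoothed)
```

Keeping the window as a parameter lets users who disagree with the 7-day choice swap in their own without touching the raw column.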

apaleyes commented 4 years ago

Yeah, something like that would be a good idea, and would also be a way to showcase this column's availability.

I'll work on adding a raw weekly column to the combined dataset and updating this PR.