apaleyes closed this 4 years ago
"as is" column makes sense, but it's a bit tricky. so the economist dataset has "start date" and "end date", obviously the start and end of the week the number is reported for. shall we associate it with the "end" date in the combined dataset, then?
as for the smoothed - i feel like this is non-trivial processing. i'd be in favor of providing raw data as much as possible, and tricky things like rolling means can be computed from it later by dataset users. to that end, a daily average just seems simpler.
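For reference, a minimal sketch of what the "daily average" option could look like: each weekly figure is divided evenly over the days between its start and end date. The column names (`start_date`, `end_date`, `weekly_value`) and the sample numbers are assumptions made up for illustration, not the actual schema of The Economist data.

```python
import pandas as pd


def weekly_to_daily_average(weekly: pd.DataFrame) -> pd.DataFrame:
    """Spread each weekly total evenly over the days of its reporting week."""
    rows = []
    for rec in weekly.itertuples(index=False):
        # All calendar days covered by this week, both ends inclusive.
        days = pd.date_range(rec.start_date, rec.end_date, freq="D")
        rows.append(
            pd.DataFrame(
                {"date": days, "daily_average": rec.weekly_value / len(days)}
            )
        )
    return pd.concat(rows, ignore_index=True)


# Hypothetical sample: two consecutive reporting weeks.
weekly = pd.DataFrame(
    {
        "start_date": pd.to_datetime(["2020-03-02", "2020-03-09"]),
        "end_date": pd.to_datetime(["2020-03-08", "2020-03-15"]),
        "weekly_value": [70, 140],
    }
)
daily = weekly_to_daily_average(weekly)
```

This keeps the weekly totals recoverable (summing the daily averages over a week gives back the reported number), which is part of why it feels closer to raw data than a rolling mean.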
Ah ok, yeah that makes sense! While I do agree with the principle of favouring raw data as much as possible - we do point out in the analysis notebook that there are known artifacts in the data, like differences in reporting depending on the day of the week, and we provide a potential solution to that in the form of rolling smoothing. So I thought, why don't we provide that smoothed data as part of the dataset instead of everyone having to compute it themselves? Agreed that it is non-trivial though, and the smoothing we used is one of many plausible solutions. Maybe we just confine that to the analysis notebooks then and provide a code snippet that people can re-use if they agree with our choice(s).
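A reusable snippet along those lines might look roughly like the sketch below. It assumes a plain trailing 7-day rolling mean, which is only one plausible choice and not necessarily the exact smoothing used in the analysis notebook; the function name `rolling_smooth` and the sample values are made up for illustration.

```python
import pandas as pd


def rolling_smooth(series: pd.Series, window: int = 7) -> pd.Series:
    """Trailing rolling mean; the first window - 1 entries are NaN."""
    return series.rolling(window=window, min_periods=window).mean()


# Hypothetical daily series with a weekend reporting dip repeated each week.
dates = pd.date_range("2020-03-02", periods=14, freq="D")
raw = pd.Series([10, 12, 11, 13, 12, 4, 3] * 2, index=dates, dtype=float)

smoothed = rolling_smooth(raw)
```

With a window of exactly one week, every full window contains each weekday once, so the day-of-week artifact averages out.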
Yeah, something like that would be a good idea, and would also be a way to showcase this column's availability.
I'll work on adding a raw weekly column to the combined dataset and updating this PR.
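As a rough sketch of that plan, assuming the weekly number is attached to the "end" date of its reporting week and left empty on all other days: the frame and column names here (`combined`, `daily_value`, `weekly_raw`, and so on) are hypothetical and do not reflect the real dataset schema.

```python
import pandas as pd

# Hypothetical combined daily dataset.
combined = pd.DataFrame(
    {
        "date": pd.date_range("2020-03-02", periods=14, freq="D"),
        "daily_value": range(14),
    }
)

# Hypothetical weekly data with start and end dates of each reporting week.
weekly = pd.DataFrame(
    {
        "start_date": pd.to_datetime(["2020-03-02", "2020-03-09"]),
        "end_date": pd.to_datetime(["2020-03-08", "2020-03-15"]),
        "weekly_value": [70, 140],
    }
)

# Key each weekly number by the end date of its week, then left-join so that
# every daily row is kept and only week-ending days get a weekly value.
weekly_by_end = weekly.rename(
    columns={"end_date": "date", "weekly_value": "weekly_raw"}
)[["date", "weekly_raw"]]
combined = combined.merge(weekly_by_end, on="date", how="left")
```

Days that are not the end of a reporting week end up with `NaN` in `weekly_raw`, so the column stays "as is" without any interpolation.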
This PR is for discussion as much as for actual code review. Is The Economist data enough? Do we want it in this or any other format?