Closed noleti closed 4 years ago
Thanks a lot. This looks super useful, and I agree we should work off case-counts.json directly. I just tried it and ran into one problem, though: there are duplicated entries in the JSON and a mix of strings and numbers. These are the values for Germany:
{
"time": "2020-01-26",
"deaths": 0,
"cases": 0
},
{
"time": "2020-1-26",
"cases": "0",
"deaths": "0",
"recovered": "0"
},
{
"time": "2020-01-27",
"deaths": 0,
"cases": 0
},
Also note that the date format is not 100% consistent: it should be 'YYYY-MM-DD', which ensures sortability. I think this might be the output of the cds.py parser.
Otherwise it looks great and merges well with the fixes to the India parser I pushed.
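The two inconsistencies reported above (unpadded dates and counts stored as strings) could be normalized along these lines. This is only a sketch: `normalize_date`, `normalize_count`, and `normalize_entry` are hypothetical helper names, not functions from the actual parsers, and it assumes every non-date field holds an integer count or an empty value.

```python
from datetime import datetime


def normalize_date(raw):
    """Return the date as zero-padded 'YYYY-MM-DD'."""
    # strptime accepts both '2020-1-26' and '2020-01-26' for %m and %d.
    return datetime.strptime(raw, "%Y-%m-%d").strftime("%Y-%m-%d")


def normalize_count(raw):
    """Convert '0' or 0 to the int 0; keep missing values as None."""
    if raw is None or raw == "":
        return None
    return int(raw)


def normalize_entry(entry):
    """Normalize one daily record (assumed layout: 'time' plus count fields)."""
    out = {"time": normalize_date(entry["time"])}
    for key, value in entry.items():
        if key != "time":
            out[key] = normalize_count(value)
    return out
```

With this, the two Germany records for 2020-1-26 and 2020-01-26 end up with the same date key and integer values, so a later de-duplication step can compare them directly.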
Can you point me to a duplicated entry? I thought I caught that.
I should have been more precise. There are entries in case-counts.json for Germany (just an example) with the dates 2020-1-26 and 2020-01-26. One of them has its data as strings, the other as numbers.
What should be the default behaviour for empty values? None, or 0?
covid19_scenarios expects something that evaluates to false. The current parser puts None. I think None is better than 0 because 0 could be interpreted as data. We can think about something like nan or NA, but for now None/null is good.
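To illustrate the distinction discussed here: both None and 0 evaluate to false, but only None unambiguously means "no data", and Python's json module writes it out as null. The record below is made up for illustration, and `is_missing` is a hypothetical helper.

```python
import json


def is_missing(value):
    """True only for absent data; a reported 0 is still data."""
    return value is None


# Illustrative record: zero cases were reported, deaths were not reported.
entry = {"time": "2020-01-26", "cases": 0, "deaths": None}

# Both values are falsy, so a plain truth test cannot tell them apart.
both_falsy = (not entry["cases"]) and (not entry["deaths"])

# None is serialized as JSON null.
encoded = json.dumps(entry)
```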
Ok, I think I fixed everything now. Can you have a look again?
This looks very good now. One quick thing I just noticed: in the ECDC parser, the cumulative counts should be assigned even when there is no update.
--- a/parsers/ecdc.py
+++ b/parsers/ecdc.py
@@ -69,10 +69,10 @@ def retrieve_case_data():
     for d in data:
         if d['cases']:
             total_cases += d['cases']
-            d['cases'] = total_cases
+        d['cases'] = total_cases
         if d['deaths']:
             total_deaths += d['deaths']
-            d['deaths'] = total_deaths
+        d['deaths'] = total_deaths
Ping me when you are done and I'll merge.
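The effect of the suggested change can be sketched as a standalone function. `accumulate` is a hypothetical name and the input layout (a date-ordered list of dicts with daily 'cases' and 'deaths', where None means nothing was reported) is an assumption based on the diff, not the actual body of retrieve_case_data().

```python
def accumulate(data):
    """Replace daily counts with running totals, in place.

    Assumes `data` is ordered by date and that a daily value of None
    (or 0) means there is nothing to add for that day.
    """
    total_cases = 0
    total_deaths = 0
    for d in data:
        if d["cases"]:
            total_cases += d["cases"]
        # Assigned on every row, so days without an update carry the total forward.
        d["cases"] = total_cases
        if d["deaths"]:
            total_deaths += d["deaths"]
        d["deaths"] = total_deaths
    return data
```

Had the assignment stayed inside the `if`, a day with no reported value would keep its raw None instead of the running total.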
Done. I checked, and there are a lot of cases where deaths or cases are None although the total is > 0. So I assume the dataset uses None instead of 0 when no case is reported.
This all works as far as I can tell. Thanks so much and sorry about the fiddliness... I'll merge and test drive it a little more.
Great, thanks a lot for all your work. I think interactive sites like covid19_scenarios are super helpful for understanding the current situation and for helping the general population make informed decisions.
Re: https://github.com/neherlab/covid19_scenarios_data/issues/12 This set of patches changes existing parsers to also update a case-counts/case_counts.json. Parsers somewhat intelligently merge their data with existing data for a region or country (only add missing data for each day, never overwriting or removing). Order of parsers determines precedence. I also added a tsv.py parser to parse all .tsv files in the case-counts/ folder directly, and update the big .json with those values.
I still don't have the covid19_scenarios frontend running, so I could only do limited tests. This patch should not break anything, as old .tsv files are still produced. They are currently actually parsed again by tsv.py, but as we have a merge algorithm now this should not be an issue (runtime is wasted, but on my machine everything is fast enough).
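The merge rule described above (only add missing data for each day, never overwrite or remove) might look roughly like the following. This is a sketch under assumptions, not the code from the patch: `merge_cases` is a hypothetical name, records are assumed to be dicts keyed by a zero-padded 'time' string, and None is assumed to mark a missing value.

```python
def merge_cases(existing, new):
    """Merge `new` daily records into `existing`, only filling gaps.

    Values already present in `existing` are never overwritten or removed,
    so whichever parser runs first takes precedence.
    """
    by_date = {e["time"]: dict(e) for e in existing}
    for entry in new:
        target = by_date.setdefault(entry["time"], {"time": entry["time"]})
        for key, value in entry.items():
            # Fill a field only if it is absent or None in the existing data.
            if target.get(key) is None and value is not None:
                target[key] = value
    # Zero-padded 'YYYY-MM-DD' strings sort chronologically.
    return [by_date[day] for day in sorted(by_date)]
```

Because existing values always win, merging the same data a second time changes nothing, which is why re-parsing the freshly written .tsv files wastes runtime but does not alter the result.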