noleti closed this issue 4 years ago.
@noleti Thanks for involving me. I appreciate it, and all the context you provide helps me get into the codebase. I think it's best if you do it: I won't be able to contribute as much during weekdays, and I don't want to bite off too much right now. I'll keep an eye out for your PR to see how you've done it. I bet I'll learn a lot from that.
This issue is to discuss next steps, assuming that the `case-counts` folder structure is simplified as suggested in https://github.com/neherlab/covid19_scenarios_data/issues/63, and that direct .json generation is dropped to allow for manual verification/diffing of the .tsv data.

Proposed signature of `store_data()`:

Let's assume the main data structure passed by the parser is called `data` and has the format `{'USA': [{'time': '2020-01-20', 'cases': 20, ...}, ..., {'time': '2020-03-20', 'cases': 200, ...}]}`

or `{'USA': [['2020-01-20', 20, ...], ..., ['2020-03-20', 200, ...]]}`
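The two formats carry the same information once a column order is fixed. A minimal sketch of that conversion follows; the name `dict_to_list(regions, default_cols)` comes from this proposal, but the body is hypothetical and not the existing helper's code. It assumes rows are dicts keyed by column name and that a column missing from a row becomes `None`.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def dict_to_list(regions: Dict[str, List[Row]],
                 default_cols: Sequence[str]) -> Dict[str, List[List[Any]]]:
    """Convert {region: [row_dict, ...]} into {region: [row_list, ...]}.

    Hypothetical sketch: each output row follows the order of `default_cols`;
    a column absent from a row dict is filled with None.
    """
    return {
        region: [[row.get(col) for col in default_cols] for row in rows]
        for region, rows in regions.items()
    }


cols = ["time", "cases"]
as_dicts = {"USA": [{"time": "2020-01-20", "cases": 20},
                    {"time": "2020-03-20", "cases": 200}]}
as_lists = {"USA": [["2020-01-20", 20], ["2020-03-20", 200]]}

print(dict_to_list(as_dicts, cols) == as_lists)  # True
```

Because the column order comes from `default_cols` rather than from dict iteration order, the resulting rows do not depend on how a parser happened to build its dicts.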
I would like to propose the following:

- `data`: at country level, the key would just be the name of the country as found in `country_codes.csv`. At state level, it needs to be the three-letter country code from `country_codes.csv`, a hyphen, and then the state name (e.g., `USA-New York`). We would need to update existing parsers to do so; ecdc and cds will not have to be updated. For the others, the existing `exceptions` dict should tell you which keys are country-level; all others will be state-level and need to be prefixed with the three-letter country code.
- `source`: the string identifying the parser, matching `sources.json`. It will also be used to name the folder in `case-counts` for the .tsv files.
- `cols`: will still be required to be able to parse `data` in case it is a dict of lists of lists (done in some parsers), and we don't want to rely on the parser passing data in the correct order.

`store_data()` could then be simplified a lot. I would suggest checking whether `data` is a dict of lists of dicts; if so, we convert it to a dict of lists of lists using `dict_to_list(regions, default_cols)`. Either way, we then call `store_tsv()`. That function can also be simplified to get rid of `world.tsv` and the exception handling (which would no longer be needed, as state-level keys would have appropriate names). I would still recommend sanitizing API-provided strings when using them as filenames. The files would then be saved to `BASE_PATH/{source}/{country-or-state-name}.tsv`.
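A rough sketch of the simplified flow described above: detect dict rows, reorder them by `cols`, sanitize each key, and write one .tsv per region under `BASE_PATH/{source}/`. Several details are assumptions rather than the repository's behaviour: the `BASE_PATH` value, the hypothetical `sanitize()` rule (keep word characters, spaces, dots and hyphens, replace everything else with `_`), writing `cols` as a header row, UTF-8 encoding, and treating `data` as dict-based if any row is a dict.

```python
import csv
import os
import re
import tempfile
from typing import Any, Dict, List, Sequence

BASE_PATH = "case-counts"  # assumption: root folder holding one subfolder per source


def sanitize(name: str) -> str:
    """Hypothetical rule for turning an API-provided region name into a safe file name."""
    # Replace anything that is not a word character, space, dot or hyphen (e.g. "/").
    cleaned = re.sub(r"[^\w .-]", "_", name).strip()
    # Strip leading/trailing dots so names like "../x" cannot climb directories.
    cleaned = cleaned.strip(".")
    return cleaned or "_"


def store_tsv(regions: Dict[str, List[List[Any]]], cols: Sequence[str],
              source: str, base_path: str = BASE_PATH) -> List[str]:
    """Write <base_path>/<source>/<region>.tsv for every key; return the written paths."""
    folder = os.path.join(base_path, source)
    os.makedirs(folder, exist_ok=True)
    paths = []
    for region, rows in regions.items():
        path = os.path.join(folder, f"{sanitize(region)}.tsv")
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh, delimiter="\t")
            writer.writerow(cols)   # assumption: header row with the column names
            writer.writerows(rows)  # None values are written as empty fields
        paths.append(path)
    return paths


def store_data(data: Dict[str, list], source: str, cols: Sequence[str],
               base_path: str = BASE_PATH) -> List[str]:
    """Normalize dict rows to lists ordered by `cols`, then delegate to store_tsv()."""
    has_dict_rows = any(isinstance(row, dict) for rows in data.values() for row in rows)
    if has_dict_rows:
        data = {region: [[row.get(col) for col in cols] for row in rows]
                for region, rows in data.items()}
    return store_tsv(data, cols, source, base_path)


# Usage example in a throwaway directory ("example-source" is a made-up source name).
with tempfile.TemporaryDirectory() as tmp:
    written = store_data({"USA-New York": [{"time": "2020-03-20", "cases": 200}]},
                         source="example-source", cols=["time", "cases"], base_path=tmp)
    print([os.path.relpath(p, tmp) for p in written])
```

Since state-level keys already carry the country prefix (`USA-New York`), the key can be used directly as the file name after sanitization, which is what makes the separate exception handling unnecessary in this sketch.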
`store_json()`, `merge_cases()` and `compare_day()` can likely be reused for the later parsing of .tsv into JSON, so I would recommend not simply deleting and forgetting about them.

@tryggvigy, tagging you as you explicitly said you would like to do this. Hope this helps. I can also do it; let me know if I should.