neherlab / covid19_scenarios_data

Data preprocessing scripts and preprocessed data storage for COVID-19 Scenarios project
https://github.com/neherlab/covid19_scenarios

Simplify store_data() arguments, requirements on data passed #67

noleti closed 4 years ago

noleti commented 4 years ago

This issue is to discuss next steps, assuming that the case-counts folder structure is simplified as suggested in https://github.com/neherlab/covid19_scenarios_data/issues/63 and that direct .json generation is dropped to allow for manual verification/diffing of the .tsv data.

Proposed signature of store_data():

def store_data(data, source, cols=[]):

Let's assume the main data structure passed by the parser is called data and has the format {'USA': [{'time': '2020-01-20', 'cases': 20,...},..,{'time': '2020-03-20', 'cases': 200,...}]} or {'USA': [['2020-01-20', 20,...],..,['2020-03-20', 200,...]]}. I would like to propose the following:

store_data() could then be simplified a lot. I would suggest:
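
A minimal sketch of how a simplified store_data() might accept both row shapes described above (lists of dicts or lists of lists) and write one .tsv per region. This is only an illustration under assumptions, not the code from the repository or the linked PR: the name store_data_sketch, the helper normalize_rows, the base_dir parameter, the default column list, and the one-file-per-region layout under base_dir/source are all hypothetical. It also uses cols=None instead of cols=[] to avoid a shared mutable default.

```python
import csv
import os


def normalize_rows(rows, cols):
    """Hypothetical helper: return rows as lists ordered by `cols`.

    Accepts dict rows ({'time': ..., 'cases': ...}) or list rows
    (['2020-01-20', 20, ...]); missing values become empty strings.
    """
    normalized = []
    for row in rows:
        if isinstance(row, dict):
            normalized.append([row.get(col, "") for col in cols])
        else:
            # Pad short rows, then trim long ones to the column count.
            padded = list(row) + [""] * (len(cols) - len(row))
            normalized.append(padded[:len(cols)])
    return normalized


def store_data_sketch(data, source, cols=None, base_dir="case-counts"):
    """Write one <region>.tsv per key of `data` under base_dir/source.

    `base_dir` and the default columns are assumptions for this sketch.
    Returns the list of file paths written.
    """
    cols = list(cols) if cols else ["time", "cases"]
    out_dir = os.path.join(base_dir, source)
    os.makedirs(out_dir, exist_ok=True)

    written = []
    for region, rows in data.items():
        path = os.path.join(out_dir, f"{region}.tsv")
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            writer.writerow(cols)  # header row
            writer.writerows(normalize_rows(rows, cols))
        written.append(path)
    return written
```

With this shape, a parser would only need to call something like store_data_sketch(data, 'some_source', ['time', 'cases']) and the resulting .tsv files could be diffed by hand before any .json is generated from them.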

@tryggvigy tagging you as you explicitly said you would like to do this. Hope this helps. I can also do it, let me know if I should.

tryggvigy commented 4 years ago

@noleti thanks for involving me, I appreciate it and all the context you provide helps me get into the codebase. I think it's best if you do it. I won't be able to contribute as much during weekdays and I don't want to bite off too much now. I'll keep an eye out for your PR to see how you've done it. I bet I'll learn a lot from that.

noleti commented 4 years ago

resolved in https://github.com/neherlab/covid19_scenarios_data/pull/68