fix: adjust getPopulationData.py to account for input file change

ekrell commented 4 years ago

Related issues and PRs

Description

I was unable to run python generate_data.py --output-population ../src/assets/data/population.json for two reasons: a hard-coded path in the script points to a file that has since been renamed, and the structure of an input JSON file has changed.

Error for file path:

No such file or directory: '../src/assets/data/country_age_distribution.json'

Error for json structure change:

return country in [x['country'] for x in ages]
TypeError: string indices must be integers
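For anyone hitting the same TypeError, here is a minimal sketch of how it can arise. It assumes the input file changed from a bare list of records to an object that wraps the list. The key names "all" and "name" are hypothetical and only illustrate the idea; check the actual file for the real layout.

```python
import json

# Hypothetical OLD layout: a bare list of per-country records.
old_ages = json.loads('[{"country": "Norway"}, {"country": "Peru"}]')

# Hypothetical NEW layout: the records wrapped in a top-level object.
new_ages = json.loads('{"all": [{"name": "Norway"}, {"name": "Peru"}]}')


def has_country_old(country, ages):
    # The lookup from the traceback: assumes every x is a dict.
    return country in [x['country'] for x in ages]


def has_country_new(country, ages):
    # Adjusted lookup for the assumed new layout.
    return country in [x['name'] for x in ages['all']]


# Iterating over a dict yields its keys, which are strings, so
# x['country'] ends up indexing a string with a string.
try:
    has_country_old("Norway", new_ages)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The point is that the old comprehension still runs against the new file, but it iterates over top-level keys rather than records, which is why the failure shows up as a string-indexing error instead of a missing key.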

Impacted Areas in the application

Updating population data

Testing

python generate_data.py --output-population test.json

vercel[bot] commented 4 years ago

This pull request is being automatically deployed with Vercel (learn more). To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/covid19-scenarios/covid19-scenarios/3bw072qfw
✅ Preview: https://covid19-scenarios-git-fork-ekrell-master.covid19-scenarios.vercel.app

codeclimate[bot] commented 4 years ago

Code Climate has analyzed commit 9c4a066e and detected 0 issues on this pull request.

ivan-aksamentov commented 4 years ago

@ekrell Hi Evan, thanks!

We need to double check that. This may or may not be correct.

There is a bit of a story behind the population data:

Initially we downloaded the UN data and created country_age_distribution.json, but then our app data schema changed and another script was written to convert one format to the other: https://github.com/neherlab/covid19_scenarios/blob/master/data/scripts/transform_ages.py and then the schema changed again: https://github.com/neherlab/covid19_scenarios/blob/master/data/scripts/transform_ages_v2.py

The reason for transforming rather than re-downloading is that the download took more than a day, I think. Either the UN servers are too slow or our script is.

You never really need to re-run this script during development, because the population data pretty much never changes and the end result is committed to the repo. Also, the original script would probably not generate the data in the correct format, so its output would fail validation on app startup.

We are planning to split the data into smaller chunks, per region, loaded on demand (right now every user loads every single country, which is redundant and slow). See https://github.com/neherlab/covid19_scenarios/issues/743 for details. That might be the time when we fix our population script.
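As a rough illustration of the per-region split, here is a minimal sketch. It is not the project's implementation: the record layout, the region mapping, the function name split_by_region and the output file names are all assumptions made up for the example.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def split_by_region(records, region_of, out_dir):
    """Group records by region and write one JSON file per region.

    `records` is a list of dicts with a "name" key (assumed layout);
    `region_of` maps a country name to a region label (hypothetical).
    Returns a dict mapping each region to the path that was written.
    """
    chunks = defaultdict(list)
    for record in records:
        # Countries missing from the mapping go into a catch-all chunk.
        chunks[region_of.get(record["name"], "other")].append(record)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = {}
    for region, items in chunks.items():
        path = out_dir / f"population_{region}.json"
        path.write_text(json.dumps(items, indent=2), encoding="utf-8")
        written[region] = path
    return written


# Made-up sample data, only to exercise the function.
records = [{"name": "Norway"}, {"name": "Peru"}, {"name": "France"}]
region_of = {"Norway": "europe", "France": "europe", "Peru": "americas"}

with tempfile.TemporaryDirectory() as tmp:
    written = split_by_region(records, region_of, tmp)
    europe = json.loads(written["europe"].read_text(encoding="utf-8"))
    file_names = sorted(p.name for p in written.values())
```

With files laid out like this, a client would only need to fetch the chunk for the region the user selects instead of every country at once.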

ivan-aksamentov commented 4 years ago

cc @rneher @nnoll

ekrell commented 4 years ago

@ivan-aksamentov

Sure, that all makes sense. And I agree 100% that you never need to re-run the script if you are using the program as intended. But a small group of users, which includes at least me, maintains its own local population data, parser, etc. and has to re-run the script regularly. As for failing validation: yes, but I already have a hack to load only the local counties of interest.

So I'm fine either way since I've been maintaining a "patch" script for a while now with various odd fixes and hacks, but I put this here in case someone is in my boat, comes across that error and needs a quick fix.

ivan-aksamentov commented 4 years ago

@ekrell Oh, I see. Thanks for digging into it. Will merge then.

If you want to add anything else, including the patches you are using for your project, don't hesitate to open another PR.