usc-isi-i2 / datamart-api

MIT License

Create Postgres dump of just the ~74 timeseries datasets required by Causeex #83

Open saggu opened 3 years ago

saggu commented 3 years ago

To do:

kyao commented 3 years ago

Need to check with them whether the indicator name should be the variable ID.

saggu commented 3 years ago

Things to do to achieve this:

  1. Load all data from the full database backup.
  2. Delete the metadata and data for variables that are not needed for CausX (@kyao knows which variables are required).
  3. Create an SQL backup of the remaining data.
  4. In a new branch, create a Docker Compose setup that loads this CausX database backup.
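
A minimal Python sketch of steps 1-3 as a reviewable command plan, assuming the standard PostgreSQL client tools (`createdb`, `pg_restore`, `psql`, `pg_dump`) are on the PATH. The table names `fact` and `variable`, the column `variable_id`, the file names, and the helper `build_causx_plan` are all hypothetical placeholders, not the real Datamart schema or tooling; the sketch only builds the commands so they can be checked before anything runs.

```python
from __future__ import annotations

import shlex


def build_causx_plan(full_backup: str, work_db: str,
                     keep_ids: list[str], out_file: str) -> list[list[str]]:
    """Return the commands for steps 1-3 without executing them (hypothetical helper)."""
    if not keep_ids:
        # An empty keep-list would delete every variable.
        raise ValueError("keep_ids must not be empty")

    # Step 1: load the full backup into a scratch database.
    # A plain .sql backup is replayed with psql; archive formats go through pg_restore.
    if full_backup.endswith(".sql"):
        load = ["psql", "-d", work_db, "-v", "ON_ERROR_STOP=1", "-f", full_backup]
    else:
        load = ["pg_restore", "--no-owner", "-d", work_db, full_backup]

    # Step 2: delete data, then metadata, for variables not on the keep-list.
    # Table and column names are HYPOTHETICAL; adapt them to the real schema.
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in keep_ids)
    delete_sql = (
        f"DELETE FROM fact WHERE variable_id NOT IN ({quoted}); "
        f"DELETE FROM variable WHERE variable_id NOT IN ({quoted});"
    )

    return [
        ["createdb", work_db],
        load,
        ["psql", "-d", work_db, "-v", "ON_ERROR_STOP=1", "-c", delete_sql],
        # Step 3: plain-SQL dump of whatever remains.
        ["pg_dump", "--no-owner", "-f", out_file, work_db],
    ]


if __name__ == "__main__":
    # Hypothetical file names and variable IDs, for illustration only.
    plan = build_causx_plan("datamart_full.dump", "causx", ["VAR_A", "VAR_B"], "causx.sql")
    for cmd in plan:
        print(shlex.join(cmd))
```

Running each entry with `subprocess.run(cmd, check=True)` would perform the work. A plain-SQL dump is used for step 3 because the official `postgres` Docker image executes `*.sql` files placed in `/docker-entrypoint-initdb.d/` when it starts with an empty data directory, so the Docker Compose file for step 4 can reduce to a volume mount.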

zmbq commented 3 years ago

I do not have permission to view the spreadsheet for some reason, so I cannot write the script right now.

szeke commented 3 years ago

Is this resolved? @zmbq

zmbq commented 3 years ago

Uploaded to the CausX shared drive.

The development branch has a handy script, script/clean-for-causx.py, which accepts the Excel spreadsheet as an argument and cleans Datamart by deleting all the other variables. It also removes empty datasets. The script has a few handy arguments (please use --dry-run the first time you try it).

Running it on the full Datamart takes a couple of hours, since it deletes a lot of variables.
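
The core of such a cleanup, including a dry-run mode, might look like the following Python sketch. It is not the actual script/clean-for-causx.py: it runs against an in-memory SQLite database with a HYPOTHETICAL three-table schema (`dataset`, `variable`, `fact`), takes the keep-list as a plain Python set instead of reading the Excel spreadsheet, and `clean_for_causx` and `make_demo_db` are illustrative names.

```python
import sqlite3


def clean_for_causx(conn, keep_ids, dry_run=True):
    """Drop variables outside keep_ids, then datasets left with no variables.

    Returns (doomed_variables, doomed_datasets). With dry_run=True nothing
    is deleted; the lists only report what would be removed.
    """
    keep = set(keep_ids)
    rows = conn.execute(
        "SELECT variable_id, dataset_id FROM variable ORDER BY variable_id"
    ).fetchall()
    doomed_vars = [v for v, _ in rows if v not in keep]
    surviving = {d for v, d in rows if v in keep}
    # A dataset is "empty" once none of its variables survive.
    doomed_datasets = [
        d for (d,) in conn.execute("SELECT dataset_id FROM dataset ORDER BY dataset_id")
        if d not in surviving
    ]

    if not dry_run:
        with conn:  # one transaction: commits on success, rolls back on error
            params = [(v,) for v in doomed_vars]
            conn.executemany("DELETE FROM fact WHERE variable_id = ?", params)
            conn.executemany("DELETE FROM variable WHERE variable_id = ?", params)
            conn.executemany("DELETE FROM dataset WHERE dataset_id = ?",
                             [(d,) for d in doomed_datasets])
    return doomed_vars, doomed_datasets


def make_demo_db():
    """Tiny in-memory database using the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dataset (dataset_id TEXT PRIMARY KEY);
        CREATE TABLE variable (variable_id TEXT PRIMARY KEY, dataset_id TEXT);
        CREATE TABLE fact (variable_id TEXT, value REAL);
        INSERT INTO dataset VALUES ('D1'), ('D2');
        INSERT INTO variable VALUES ('V1', 'D1'), ('V2', 'D1'), ('V3', 'D2');
        INSERT INTO fact VALUES ('V1', 1.0), ('V2', 2.0), ('V3', 3.0);
    """)
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(clean_for_causx(db, {"V1"}, dry_run=True))   # -> (['V2', 'V3'], ['D2']), nothing deleted
    print(clean_for_causx(db, {"V1"}, dry_run=False))  # -> (['V2', 'V3'], ['D2']), rows deleted
```

Both lists are computed before any DELETE runs, so the dry run and the real run report the same thing; that makes it possible to check the dry-run output against the spreadsheet before committing to a long run.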