imperialCHEPI / healthgps

Global Health Policy Simulation model (Health-GPS)
https://imperialchepi.github.io/healthgps/
BSD 3-Clause "New" or "Revised" License

Allow users to specify data source in config file #465

Open alexdewar opened 2 days ago

alexdewar commented 2 days ago

This PR adds support for specifying the path or URL of the data source in the config file, instead of as a command-line argument. For now, the command-line argument still works, but it is deprecated and emits a warning when used. Longer term, I want to drop it altogether, as using the config file is a cleaner and more reproducible approach.
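To illustrate the deprecation behaviour, here is a minimal Python sketch (not the project's actual code). The function name `resolve_data_source` and its parameters are hypothetical. The description above does not say what happens when both the config file and the command line supply a data source, so the sketch assumes the config value wins.

```python
import warnings
from typing import Optional


def resolve_data_source(config_source: Optional[str],
                        cli_source: Optional[str]) -> str:
    """Pick the data source, warning if the deprecated CLI option was used.

    Hypothetical helper: names and precedence are assumptions for illustration.
    """
    if cli_source is not None:
        # The command-line route still works but is deprecated.
        warnings.warn(
            "Passing the data source on the command line is deprecated; "
            "set it in the config file instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    if config_source is not None:
        # Assumption: the config file takes precedence when both are given.
        return config_source
    if cli_source is not None:
        return cli_source
    raise ValueError("No data source given in the config file or on the command line")
```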

There are various types of data source the user can pass in:

  1. A path to a data directory (as always)
  2. A path to a zip file
  3. A URL pointing to a zip file
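As a rough illustration of telling the three kinds apart, the following Python sketch classifies a source string. `SourceKind` and `classify_source` are hypothetical names, and the rules (an `http`/`https` scheme means URL, a `.zip` suffix means zip file, anything else is treated as a directory) are assumptions rather than the PR's actual logic.

```python
from enum import Enum
from pathlib import Path
from urllib.parse import urlparse


class SourceKind(Enum):
    DIRECTORY = "directory"
    ZIP_FILE = "zip_file"
    URL = "url"


def classify_source(source: str) -> SourceKind:
    """Guess which of the three kinds of data source a string refers to."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return SourceKind.URL
    if Path(source).suffix.lower() == ".zip":
        return SourceKind.ZIP_FILE
    # Anything else is treated as a path to a data directory.
    return SourceKind.DIRECTORY
```

For example, `classify_source("data/archive.zip")` yields `SourceKind.ZIP_FILE` under these assumed rules.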

If the data source is supplied via the config file, we verify it against a checksum in the latter two cases (zip file or URL); directories are not checked. For URLs, this also means we can skip the download entirely if the data is already in the cache. If the data source is supplied via the command-line argument, no checksum verification is performed.
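The checksum and cache behaviour can be sketched in Python as follows. This is an illustration only: the description does not name the hash algorithm or the cache layout, so SHA-256 and a cache keyed by `<checksum>.zip` are assumptions, and `sha256_of_file`, `verify_zip` and `cached_copy` are hypothetical helpers.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_zip(path: Path, expected: str) -> None:
    """Raise if the archive does not match the checksum from the config file."""
    actual = sha256_of_file(path)
    if actual != expected.lower():
        raise ValueError(f"Checksum mismatch for {path}: got {actual}")


def cached_copy(cache_dir: Path, expected: str) -> Optional[Path]:
    """Return a cached archive matching the checksum, or None if a download is needed.

    Assumed layout: archives are stored as <cache_dir>/<checksum>.zip.
    """
    candidate = cache_dir / f"{expected.lower()}.zip"
    if candidate.is_file() and sha256_of_file(candidate) == expected.lower():
        return candidate
    return None
```

With a layout like this, a URL source only needs to be downloaded when `cached_copy` returns `None`.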

Given all these permutations, the code is a little longer than I'd like, but I think that's what's needed. We can drop some of it once we remove the option to supply the data source via the command line.

I haven't updated the docs yet, but will do so if people are happy with the changes.

Once this is merged, I plan to update the config files in the examples repo to use a URL as a data source.

Closes #407. Closes #365.