This PR adds support for passing in the path/URL to the data source via the config file, instead of as a command-line argument. For now, users can still use the command-line argument, but it is deprecated and a warning is emitted if they do so. Longer term, I want to drop the command-line argument altogether, as using the config file is a cleaner and more reproducible approach.
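To illustrate the deprecation path, here is a minimal sketch rather than the code in this PR. The function name `resolve_data_source`, the `data_source` config key, and the rule that the config file wins when both are given are all assumptions made for the example.

```python
import warnings
from typing import Optional


def resolve_data_source(config: dict, cli_data_source: Optional[str] = None) -> str:
    """Pick the data source, warning if the deprecated CLI argument was used.

    Hypothetical helper: the "data_source" key and the precedence rule
    (config file over CLI argument) are assumptions for illustration.
    """
    if cli_data_source is not None:
        warnings.warn(
            "Passing the data source on the command line is deprecated; "
            "set it in the config file instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    # Fall back to the CLI value only when the config file has no entry.
    source = config.get("data_source", cli_data_source)
    if source is None:
        raise ValueError(
            "No data source given in the config file or on the command line."
        )
    return source
```

Keeping the warning in one place like this should make it easy to delete once the command-line argument is removed.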
There are three types of data source the user can pass in:
- A path to a data directory (as before)
- A path to a zip file
- A URL pointing to a zip file
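As a rough sketch of how these three cases could be told apart (not necessarily how this PR does it), assuming URLs use `http`/`https` and zip files are recognised by their `.zip` extension; `classify_data_source` is a hypothetical name:

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_data_source(source: str) -> str:
    """Return "url", "zip" or "directory" for a data source string.

    Hypothetical helper: detection by URL scheme and file extension is an
    assumption, not necessarily the rule used in this PR.
    """
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    if Path(source).suffix.lower() == ".zip":
        return "zip"
    return "directory"
```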
If the data source is supplied via the config file and is a zip file or a URL, we verify it against a checksum (directories are not checked). For URLs, this also means we can skip the download altogether if the data is already in the cache. If the data source is supplied via the command-line argument, there is no checksum verification.
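The checksum-and-cache behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: SHA-256 as the hash, a cache file named after the checksum, and the names `sha256_of_file` and `fetch_zip` are all made up for the example.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_zip(url: str, expected_sha256: str, cache_dir: Path) -> Path:
    """Return a verified local copy of the zip at `url`, downloading only if needed.

    Hypothetical helper: SHA-256 and the cache layout are assumptions.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{expected_sha256}.zip"
    # A cached file with a matching checksum means no download is required.
    if cached.exists() and sha256_of_file(cached) == expected_sha256:
        return cached
    urllib.request.urlretrieve(url, cached)
    if sha256_of_file(cached) != expected_sha256:
        cached.unlink()  # do not leave a corrupt or unexpected file in the cache
        raise ValueError(f"Checksum mismatch for {url}")
    return cached
```

Naming the cached file after its checksum is one way to let a changed checksum in the config file trigger a fresh download automatically.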
Given all these different permutations, the code is a little longer than I'd like, but I think it is all needed for now. We can drop some of it once we remove the option to supply the data source via the command line.
I haven't updated the docs yet, but will do so if people are happy with the changes.
Once this is merged, I plan to update the config files in the examples repo to use a URL as a data source.
Closes #407. Closes #365.