There's still lots of repeated code in the import commands which could be moved out into a common framework.
We should do some work to give all import commands a common structure, so changes across all of them are easier to make. They should certainly all be classes that inherit from a `BaseImporter` class, and live outside the management commands. Need to think a bit more about the structure of this.
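As a very rough sketch of the shape this might take (the module path and the `import_name` convention are invented here, not something in the current code):

```python
# importers/base.py (hypothetical location, outside the management commands)
from abc import ABC, abstractmethod


class BaseImporter(ABC):
    """Shared scaffolding that each import would subclass."""

    # The name the import is run under; set by each subclass
    # (an invented convention for this sketch).
    import_name: str

    def __init__(self, area_type=None):
        self.area_type = area_type

    @abstractmethod
    def handle_import(self):
        """Do the actual import."""
```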
Ideally we should probably have a single `run_import` management command that takes the name of the import to run as an argument. For bonus points the command could possibly also take an area type argument.
Not sure quite what this would look like at the moment, but it should also enable a more programmatic approach to running all imports.
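Perhaps something like the following; the lookup mechanism here is just a placeholder (it borrows the `__subclasses__()` idea discussed below, but a config-driven registry would work equally well):

```python
# management/commands/run_import.py (sketch)
from django.core.management.base import BaseCommand, CommandError

from importers.base import BaseImporter  # hypothetical import path


class Command(BaseCommand):
    help = "Run the named import"

    def add_arguments(self, parser):
        parser.add_argument("import_name")
        parser.add_argument("--area-type", default=None)  # the bonus-points argument

    def handle(self, import_name=None, area_type=None, **options):
        # Build a name -> class map from whatever registry we settle on
        importers = {cls.import_name: cls for cls in BaseImporter.__subclasses__()}
        if import_name not in importers:
            raise CommandError(f"Unknown import: {import_name}")
        importers[import_name](area_type=area_type).handle_import()
```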
I can see two options for the latter:
- a config file that lists all the available imports and how to run them
- some form of `for importer in BaseImporter.__subclasses__():`
The former seems potentially better as, for very standard imports, everything could live in config with no need for extra code, although a quick skim of the code suggests there are very few imports this would apply to.
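For illustration, the config-file option might look something like this; the file name, keys, and loader are all invented:

```python
import importlib
import json

# imports.json might map each import name to how to run it, e.g.
# {"imports": {"some_import": {"class": "importers.some.SomeImporter",
#                              "file": "data/some_import.csv"}}}
with open("imports.json") as f:
    config = json.load(f)

for name, spec in config["imports"].items():
    # Resolve the dotted class path from the config and run the import
    module_path, class_name = spec["class"].rsplit(".", 1)
    importer_cls = getattr(importlib.import_module(module_path), class_name)
    importer_cls().handle_import()
```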
A stretch goal would be for the config file to be the source for a run-on-deploy command that populates the database with the list of imports, with the import command then pulling the list of imports from the database. The advantage of this is it'd be a reasonable first step towards a) allowing user imports and b) being able to re-import things without server access.
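A very rough sketch of that run-on-deploy command (the `Dataset` model, app path, and config file name are all assumptions):

```python
# management/commands/populate_imports.py (sketch): seed the Dataset table
# from the config file on deploy, so run_import can read the list from the DB
import json

from django.core.management.base import BaseCommand

from hub.models import Dataset  # hypothetical app and model


class Command(BaseCommand):
    help = "Populate the list of available imports from config"

    def handle(self, *args, **options):
        with open("imports.json") as f:  # file name as in the sketch above
            config = json.load(f)

        for name, spec in config["imports"].items():
            # Create the dataset row if it's new, refresh its config if not
            Dataset.objects.update_or_create(name=name, defaults={"config": spec})
```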
- add a config JSON field to the dataset table to store most of the config, as it's expandable and we can just dump the existing config in there automatically (see the model sketch after this list). Might also need one for the datatype table?
- use last updated, which will need adding, to compare with source files so we only run imports that have new data. I suspect this means that for most things there should be a standard method for getting the age of the files (the file location can come from the config, which also implies a standard key for the file) which can be overridden for more complicated things like remote datasets or ones which use multiple files; there's a sketch of this after the list too.
- have a cron job that dumps the relevant config needed to bootstrap all the imports, so we can re-populate that if required. Possibly an auto-update script can use this first to make sure all the datasets are in the DB?
- maybe we will want to use migrations to add new datasets in future if we're putting the config in the database
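For the first point, a minimal sketch of what the dataset model might gain (model and field names are assumptions):

```python
# Sketch: a JSONField on the Dataset model to hold per-import config
from django.db import models


class Dataset(models.Model):
    name = models.CharField(max_length=200, unique=True)
    last_updated = models.DateTimeField(null=True, blank=True)  # for staleness checks
    config = models.JSONField(default=dict)  # expandable; existing config dumped in
```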
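And for the last-updated comparison, a possible default on `BaseImporter`, assuming a standard `file` key in the config:

```python
import os
from datetime import datetime, timezone


class BaseImporter:
    # In practice these would come from the dataset's DB row (assumption)
    config: dict = {}
    last_updated: datetime | None = None

    def source_age(self) -> datetime:
        """Default: the mtime of the single source file named in config.

        Override this for remote datasets or ones built from multiple files.
        """
        mtime = os.path.getmtime(self.config["file"])  # standard "file" key
        return datetime.fromtimestamp(mtime, tz=timezone.utc)

    def needs_import(self) -> bool:
        # Only run imports whose source data is newer than the last import
        return self.last_updated is None or self.source_age() > self.last_updated
```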