nismod / smif

Simulation Modelling Integration Framework
http://www.itrc.org.uk
MIT License
22 stars 6 forks source link

Add `smif prepare` command to cli to convert data to use binary filestore #319

Closed tomalrussell closed 5 years ago

tomalrussell commented 5 years ago

smif prepare --from=csv --to=binary <model_run> imports scenario and parameter data provided using the csv data store and writes out using the binary data store

This will allow 'pure' implementations of the DataStore classes to retrieve data for a model run.

tomalrussell commented 5 years ago

From wiki: https://github.com/nismod/smif/wiki/controller

tlestang commented 5 years ago

smif prepare-convert model_run_name converts model run data from csv to Parquet or parquet to csv depending on the local interface. For now only tested in the former case.

model run data is composed of:

In most of cases, a file data_filename.parquet is generated alongside the original data_filename.csv, except for initial conditions and interventions data files (see below)

So after using the prepare-convert command, one must modify the config files accordingly, which I think is not desirable. It may not be too difficult to name the parquet files after the original files, but probably involves messing with the file store and break things ?

tlestang commented 5 years ago

Just realized that instead of converting all variants for scenarios and narratives, it is probably more consistent to convert only those used in the prescribed model run.

tomalrussell commented 5 years ago

I think we should avoid merging the multiple interventions files, multiple initial conditions.

Use-cases for multiple files:

Possible approach:

tlestang commented 5 years ago

I don't really understand what you mean by

change ConfigStore to use string ids for datasets instead of filepaths

Could you give an example ?

For now I opted for a very naive approach, implementing methods on Store/DataStore to read/write interventions for a specific file. The filename is still provided by the ConfigStore

model = store.read_model('energy_demand')
for intervention_file in model['interventions']:
    interventions = csv_store.read_interventions_file(intervention_file)
    parquet_store.write_interventions_file(intervention_file, interventions)

with for instance in the DataStore

    def read_interventions_file(self, key):
        path = os.path.join(self.data_folders['interventions'], key)
        return self._read_list_of_dicts(path)

    def write_interventions_file(self, key, interventions):
        path = os.path.join(self.data_folders['interventions'], key)
        path = self._set_file_extension(path)
        self._write_list_of_dicts(path, interventions)
tomalrussell commented 5 years ago

This looks like a simpler approach to get started - but do we still need to edit the config files eventually?

e.g. start with a model.yml something like:

name: energy_model
interventions:
  - energy_supply.csv
  - energy_supply_alt.csv

then run prepare-convert, which picks up the two csv files and outputs two parquet files, energy_supply.parquet and energy_supply_alt.parquet

then run the model using the parquet data store; the path to interventions files are still *.csv, will it fail to read them? automagically try a different extension?


By "change ConfigStore to use string ids for datasets instead of filepaths" I meant adding a layer of indirection, so the config yamls don't know directly about any file paths for the data store.

change the model.yml to something like:

name: energy_model
interventions:
  - energy_supply
  - energy_supply_alt

add a csv_store_files.csv at the base of the data folder:

key,path
energy_supply,energy_supply.csv
energy_supply_alt,energy_supply_alt.csv

then when writing/converting to parquet, add a parquet_store_files.csv (or .parquet?!):

key,path
energy_supply,energy_supply.parquet
energy_supply_alt,energy_supply_alt.parquet

this means a breaking change to current configs, but would decouple the configs from the data store implementation

tomalrussell commented 5 years ago

Closed by #383