Closed tomalrussell closed 5 years ago
smif prepare-convert model_run_name
converts model run data from csv
to Parquet
or parquet
to csv
depending on the local interface. For now only tested in the former case.
model run data is composed of:
In most of cases, a file data_filename.parquet
is generated alongside the original data_filename.csv
, except for initial conditions and interventions data files (see below)
store.read_interventions()
reads all intervention files listed in the sector model config file and aggregate them into a single dictionary.
Therefore store.write_interventions()
only writes 1 single file instead of the two original data files.store.write_interventions()
generates a file named after the model name and appends the .csv
extension, i.e. model_name.csv
. The .csv
extension is replaced by the correct extension in the data store method.
So if model energy_demand
references 2 intervention data files energy_supply.csv
and energy_supply_alt.csv
then one gets 1 file energy_demand.parquet
store.write_initial_conditions()
that names the data file after the sector model name.
For instance reservoirs.csv
is converted in energy_demand.parquet
and water_supply_oxford.csv
is converted in water_supply.parquet
So after using the prepare-convert
command, one must modify the config files accordingly, which I think is not desirable. It may not be too difficult to name the parquet files after the original files, but probably involves messing with the file store and break things ?
Just realized that instead of converting all variants for scenarios and narratives, it is probably more consistent to convert only those used in the prescribed model run.
I think we should avoid merging the multiple interventions files, multiple initial conditions.
Use-cases for multiple files:
Possible approach:
I don't really understand what you mean by
change ConfigStore to use string ids for datasets instead of filepaths
Could you give an example ?
For now I opted for a very naive approach, implementing methods on Store/DataStore to read/write interventions for a specific file. The filename is still provided by the ConfigStore
model = store.read_model('energy_demand')
for intervention_file in model['interventions']:
interventions = csv_store.read_interventions_file(intervention_file)
parquet_store.write_interventions_file(intervention_file, interventions)
with for instance in the DataStore
def read_interventions_file(self, key):
path = os.path.join(self.data_folders['interventions'], key)
return self._read_list_of_dicts(path)
def write_interventions_file(self, key, interventions):
path = os.path.join(self.data_folders['interventions'], key)
path = self._set_file_extension(path)
self._write_list_of_dicts(path, interventions)
This looks like a simpler approach to get started - but do we still need to edit the config files eventually?
e.g. start with a model.yml
something like:
name: energy_model
interventions:
- energy_supply.csv
- energy_supply_alt.csv
then run prepare-convert, which picks up the two csv files and outputs two parquet files, energy_supply.parquet
and energy_supply_alt.parquet
then run the model using the parquet data store; the path to interventions files are still *.csv
, will it fail to read them? automagically try a different extension?
By "change ConfigStore to use string ids for datasets instead of filepaths" I meant adding a layer of indirection, so the config yamls don't know directly about any file paths for the data store.
change the model.yml
to something like:
name: energy_model
interventions:
- energy_supply
- energy_supply_alt
add a csv_store_files.csv
at the base of the data folder:
key,path
energy_supply,energy_supply.csv
energy_supply_alt,energy_supply_alt.csv
then when writing/converting to parquet, add a parquet_store_files.csv
(or .parquet
?!):
key,path
energy_supply,energy_supply.parquet
energy_supply_alt,energy_supply_alt.parquet
this means a breaking change to current configs, but would decouple the configs from the data store implementation
Closed by #383
smif prepare --from=csv --to=binary <model_run>
imports scenario and parameter data provided using the csv data store and writes out using the binary data storeThis will allow 'pure' implementations of the DataStore classes to retrieve data for a model run.