nismod / smif

Simulation Modelling Integration Framework
http://www.itrc.org.uk
MIT License

I319 csv to binary #383

Closed tlestang closed 5 years ago

tlestang commented 5 years ago

Add a smif command that automatically converts all data for a model run from one data format to another. See issue #319.

$ smif prepare-convert --help
usage: smif prepare-convert [-h] [-v] [-i {local_csv,local_binary}]
                            [-d DIRECTORY]
                            model_run

positional arguments:
  model_run             Name of the model run

optional arguments:
 ...

The input format is set by the local interface option (-i). If -i is set to local_csv, data is converted from CSV to the binary (Parquet) data format; if it is set to local_binary, the conversion goes the other way.
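
As a rough illustration of that rule, the choice of target format could be sketched as below. The helper name target_format is hypothetical and not part of smif; only the two format names come from the -i choices in the help output.

```python
# Hypothetical helper, not part of smif. The two format names are the
# -i choices listed in the help output of `smif prepare-convert`.
FORMATS = ("local_csv", "local_binary")


def target_format(source_format):
    """Return the format that data stored in `source_format` is converted to."""
    if source_format not in FORMATS:
        raise ValueError("unknown format: {!r}".format(source_format))
    # Conversion always goes to the other of the two supported formats.
    return "local_binary" if source_format == "local_csv" else "local_csv"
```

With this sketch, target_format("local_csv") gives "local_binary" and the reverse holds for "local_binary"; any other value is rejected.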

The convert functionality is mainly implemented in the Store class, to which new methods have been added.

Interventions and initial conditions data are typically scattered across different files. The current Store methods [read,write]_interventions() and [read,write]_initial_conditions() aggregate the content of these files and write it to a single file, model_name + '.csv'. As a result, the structure of the input data files is lost.
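
To make the problem concrete, here is a toy sketch that uses plain dictionaries in place of files. The helper names and the rows are made up for illustration and are not smif's Store methods.

```python
# Toy illustration only: the rows below are invented, and these helpers
# are hypothetical stand-ins, not smif's Store methods.
source_files = {
    "energy_supply": [{"name": "plant_a", "build_year": 2020}],
    "energy_supply_alt": [{"name": "plant_b", "build_year": 2030}],
}


def convert_aggregated(files, model_name):
    """Merge every file into a single one named after the model."""
    merged = [row for key in sorted(files) for row in files[key]]
    return {model_name: merged}


def convert_per_file(files):
    """Copy each file under its own key so the file structure survives."""
    return {key: list(rows) for key, rows in files.items()}


aggregated = convert_aggregated(source_files, "energy_demand")
per_file = convert_per_file(source_files)

print(sorted(aggregated))  # a single output file, named after the model
print(sorted(per_file))    # one output file per input file
```

The aggregated result keeps all rows but no longer records which file each row came from, whereas the per-file copy keeps one output per input.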

As a consequence, two methods have been added to the Store class to read/write a specific interventions or initial conditions file.

A major change to the downstream version is that data files are now referred to in config files with string ids instead of file paths. Example:

name: energy_demand
description: ''
path: models/energy_demand.py
...
interventions:
- energy_supply
- energy_supply_alt
...
parameters:
  - name: smart_meter_savings
    description: The savings from smart meters
    ...
    default: defaults
    dtype: float
    ...

The identifier energy_supply_alt will point to either energy_supply_alt.csv or energy_supply_alt.parquet, depending on the type of FileDataStore (CSVDataStore or ParquetDataStore). The read/write methods in the FileDataStore have been modified to append the relevant file extension, held in the ext attribute of the DataStore, to the key, which is now the string id of the data. For testing purposes, the MemoryDataStore has been given an ext=None attribute.
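
A minimal sketch of this lookup, under stated assumptions: the class names, the path_for method and the base_folder argument are hypothetical, and ext is assumed to hold the extension without a leading dot. Only the idea of an ext attribute per store comes from this PR.

```python
import os


class FileStoreSketch:
    """Hypothetical stand-in for a file-based data store."""

    ext = None  # set by subclasses; assumed here to exclude the leading dot

    def __init__(self, base_folder):
        self.base_folder = base_folder

    def path_for(self, key):
        """Build the data file path from a string id and the store's extension."""
        return os.path.join(self.base_folder, "{}.{}".format(key, self.ext))


class CSVStoreSketch(FileStoreSketch):
    ext = "csv"


class ParquetStoreSketch(FileStoreSketch):
    ext = "parquet"


csv_path = CSVStoreSketch("interventions").path_for("energy_supply_alt")
parquet_path = ParquetStoreSketch("interventions").path_for("energy_supply_alt")
```

The same string id resolves to a different file depending on the store, so config files do not need to change when the data format does.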

Additionally, a bug in the definition of the method _unnest_keys(interventions) (file_data_store.py) has been fixed.

At the moment the upstream version contains methods and fixtures to read/write an index file linking string identifiers to interventions file paths. These are relics of a previous design in which the string identifier and the actual data file name were independent, at the cost of having to store the links in an index file. In my opinion, the gain in flexibility is not worth the added complexity, and I would be happy to remove these methods.

codecov[bot] commented 5 years ago

Codecov Report

Merging #383 into develop will increase coverage by 0.08%. The diff coverage is 40.22%.

@@             Coverage Diff             @@
##           develop     #383      +/-   ##
===========================================
+ Coverage    71.27%   71.35%   +0.08%     
===========================================
  Files           60       60              
  Lines         5368     5597     +229     
  Branches       664      718      +54     
===========================================
+ Hits          3826     3994     +168     
- Misses        1446     1498      +52     
- Partials        96      105       +9

Flag          Coverage Δ
#javascript   71.35% <40.22%> (+0.08%) ↑
#python       78.7% <40.22%> (-0.27%) ↓

Impacted Files                               Coverage Δ
src/smif/data_layer/memory_interface.py      86.57% <33.33%> (-2.24%) ↓
src/smif/data_layer/store.py                 85.66% <35%> (-5.91%) ↓
src/smif/data_layer/file/file_data_store.py  86.53% <66.66%> (-0.55%) ↓

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 40d3193...d24c1cb.

codecov[bot] commented 5 years ago

Codecov Report

Merging #383 into develop will increase coverage by 0.49%. The diff coverage is 89.28%.

@@             Coverage Diff             @@
##           develop     #383      +/-   ##
===========================================
+ Coverage    71.44%   71.94%   +0.49%     
===========================================
  Files           60       60              
  Lines         5400     5510     +110     
  Branches       669      693      +24     
===========================================
+ Hits          3858     3964     +106     
- Misses        1446     1447       +1     
- Partials        96       99       +3

Flag          Coverage Δ
#javascript   71.94% <89.28%> (+0.49%) ↑
#python       79.51% <89.28%> (+0.4%) ↑

Impacted Files                               Coverage Δ
src/smif/data_layer/results.py               93.24% <100%> (ø) ↑
src/smif/metadata/coordinates.py             97.22% <100%> (ø) ↑
src/smif/data_layer/database_interface.py    16.58% <100%> (ø) ↑
src/smif/data_layer/memory_interface.py      88.69% <77.77%> (-0.12%) ↓
src/smif/data_layer/store.py                 91.27% <86.27%> (-1.06%) ↓
src/smif/data_layer/file/file_data_store.py  89.51% <96.36%> (+2.43%) ↑
src/smif/data_layer/abstract_data_store.py   98.63% <97.82%> (-1.37%) ↓

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update de6cf9f...f1ab4f1.

tomalrussell commented 5 years ago

Closes #319