gregorbj / Archive-VisionEval

VisionEval Model System and Framework (formerly RSPM Framework)
Apache License 2.0

Restructure to Enable Datastore References #40

Closed gregorbj closed 7 years ago

gregorbj commented 7 years ago

The system has been set up and tested with a simple concept of a datastore that holds all the model results for all historical and forecast years. To avoid redundancy in rerunning a model for years that have already been run, the framework allows you to load an existing datastore and add to it. While this avoids having to rerun the model for a model year that has already been run, it has some residual problems:

1) Data Redundancy: Some sub-models calculate future year values as a change from base year values. Therefore, the base year data has to be contained in the datastore for every scenario.
2) Complications in Adding Forecast Years: The framework performs a number of checks up front to assure that the data every called module needs will exist in the datastore when the module needs it. This is done generally rather than on a year-by-year basis. Adding a forecast year disrupts the ability to use a general approach. In addition, it complicates the process of checking inputs to assure that input data are available for all model run years.
3) Fixed Geography for all Forecast Years: The framework requires the geography to be specified in the "geo.csv" file in the "defs" directory. This geographic specification is used to set up tables in the datastore and to check the completeness of inputs. This approach requires the model geography to remain the same for all forecast years. This would be a problem in applications which require zones to be added or split in order to adequately describe a scenario.

This issue can be resolved by incorporating references to datastores into the framework so that a module can get data from more than one datastore. In example 1 above, a base year model would be run, creating a base year datastore. Other model runs for historic or future years can get data for the base year by using a reference to that datastore. This avoids data redundancy and enables the other listed issues to be addressed as well. The approach would also better support a typical use case for the models, in which model development and application occur in stages as follows:

1) The model is run and calibrated for the base year.
2) The model is run for other historic years to validate the model and develop historical reference measures (e.g. Oregon's GHG reduction goals are related to the estimated emissions in 1990).
3) A future 'reference case' model is developed and run for forecast years.
4+) Alternative scenarios are developed and run for forecast years.

Changing the model to incorporate datastore references would enable the model data to be organized by stage while allowing data to be retrieved across stages. This can be done by including a table that relates years to datastore paths. For example, if a project directory structure is organized as follows:

    MyBigProject
        History
            Base
            Other
        Future
            Reference
            Alt1
            Alt2
            ...

The references for the Alt2 model would look like the following (in JSON format), assuming that the base year is 2015, other historical years are 2000 and 2010, and the future year is 2035:

    "DatastorePaths": {
      "2000": "../../History/Other/datastore.h5",
      "2010": "../../History/Other/datastore.h5",
      "2015": "../../History/Base/datastore.h5",
      "2035": "datastore.h5"
    }

The example shows paths relative to the Alt2 directory. Absolute and network paths could also be used. This could be helpful where the base, other historical, and reference data are shared among several modeling projects, since it would enable sharing without copying the data.
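As a rough illustration of how such a year-to-path table could be consumed, here is a minimal Python sketch. It is not framework code: the function name `resolve_datastore_paths` and the `MyBigProject/Future/Alt2` model directory are assumptions made for the example, and the JSON is the table proposed above wrapped in an enclosing object so that it parses on its own.

```python
import json
import posixpath

# The proposed table from the example above, wrapped in braces so it is valid JSON.
CONFIG_TEXT = """
{
  "DatastorePaths": {
    "2000": "../../History/Other/datastore.h5",
    "2010": "../../History/Other/datastore.h5",
    "2015": "../../History/Base/datastore.h5",
    "2035": "datastore.h5"
  }
}
"""


def resolve_datastore_paths(config_text, model_dir):
    """Map each year to a normalized datastore path.

    Relative entries are resolved against model_dir (hypothetical helper,
    not part of the framework); absolute entries are kept as given.
    """
    table = json.loads(config_text)["DatastorePaths"]
    resolved = {}
    for year, path in table.items():
        # posixpath.join returns `path` unchanged when it is absolute.
        resolved[year] = posixpath.normpath(posixpath.join(model_dir, path))
    return resolved


# Assumed location of the Alt2 model within the example project layout.
paths = resolve_datastore_paths(CONFIG_TEXT, "MyBigProject/Future/Alt2")
for year in sorted(paths):
    print(year, paths[year])
```

Under these assumptions, the two historical years and the base year resolve to datastores under `MyBigProject/History`, while 2035 resolves to the datastore inside the Alt2 directory itself.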

gregorbj commented 7 years ago

The initial issue did not correctly display the example directory structure.

    MyBigProject
        History
            Base
            Other
        Future
            Reference
            Alt1
            Alt2
            ...

gregorbj commented 7 years ago

Example directory structure still does not show up correctly. Here is a description. "MyBigProject" is the overall directory. It has two sub-directories: "History" and "Future". The "History" directory has two sub-directories: "Base" and "Other". The "Future" directory has a number of sub-directories including "Reference", "Alt1", and "Alt2".

gregorbj commented 7 years ago

The latest development version incorporates datastore references. The run_parameters.json file can have an optional "DatastoreReferences" parameter which lists datastore references. This is an example:

    "DatastoreReferences": {
      "Global": "../BaseYear/datastore.h5",
      "2015": "../BaseYear/datastore.h5"
    }

References may be made to "Global" and to model years (e.g. "2015"). Each reference is the path to a datastore file, including the file name, given relative to the working directory. A user could specify absolute paths, but that would make models less portable.
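The lookup this implies can be sketched in Python as follows. This is only an illustration, not the framework's implementation: the function name `find_datastore`, the `Project/Scenario1` working directory, and the fallback to a local `datastore.h5` when a group has no reference are all assumptions made for the example.

```python
import json
import posixpath

# Example parameter from the comment above, wrapped in braces so it is valid JSON.
RUN_PARAMETERS_TEXT = """
{
  "DatastoreReferences": {
    "Global": "../BaseYear/datastore.h5",
    "2015": "../BaseYear/datastore.h5"
  }
}
"""


def find_datastore(run_parameters, group, working_dir, local_name="datastore.h5"):
    """Return the datastore path for a group ("Global" or a model year).

    Hypothetical helper. If the group has a reference, resolve it relative
    to working_dir; otherwise assume the data lives in the run's own
    datastore (an assumption, not confirmed by the source).
    """
    # "DatastoreReferences" is optional, so default to an empty table.
    references = run_parameters.get("DatastoreReferences", {})
    reference = references.get(str(group))
    if reference is None:
        return posixpath.join(working_dir, local_name)
    # posixpath.join leaves an absolute reference unchanged.
    return posixpath.normpath(posixpath.join(working_dir, reference))


run_parameters = json.loads(RUN_PARAMETERS_TEXT)
working_dir = "Project/Scenario1"  # assumed working directory for the example

for group in ("Global", 2015, 2035):
    print(group, find_datastore(run_parameters, group, working_dir))
```

Keeping the references relative means the whole project folder can be moved or copied without editing run_parameters.json, which is the portability point made above.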