This repo contains code to train and evaluate a model that forecasts the energy production of solar panels (PV). It provides a framework for forecasting ahead using PV data from sites, weather data (NWPs stored as multidimensional geospatial zarrs) and satellite imagery (from the EUMETSAT geostationary satellite).
.
├── exp_reports       # Experiment reports - markdown notes about experiments we have made
├── exp_results       # Default output for the {train,eval}_model.py scripts
├── notebooks         # Diverse notebooks
├── data              # Placeholder for data files
├── dashboards        # Experimental streamlit dashboards
└── psp               # Main python package
    ├── clients       # Client specific code
    ├── data_sources  # Data sources (PV, NWP, Satellite, etc.)
    ├── exp_configs   # Experimentation configs - a config defines the different options for
    │                 # training and evaluation models. This directory contains many ready
    │                 # configs where the paths point to the data on Leonardo.
    ├── models        # The machine learning code
    ├── scripts       # Scripts (entry points)
    └── tests         # Unit tests
poetry run python psp/scripts/train_model.py \
    --exp-config-name test_config1 \
    -n test
poetry run python psp/scripts/eval_model.py \
    -n test
# This will have generated a model and test results in `exp_results/test`.
# You can then look at the results in the `experiment_analysis.ipynb` and
# `sample_analysis.ipynb` notebooks by setting EXP_NAMES=["test"] in the first cells.
# Call the scripts with `--help` to see more options, in particular to run on more than one CPU.
# The script run_exp.sh can be used to train and then evaluate a model, for example
./run_exp.sh exp_config_to_use name_for_exp
The configuration and parameters for a specific model setup are defined in a Python file saved under `psp/exp_configs`. In this configuration file you can, among other things, set `duration` and the number of horizons to forecast for (`num_horizons`).

This model forecasts the power produced by a specific solar site. When forecasting at 15-minute intervals, it is best to train on 15-minute data, so you may want to resample the data first. The timestamp associated with each generation value should be the middle of the window it represents. More information on how the model resamples the PV data can be found in the `training.py` file.
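As an illustration of the mid-window convention described above, here is a minimal, dependency-free sketch. It is not the resampling code from `training.py`: the function name `resample_to_window_midpoints` is hypothetical, and it assumes naive timestamps and windows that start on multiples of the window length.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def resample_to_window_midpoints(readings, window_minutes=15):
    """Average (timestamp, power) readings per window.

    Each output value is labelled with the midpoint of its window.
    Assumption: windows are [start, start + window_minutes).
    """
    window = timedelta(minutes=window_minutes)
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for timestamp, power in readings:
        # Floor the timestamp to the start of its window.
        window_start = epoch + ((timestamp - epoch) // window) * window
        buckets[window_start].append(power)
    # Label each averaged value with the middle of the window.
    return [
        (start + window / 2, mean(values))
        for start, values in sorted(buckets.items())
    ]


# Six 5-minute readings starting at 12:00, with power 1.0 ... 6.0.
five_minute_data = [
    (datetime(2023, 1, 1, 12, 0) + timedelta(minutes=5 * i), float(i + 1))
    for i in range(6)
]
resampled = resample_to_window_midpoints(five_minute_data)
```

With this input, the two 15-minute windows are labelled 12:07:30 and 12:22:30 rather than 12:00 and 12:15.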
For the `recent_history.py` model, `recent_power_minutes` is set in the recent_history class, where the average of the data available within `recent_power_minutes` is used. `num_days_history` can also be set to help calculate the historical mean, median and maximum at that time over the past number of days.

Training, validation and testing can be split across different pv_ids, with the ratios specified in the `make_pv_splits` function in the experiment configuration. This way the model is trained on one set of pv_ids and then validated and tested on an unseen set of pv_ids.
When training, these pv_ids will be output. The time range is the same for the train and validation pv_id sets.
When there is only a single site or a small number of sites, the argument `pv_split=None` should be passed into the config to avoid splitting up the pv_ids.
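The idea of splitting by site can be sketched as follows. This is not the repo's `make_pv_splits` (its signature is not shown here): `split_pv_ids` and its parameters are hypothetical names, used only to show a ratio-based partition into disjoint sets of pv_ids.

```python
import random


def split_pv_ids(pv_ids, train_ratio=0.8, valid_ratio=0.1, seed=0):
    """Partition site ids into disjoint train/valid/test lists.

    Hypothetical helper: whatever is left after the train and valid
    shares goes to the test set.
    """
    ids = sorted(pv_ids)
    # Seeded shuffle so that the split is reproducible.
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_ratio)
    n_valid = int(len(ids) * valid_ratio)
    return {
        "train": ids[:n_train],
        "valid": ids[n_train:n_train + n_valid],
        "test": ids[n_train + n_valid:],
    }


splits = split_pv_ids(range(100))
```

Because each pv_id lands in exactly one list, validation and test metrics measure how well the model generalises to sites it has never seen.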
When the `eval_model.py` script is run, different parameters can be passed to specify the conditions of the backtest. One option is to simulate a backtest without features generated from live PV data.
To do this, the `--no-live-pv` flag can be used when running the script, which sets the live PV features to NaN. If this flag is used, it is important that the model has seen some NaN PV values during training.
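To make the NaN requirement concrete, the sketch below shows one possible way of masking live PV features, both for a backtest and for a random fraction of training samples. It is an assumption-laden illustration, not the repo's implementation: the feature names (`recent_power`, `irradiance_forecast`) and both helper functions are hypothetical.

```python
import math
import random

# Hypothetical name for a feature derived from live PV data.
LIVE_PV_FEATURES = ("recent_power",)


def mask_live_pv(features, live_keys=LIVE_PV_FEATURES):
    """Return a copy of `features` with the live PV entries set to NaN."""
    return {
        key: math.nan if key in live_keys else value
        for key, value in features.items()
    }


def randomly_mask_live_pv(samples, drop_prob, seed=0):
    """Mask live PV features in a random fraction of training samples."""
    rng = random.Random(seed)
    return [
        mask_live_pv(sample) if rng.random() < drop_prob else dict(sample)
        for sample in samples
    ]


sample = {"recent_power": 3.2, "irradiance_forecast": 410.0}
masked = mask_live_pv(sample)
```

A model that only ever saw finite live PV values in training has no learned behaviour for the all-NaN case, which is why some masked samples need to appear in the training set.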
# Installation of the dependencies.
poetry install
# Formatting
make format
# Linting
make lint
# Running the tests.
make test
# Starting the jupyter notebooks.
make notebook