openclimatefix / pv-site-prediction

ML experiments and models for pv site forecasting
12 stars 4 forks source link

Photovoltaic (PV) Site Prediction Model

ease of contribution: hard

This repo contains code to train and evaluate a model to produce the forecasted energy production from solar panels (PV). It does this by providing a framework to forecast ahead, using pv data from sites, weather data (NWPs as multidimensional geospatial zarrs) and sateliite imagery (from the EUMETSAT Geostationary satellite).

Organisation of the repo

.
├── exp_reports         # Experiment reports - markdown notes about experiments we have made
├── exp_results         # Default output for the {train,eval}_model.py scripts
├── notebooks           # Diverse notebooks
├── data                # Placeholder for data files
└── dashboards          # Experimental streamlit dashboards
└── psp                 # Main python package
    ├── clients         # Client specific code
    ├── data_sources    # Data sources (PV, NWP, Satellite, etc.)
    ├── exp_configs     # Experimentation configs - a config defines the different options for
    │                   # training and evaluation models. This directory contains many ready
    │                   # configs where the paths points to the data on Leonardo.
    ├── models          # The machine learning code
    ├── scripts         # Scripts (entry points)
    └── tests           # Unit tests

Training and evaluating a model

poetry run python psp/scripts/train_model.py \
    --exp-config-name test_config1 \
    -n test

poetry run python psp/scripts/eval_model.py \
    -n test

# This will have generated a model and test results in `exp_results/test`.

# You can then look at the results in the `expriment_analysis.ipynb` and
# `sample_analysis.ipynb` notebooks by setting EXP_NAMES=["test"] in the first cells.

# Call the scripts with `--help` to see more options, in particular to run on more than one CPU.

# The script run_exp.sh can be used to train and then evaluate a model, for example
./run_exp.sh exp_config_to_use name_for_exp

Setting up the experiement configuration

The configuration and parameters for the specific model setup is done in a python file, with the file saved under psp/exp_configs. In this configuration file you can:

Set up data sources

Set up specific model configuration

PV inputs

This model forecasts the power produced by a specific solar site. If forecasting for 15 minute intervals it is best to use 15 minute data for training. To do this the you may want to resample the data. The associated timestamp which the generation represents sould be the middle of the window. More information on how the model resamples the PV data can be found in the training.py file.

Inputs to the model:

Training, Validating and Testing

Training, validation and testing can be split across different pv_ids for which the ratios can be specified in the make_pv_splits function in the experiment configuration. This is so that the model is trained off one of set of pv_ids and then validated and tested on an unseen set of pv_ids.

When training, these pv_ids will be outputed. The time range is the same for teh train and validation pv_id set.

When there is only a single or small number of sites, the argument, pv_split=None, should be passed into the config to avoid splitting up the pv_ids.

Inference and Backtests

When the eval_model.py script is run, different parameters can be passed to specify the conditions of the backtest. One option is to simulate a backtest without features generated from live PV data.

To do this a --no-live-pv flag can used when running the script, which will set the live PV features to NaNs. If this is used is important that the model has been trained off some NaNs from PV during training.

Prerequisites

Development

# Installation of the dependencies.
poetry install

# Formatting
make format

# Linting
make lint

# Running the tests.
make test

# Starting the jupyter notebooks.
make notebook