openclimatefix / uk-pv-backtest



Backtest Formatting

This repo contains scripts and notebooks for formatting and verifying the backtest data produced by national solar forecasting models from Open Climate Fix.

Running PVNet backtests

Setting up the model and data configuration

National solar forecasting backtests can be run using OCF's PVNet and National XG models. In the PVNet repo, under scripts, there is the file gsp_run_backtest.py. This script can be used to run the backtests by setting the models and the date ranges to use.

For PVNet there is one model that produces the GSP level forecasts and another, called the summation model, which aggregates the GSP level forecasts to a national level. The checkpoints for each of these models can be downloaded locally before running the backtest or streamed in from Hugging Face.
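If you want the checkpoints available locally, a minimal sketch using huggingface_hub is shown below; the repo ids are illustrative placeholders, not the actual OCF model ids:

from huggingface_hub import snapshot_download

# Download both checkpoints before running the backtest
# (the repo ids here are placeholders, not the real OCF model ids)
gsp_model_path = snapshot_download(repo_id="openclimatefix/example_pvnet_model")
summation_model_path = snapshot_download(repo_id="openclimatefix/example_summation_model")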

The model requires a specific configuration file called data_config.yml, which defines the input data sources and settings the model expects.

The configuration file must exactly match the settings used during model training for the backtest to run correctly.

Executing the Backtest Process

As backtests can take a long time to run, it is best to use a terminal multiplexer like tmux. This allows you to keep the job running even if the SSH connection is lost.

After installing tmux, you can create a new session with: tmux new -s [SESSION_NAME]

Once you have created and are inside a tmux session, activate the appropriate conda environment, then run the backtest with: python gsp_run_backtest.py

The progress of the backtest can be viewed by reconnecting to the tmux session with: tmux attach -t [SESSION_NAME]

It can be useful to inspect how much of the machine's resources are being used via top or htop. These show the CPU and RAM usage, which is useful when tuning the number of workers and batches.

Additional notes for running backtests.

Formatting of Forecasts produced by PVNet

For PVNet, the processing and formatting scripts are found in /scripts/pvnet_prob/ and consist of the four steps below:

  1. Compile the raw PVNet files to a zarr file (compile_raw_files.py)
  2. Filter the data for GSP 0 (National) and the quantiles to output as a single csv (filter_zarr_to_csv.py)
  3. Merge and blend the Intraday and Day Ahead forecasts to produce a single csv (merge_and_blend_prob.py)
  4. Add PVLive installed capacity and format the final forecast file (format_forecast.py)

Compiling Raw PVNet Files

PVNet produces a single NetCDF file (.nc) per initialisation time, so these files need to be combined. The script to do this is compile_raw_files.py, which produces a single zarr file containing the data.
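A minimal sketch of this step, assuming one forecast per .nc file concatenated along the initialisation time dimension (the paths and dimension name are illustrative, not the exact ones used by compile_raw_files.py):

import glob
import xarray as xr

# One .nc file per initialisation time, concatenated along that dimension
files = sorted(glob.glob("raw_forecasts/*.nc"))
ds = xr.open_mfdataset(files, combine="nested", concat_dim="init_time")

# Write the combined dataset out as a single zarr store
ds.to_zarr("pvnet_backtest.zarr", mode="w")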

The filter_zarr_to_csv.py script turns the data from a zarr into a csv, keeping just the national forecast rather than the GSP level forecasts. This needs to be performed separately for the Intraday and Day Ahead forecasts.
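A hedged sketch of the filtering, where the gsp_id coordinate and file names are assumptions rather than the exact schema of the zarr:

import xarray as xr

# GSP 0 is the national aggregate; keep only that slice
ds = xr.open_zarr("pvnet_backtest.zarr")
national = ds.sel(gsp_id=0)

# Flatten to a table of init times, target times and quantiles, then write a csv
national.to_dataframe().reset_index().to_csv("national_forecast.csv", index=False)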

Once the files are in the correct format, the merge_and_blend_prob.py script can be used. This merges the two datasets and blends the forecasts based on weightings, defined in the script, for different forecast horizons.
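As an illustration only, a horizon-based blend might look like the following; the cut-off horizons, column names and file names are assumptions, not the weightings defined in the script:

import pandas as pd

# Load the two csvs produced by the previous step
intraday = pd.read_csv("intraday.csv", parse_dates=["init_time", "target_time"])
day_ahead = pd.read_csv("day_ahead.csv", parse_dates=["init_time", "target_time"])

# Join on target time and compute the forecast horizon in hours
merged = intraday.merge(day_ahead, on="target_time", suffixes=("_id", "_da"))
horizon = (merged["target_time"] - merged["init_time_id"]).dt.total_seconds() / 3600

# Linear ramp: all Intraday up to 4 h, all Day Ahead beyond 8 h
# (the cut-off hours here are assumed, not OCF's values)
w = ((horizon - 4) / 4).clip(0, 1)
merged["forecast_mw"] = (1 - w) * merged["forecast_mw_id"] + w * merged["forecast_mw_da"]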

The data then needs to be run through a final formatting script, format_forecast.py. This script adds the PVLive installed capacity and outputs the final forecast file.
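A minimal sketch of this step, assuming the capacity is joined onto each forecast row by target time (the file and column names are illustrative):

import pandas as pd

# Blended forecast and PVLive installed capacity time series
forecast = pd.read_csv("blended_forecast.csv", parse_dates=["target_time"])
capacity = pd.read_csv("pvlive_capacity.csv", parse_dates=["target_time"])

# Attach the capacity effective at each target time (both inputs sorted by time)
final = pd.merge_asof(
    forecast.sort_values("target_time"),
    capacity.sort_values("target_time"),
    on="target_time",
)
final.to_csv("final_forecast.csv", index=False)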

Additional notes on compiling forecasts

National XG formatting

Scripts have been written for interpolating hourly forecasts to half-hourly resolution (interpolate_30min.py) and for unnormalising forecasts using the PVLive installed capacity (unnorm_forecast.py).
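A minimal sketch of both steps, assuming a normalised hourly forecast indexed by target time (the column and file names are illustrative):

import pandas as pd

# A normalised hourly National XG forecast, indexed by target time
df = pd.read_csv("xg_forecast.csv", parse_dates=["target_time"]).set_index("target_time")

# interpolate_30min.py: hourly -> half-hourly via linear interpolation
half_hourly = df.resample("30min").interpolate(method="linear")

# unnorm_forecast.py: scale the normalised forecast by the PVLive installed capacity
half_hourly["forecast_mw"] = half_hourly["forecast_norm"] * half_hourly["installed_capacity_mw"]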

Post Formatting

Notebooks for verifying the data and comparing forecasts are found in /notebooks/.

Additional scripts

Check for missing data in the backtest using missing_data.py. This script scans the forecasts for gaps and outputs a csv detailing the start and size of each gap.
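An illustrative version of that check, assuming a half-hourly forecast cadence (not the exact logic of missing_data.py):

import pandas as pd

# Sorted, de-duplicated initialisation times from the final forecast file
df = pd.read_csv("final_forecast.csv", parse_dates=["init_time"])
init_times = df["init_time"].drop_duplicates().sort_values()

# Anything further apart than the expected 30 min cadence is a gap
expected = pd.Timedelta("30min")
diffs = init_times.diff()
gaps = pd.DataFrame({
    "gap_start": init_times.shift(1)[diffs > expected] + expected,
    "gap_size": diffs[diffs > expected] - expected,
})
gaps.to_csv("missing_data.csv", index=False)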

To name the file in the standardised format, use the rename_forecast_file.py script. For model version numbers, the pvnet_app version number is used.

Uploading Data to Google Storage

After running a backtest, the raw data can be uploaded to Google Cloud Storage using the gsutil command line tool:

gsutil -m cp -r [LOCAL_FILE_PATH] gs://[BUCKET_NAME]/[OBJECT_NAME]

The -m flag enables parallel (multi-threaded) copying, allowing multiple files to be transferred simultaneously, which significantly speeds up the transfer.

Data can then be downloaded onto another machine for processing.
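The same command works in reverse to download the data, for example:

gsutil -m cp -r gs://[BUCKET_NAME]/[OBJECT_NAME] [LOCAL_FILE_PATH]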

Contributing and community


Part of the Open Climate Fix community.
