JackKelly commented 2 years ago

This is the first step of the "new ML research direction" outlined in issue #7.

TODO:

[x] find a PV system with shading issues
[ ] predict PV power using:
- [x] pvlib fed with clearsky irradiance.
- [ ] pvlib fed with UKV NWP: See this stack overflow question and answer. If that fails, also see: solarforecastarbiter.pvmodel.irradiance_to_power. See this comment from Will. Also see the Forecasting section of the pvlib docs.
  - [ ] maybe try inferring azimuth and angle using pvanalytics.system.infer_orientation_fit_pvwatts. Also see this conversation with Will Holmgren on Twitter.
- [ ] An ML model which sees a single timestep at a time:
  - [ ] (pvlib fed with UKV NWP) -> an ML model which "corrects" pvlib
  - [ ] an ML model which is pre-trained on pvlib fed with UKV NWP
  - [ ] an ML model on its own
- [ ] a simplified Temporal Fusion Transformer which predicts one PV system at a time, and receives recent history of PV power. Maybe this could, later, be shared on HuggingFace (issue #11). This is useful to see how "good" we can get PV prediction before we start using tricks like using neighbouring PV data and satellite imagery.
[ ] compare performance of the above

Context

I'm increasingly convinced that we need to model each individual PV system as accurately as possible (especially local shading and inverter clipping), for two main reasons:

When we train our ML model to infer PV power from satellite imagery, local effects like shading and inverter clipping are an unwelcome source of "noise".
Likewise when we use PV power data from nearby PV systems to help predict PV power at the PV system of interest.

JackKelly commented 2 years ago

I've made a start on this on Friday, and found a few PV systems with pretty obvious shading issues (the plot shows power on the y axis and time-of-day on the x axis for June):

JackKelly commented 2 years ago

Here's pvlib prediction using clearsky irradiance (and default wind speed and air temperature):

Implemented in: https://github.com/openclimatefix/power_perceiver/blob/0f53822646da2605e4143268d6ac1be8cb4eabca/notebooks/2022-02-28_predict_pv_using_pvlib/predict_pv_using_pvlib.ipynb

Next step: Predict using NWPs

JackKelly commented 2 years ago

Actually, NWP irradiance isn't terrible (this uses the irradiance from the most recent NWP model run):

And showing cloud cover. And with data from 10 nearby PV systems in grey:

JackKelly commented 2 years ago

Hmm, I think one big source of error when using pvlib is likely to be the calculation of GHI, DNI and DHI.

The issue is that the NWP doesn't provide GHI, DNI and DHI. Instead it just provides "downwards short-wave radiation flux".

The pvlib docs provide a method to compute GHI, DNI and DHI purely from NWP total cloud cover (which already makes me nervous: we know that clouds at different vertical levels affect sunlight in different ways).
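A minimal sketch of the cloud-cover route next to the DSWRF-as-GHI route, GHI only. The linear scaling and the 35% offset are what I believe the pvlib forecasting docs describe, so treat both as assumptions to check against the docs. The function name and the toy numbers are hypothetical, and the DNI/DHI decomposition step is left out.

```python
def ghi_from_cloud_cover(cloud_cover_percent, ghi_clear, offset_percent=35.0):
    """Scale clear-sky GHI linearly by total cloud cover.

    Assumed form: fully overcast skies still pass `offset_percent` of the
    clear-sky GHI, and clear skies pass all of it.
    """
    offset = offset_percent / 100.0
    cloud_cover = cloud_cover_percent / 100.0
    return (offset + (1.0 - offset) * (1.0 - cloud_cover)) * ghi_clear


# Toy values for one timestep (W/m^2), purely illustrative.
ghi_clear = 800.0
nwp_total_cloud_cover = 50.0  # percent
nwp_dswrf = 610.0             # NWP downwards short-wave radiation flux

ghi_via_cloud_cover = ghi_from_cloud_cover(nwp_total_cloud_cover, ghi_clear)
ghi_via_dswrf = nwp_dswrf     # the "DSWRF-as-GHI" alternative: use it unchanged
```

The two estimates only agree if the NWP's radiation scheme happens to match the linear cloud scaling, which is one way to read the mismatch between the plots.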

If we do this, then we get something that looks quite different to the NWP DSWRF:

If we use NWP DSWRF as GHI (which I'd guess is the right thing to do?! But I'm really not sure! Especially because none of the pvlib.forecast models do this) then we get this:

JackKelly commented 2 years ago

9408286

JackKelly commented 2 years ago

Next steps:

[x] Try predicting PV with DSWRF-as-GHI; vs GHI computed from cloud cover (directly above the point of interest).
[ ] Try computing cloud cover using the NWP pixels in the Sun's line-of-sight (see issue #13).
[ ] Try an ML model which sees the NWP pixels in the Sun's line-of-sight, plus a low res image of the pixels directly above the PV system (for light being scattered back) (see issue #13).

JackKelly commented 2 years ago

MAPE for predicting PV power with PVLib using NWP directly above the PV system (ignoring times when the sun is less than 10 degrees above the horizon):

using DSWRF-as-GHI: 12.52% MAPE
using GHI computed from cloud cover: 12.68% MAPE

Both these values are closer to 7% MAPE if using the complete time series (including nighttime).
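A rough sketch of why including nighttime lowers the score. The thread doesn't say how the percentage is normalised; this assumes the absolute error is divided by the system's capacity (dividing by the actual power would blow up at night, when it's zero). The function name and toy data are hypothetical.

```python
def masked_error_percent(actual, predicted, elevation_deg, capacity,
                         min_elevation_deg=None):
    """Mean absolute error as a percentage of `capacity`.

    If `min_elevation_deg` is given, only timesteps where the Sun is above
    that elevation are scored.
    """
    errors = [
        abs(a - p)
        for a, p, e in zip(actual, predicted, elevation_deg)
        if min_elevation_deg is None or e > min_elevation_deg
    ]
    if not errors:
        raise ValueError("no timesteps left after masking")
    return 100.0 * sum(errors) / len(errors) / capacity


# Toy data: two nighttime steps (zero power, zero error), two daytime steps.
actual = [0.0, 0.0, 2.0, 3.0]
predicted = [0.0, 0.0, 2.5, 2.5]
elevation = [-20.0, -5.0, 30.0, 45.0]

daytime_only = masked_error_percent(
    actual, predicted, elevation, capacity=4.0, min_elevation_deg=10.0)
all_times = masked_error_percent(actual, predicted, elevation, capacity=4.0)
print(daytime_only, all_times)  # 12.5 6.25
```

The nighttime steps are trivially correct, so they dilute the mean. That's why the daytime-only number is the more honest one to compare across experiments.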

JackKelly commented 2 years ago

To convert the Sun's line-of-sight through the lower 10 km of the atmosphere into a set of NWP "pixels", I'm planning to adapt Xiaolin Wu's line drawing algorithm.

Maybe something like this:

Find the integer index for the NWP "pixel" directly above the PV system.
Calculate the horizontal length of the line of sight, in pixels (distance = tan(90° - elevation) * 10 km altitude / 2 km per pixel)
Use the line drawing algorithm to find the other nwp "pixels" needed. And use the anti-aliasing to compute a weighted mean. Implement the line drawing algo as a function that returns an array of points specified as x, y, intensity, in the order of the points from the start of the line. Then break the line into three segments. Compute the weighted mean of the low clouds for the first line segment, medium clouds for the middle segment, and high clouds for the last line segment
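The plan above could be sketched like this. It's a simplified Wu-style anti-aliased line (one step per pixel along the major axis, no special endpoint handling), not the full algorithm. The horizontal length comes from the Sun's elevation via tan(90° - elevation), and the azimuth sets the direction. Assumptions: x points east and y points north on the NWP grid, azimuth is measured clockwise from north, grids are indexed `[y][x]`, and the three segments are equal thirds of the point list. All function names are hypothetical.

```python
import math


def line_of_sight_length_px(elevation_deg, top_km=10.0, km_per_pixel=2.0):
    """Horizontal length, in NWP pixels, of the line of sight below `top_km`."""
    return top_km / math.tan(math.radians(elevation_deg)) / km_per_pixel


def antialiased_line(x0, y0, x1, y1):
    """Return [(x, y, weight), ...] ordered from (x0, y0) towards (x1, y1).

    (x0, y0) must be integer pixel indices; (x1, y1) may be fractional.
    """
    dx, dy = x1 - x0, y1 - y0
    steep = abs(dy) > abs(dx)
    # Walk along the major axis (the one with the larger extent).
    major0, minor0, d_major, d_minor = (y0, x0, dy, dx) if steep else (x0, y0, dx, dy)
    n_steps = math.floor(abs(d_major))
    if n_steps == 0:
        return [(x0, y0, 1.0)]
    step = 1 if d_major > 0 else -1
    gradient = d_minor / abs(d_major)  # minor-axis change per major-axis step
    points = []
    for i in range(n_steps + 1):
        major = major0 + i * step
        minor = minor0 + i * gradient
        lower = math.floor(minor)
        frac = minor - lower
        # Split the weight between the two pixels straddling the line.
        for m, weight in ((lower, 1.0 - frac), (lower + 1, frac)):
            if weight > 0.0:
                points.append((m, major, weight) if steep else (major, m, weight))
    return points


def sun_path_points(x0, y0, azimuth_deg, elevation_deg):
    """Pixels from the PV system's pixel towards the Sun, nearest first."""
    length = line_of_sight_length_px(elevation_deg)
    azimuth = math.radians(azimuth_deg)
    return antialiased_line(
        x0, y0, x0 + length * math.sin(azimuth), y0 + length * math.cos(azimuth))


def weighted_means_by_segment(points, layers):
    """Weighted mean of each layer (low, medium, high) over its third of the line."""
    n = len(points)
    bounds = [0, n // 3, 2 * n // 3, n]
    means = []
    for layer, start, stop in zip(layers, bounds[:-1], bounds[1:]):
        segment = points[start:stop]
        total = sum(w for _, _, w in segment)
        value = sum(layer[y][x] * w for x, y, w in segment)
        means.append(value / total if total else float("nan"))
    return means
```

For example, `antialiased_line(0, 0, 4, 2)` returns seven weighted points, with the pixels at x = 1 and x = 3 split 50/50 between two rows. Nothing here guards against the line running off the edge of the grid.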

JackKelly commented 2 years ago

Give the ML model: solar azimuth, angle, and irradiance at the top of the atmosphere
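A small sketch of the top-of-atmosphere irradiance feature, using the common textbook approximation for the Earth-Sun distance correction. Assumptions: a solar constant of 1361 W/m², the solar elevation is already known (e.g. from pvlib), and the feature wanted is irradiance on a horizontal plane. The function names are hypothetical.

```python
import math

SOLAR_CONSTANT_W_M2 = 1361.0  # assumed value of the solar constant


def extraterrestrial_irradiance(day_of_year):
    """Normal-incidence irradiance at the top of the atmosphere (W/m^2)."""
    return SOLAR_CONSTANT_W_M2 * (
        1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0))


def toa_horizontal_irradiance(day_of_year, elevation_deg):
    """Irradiance on a horizontal plane at the top of the atmosphere.

    Returns zero when the Sun is below the horizon.
    """
    sin_elevation = math.sin(math.radians(elevation_deg))
    return extraterrestrial_irradiance(day_of_year) * max(0.0, sin_elevation)


# Example feature row for one timestep: (azimuth, elevation, TOA irradiance).
features = (180.0, 30.0, toa_horizontal_irradiance(172, 30.0))
```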

JackKelly commented 2 years ago

Note to self: See the "next steps" in the ipython notebook.

JackKelly commented 2 years ago

tl;dr: Yes, tracing the clouds along the sun's path seems to help!

Experiments computing GHI, DNI and DHI using the clouds, selecting the NWP grid boxes in the Sun's path:

taking the max of each grid point for lcc, mcc, hcc. total_cloud_cover is the max of that. 11.37% MAPE (only using times when elevation > 10). 6.84% when using nighttime too.
taking the max of each grid point for lcc, mcc, hcc. total_cloud_cover is the sum of that, clipped at 1. 11.19% MAPE (only using times when elevation > 10). 6.73% when using nighttime too.
taking the max of each grid point for lcc, mcc, hcc. total_cloud_cover is the sum of that, clipped at 1. 11.19% MAPE (only using times when elevation > 10). 6.75% when using nighttime too.
taking the sum of each grid point for lcc, mcc, hcc and clipping at 1. total_cloud_cover is the sum of that, clipped at 1. 11.79% MAPE (only using times when elevation > 10). 7.08% when using nighttime too.
taking the mean of each grid point for lcc, mcc, hcc. total_cloud_cover is the sum of that, clipped at 1. 11.36% MAPE (only using times when elevation > 10). 6.85% when using nighttime too.
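The variants above differ only in how each layer is reduced along the Sun's path and how the three layers are combined. A minimal sketch, assuming cloud cover is a fraction in [0, 1] and ignoring the anti-aliasing weights; the function names and toy values are hypothetical.

```python
def reduce_layer(values, how):
    """Reduce one cloud layer's values along the Sun's path to a single number."""
    if how == "max":
        return max(values)
    if how == "mean":
        return sum(values) / len(values)
    if how == "sum_clipped":
        return min(sum(values), 1.0)
    raise ValueError(f"unknown reduction: {how}")


def total_cloud_cover(lcc, mcc, hcc, how):
    """Combine the per-layer values into a total cloud cover fraction."""
    if how == "max":
        return max(lcc, mcc, hcc)
    if how == "sum_clipped":
        return min(lcc + mcc + hcc, 1.0)
    raise ValueError(f"unknown combination: {how}")


# Toy values along the path for low, medium and high cloud.
low, medium, high = [0.1, 0.3], [0.2, 0.2], [0.0, 0.4]

# Per-layer max, then the two ways of combining the layers.
per_layer = [reduce_layer(layer, "max") for layer in (low, medium, high)]
total_max = total_cloud_cover(*per_layer, how="max")
total_sum = total_cloud_cover(*per_layer, how="sum_clipped")
```

Here the per-layer maxima are 0.3, 0.2 and 0.4, so the "max" total is 0.4 while the clipped sum is about 0.9.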

JackKelly commented 2 years ago

Next steps:

[ ] Speed up the code that gets NWP variables along the Sun's path. Load one timestep at a time from Zarr into memory, and then grab data from the data in memory.
[ ] Train a boosted regression tree to predict PV. Try giving it PVLib's PV predictions. Or try without. The aim here is to get the best PV prediction from a model that sees a single timestep. And compare ML model vs ML model fed with PVLib vs PVLib on its own. Try giving it various different summary statistics of the NWP vars at each grid point (min, mean, max?).
[ ] Then try training a boosted regression tree just to learn local factors of each PV system (i.e. don't tell it anything about clouds). (Ultimately, we'll probably train a neural net to do this, and to output a distribution. This should be useful to help downstream ML network "see" the effects of clouds).
[ ] Then do the same with a deep neural net. Try training one PV system per neural net. Then try training one large neural net (giving it PV system ID) and see if it can still learn each PV system's characteristics.

openclimatefix / power_perceiver

Experiment with physical PV prediction using pvlib #10
