Implement `PVPhysicsPredictionDataSource`

JackKelly commented 2 years ago

Detailed Description

For all timesteps, and for all PV systems in the region of interest, include:

- Two sets of predicted PV power using pvlib's physical PV prediction. Use the PV system orientation metadata (if available):
  - Use NWP predictions (using an NWP init time at or before t0).
  - Use clearsky irradiance.
  - ~~Maybe experiment with manually mapping from the inverter make and manufacturer in the metadata to pvlib's specifications.~~ UPDATE: I'm not sure the payback is worth the effort.
- ~~The max actual PV power for each time of interest from the last 2 weeks.~~
  - ~~Need to do some experimentation to check if 2 weeks is a good time. It might be better to find the max for a given sun angle.~~
  - ~~This is useful for 2 reasons:~~
    - ~~To create a "shading-aware" physics based PV forecast: min(pvlib_forecast(t), max_pv_power_for_last_2_weeks(t)).~~
    - ~~actual_pv_power(t) / max_pv_power_for_last_2_weeks(t) should tell us what proportion of sunlight is being blocked by clouds.~~ UPDATE: I think the PV power production signal is too noisy to use simple approaches like this to model shading. Instead, I think we should train an ML model to handle shading.
- The angle of the sun
- The azimuth of the sun (unless the Sun data source already includes this information for each PV system). This data is useful for a simple ML model that takes the above inputs and estimates the residual of pvlib's forecast for each PV system.
- NWP variables for each PV system (maybe interpolated to 5 minutely)

Maybe use quite long history and forecast durations. Maybe 2 days of forecast and 2 days of history?
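To make the timing concrete, here is a minimal sketch of picking "an NWP init time at or before t0" and laying out the 5-minutely timesteps for 2 days of history plus 2 days of forecast. The function names, the 3-hourly run schedule and the dates are all hypothetical; nothing here is taken from nowcasting_dataset.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def latest_init_time(init_times, t0):
    """Return the most recent NWP init time that is at or before t0.

    `init_times` must be sorted in ascending order.
    """
    idx = bisect_right(init_times, t0)
    if idx == 0:
        raise ValueError("no NWP run was initialised at or before t0")
    return init_times[idx - 1]


def example_timesteps(
    t0,
    history=timedelta(days=2),
    forecast=timedelta(days=2),
    step=timedelta(minutes=5),
):
    """Return evenly spaced timesteps covering [t0 - history, t0 + forecast]."""
    n_steps = int((history + forecast) / step)
    start = t0 - history
    return [start + i * step for i in range(n_steps + 1)]


# Assumption for illustration only: NWP runs are initialised every 3 hours.
init_times = [datetime(2022, 6, 1, hour) for hour in range(0, 24, 3)]
t0 = datetime(2022, 6, 1, 10, 30)

init_time = latest_init_time(init_times, t0)
timesteps = example_timesteps(t0)
```

Using only runs initialised at or before t0 keeps the example free of NWP information that would not have been available at forecast time.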

Also include:

The max actual PV power for the last 12 months (this is probably what we should use to rescale PV power to [0, 1]. Using the max across the entire timeseries won't capture panel degradation etc.)
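A rough sketch of that rescaling, assuming a plain list of timestamps and power readings in kW. The helper name and the sample values are made up for illustration, and the quadratic loop is only meant to show the idea, not to be efficient.

```python
from datetime import datetime, timedelta


def normalise_by_trailing_max(times, power_kw, window=timedelta(days=365)):
    """Divide each reading by the max power seen in the trailing `window`."""
    normalised = []
    for i, t in enumerate(times):
        # Only look at readings up to and including time t that fall in the window.
        trailing = [
            p for s, p in zip(times[: i + 1], power_kw[: i + 1]) if t - s <= window
        ]
        peak = max(trailing)
        normalised.append(power_kw[i] / peak if peak > 0 else 0.0)
    return normalised


# Hypothetical system whose peak output slowly degrades from 4.0 kW to 3.5 kW.
times = [
    datetime(2020, 6, 1, 12),
    datetime(2021, 5, 1, 12),
    datetime(2022, 4, 1, 12),
    datetime(2022, 6, 1, 12),
]
power_kw = [4.0, 3.8, 3.5, 1.75]

normalised = normalise_by_trailing_max(times, power_kw)
```

In this toy series the last reading is scaled by the recent 3.5 kW peak rather than the original 4.0 kW peak, which is the degradation effect the 12-month window is meant to capture.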

Before building the data source, do some experiments in a Jupyter Notebook:

- Try computing all of the above and see how well it performs as a PV forecast. If nothing else, this is a useful baseline algorithm.
- Try extending pvlib to consume UKV NWP.
- Experiment with a simple model (a boosted regression tree?) which predicts the residual.
- Can we reliably see shading from the last 2 weeks of data? What if the last two weeks were dull weather? It may be better to compute a "shading plot" (a scatter plot of (actual PV power / expected PV power) vs solar angle) for multiple sun azimuth angles, and show this plot to an ML model. Or fit a curve to the shading plot.
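One possible way to turn the "shading plot" idea into numbers is to bin the actual / expected ratio by sun azimuth and solar elevation. The sketch below is an assumption-laden illustration: the bin widths, the sample tuples and the choice of taking the maximum ratio per bin (on the reasoning that clouds only lower the ratio, so the maximum over many days approximates the clear-day value) are not from the issue.

```python
from collections import defaultdict


def shading_profile(samples, elevation_bin_deg=5, azimuth_bin_deg=30):
    """Max of actual/expected power per (azimuth bin, elevation bin).

    `samples` is an iterable of
    (elevation_deg, azimuth_deg, actual_kw, expected_kw) tuples.
    """
    ratios = defaultdict(list)
    for elevation, azimuth, actual, expected in samples:
        if expected <= 0:
            continue  # No expected generation, so the ratio is undefined.
        key = (
            int(azimuth // azimuth_bin_deg) * azimuth_bin_deg,
            int(elevation // elevation_bin_deg) * elevation_bin_deg,
        )
        ratios[key].append(actual / expected)
    return {key: max(values) for key, values in sorted(ratios.items())}


# Hypothetical samples: low morning sun in the east is obstructed,
# low evening sun in the west is not.
samples = [
    (12, 95, 0.2, 1.0),
    (13, 100, 0.3, 1.0),
    (12, 265, 0.9, 1.0),
    (14, 268, 0.5, 1.0),
    (40, 180, 2.9, 3.0),
    (3, 60, 0.0, 0.0),
]

profile = shading_profile(samples)
```

A bin whose best-ever ratio stays well below 1 (here the eastern, low-elevation bin) is a candidate for local shading; a table like this could be fed to an ML model or have a curve fitted to it.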

Context

As discussed in https://github.com/openclimatefix/power_perceiver/issues/7, I'm now thinking of predicting PV as a chain of models, each of which predicts the residuals of the previous model.
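As a toy illustration of the chain, the sketch below treats a physics forecast as the first model and a deliberately trivial second model that learns the mean residual per hour of day. The class and function names are hypothetical, and the hourly mean is only a stand-in for whatever model (for example a boosted regression tree) would actually be used.

```python
from collections import defaultdict
from statistics import mean


class HourlyResidualModel:
    """Second link in the chain: learns the mean residual for each hour of day."""

    def fit(self, hours, physics_forecast, actual):
        residuals = defaultdict(list)
        for hour, predicted, observed in zip(hours, physics_forecast, actual):
            residuals[hour].append(observed - predicted)
        self.mean_residual_ = {h: mean(r) for h, r in residuals.items()}
        return self

    def predict(self, hours):
        # Hours never seen in training get no correction.
        return [self.mean_residual_.get(h, 0.0) for h in hours]


def chained_forecast(hours, physics_forecast, residual_model):
    """Physics forecast plus the predicted residual, floored at zero power."""
    corrections = residual_model.predict(hours)
    return [max(0.0, p + c) for p, c in zip(physics_forecast, corrections)]


# Hypothetical training data: the physics model over-predicts at 08:00.
model = HourlyResidualModel().fit(
    hours=[8, 8, 12, 12],
    physics_forecast=[1.0, 1.0, 3.0, 3.0],
    actual=[0.5, 0.7, 3.0, 3.2],
)

forecast = chained_forecast([8, 12, 16], [1.0, 3.0, 1.5], model)
```

Further links in the chain would follow the same pattern, each one fitted to what is left over after the previous correction.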

JackKelly commented 2 years ago

I've made a start on this today, and found a few PV systems with pretty obvious shading issues (the plot shows power on the y axis and time-of-day on the x axis for June):

The next steps:

- [ ] Use pvlib to infer PV power from UKV NWP irradiance, air temperature and wind speed. See `solarforecastarbiter.pvmodel.irradiance_to_power`. See how well it aligns with actual PV power timeseries. See this comment from Will. And, most relevant, see this Stack Overflow question and answer.
- [ ] Infer azimuth and tilt from the PV power data and see if it agrees with the metadata! See `pvanalytics.system.infer_orientation_fit_pvwatts`. Also see this conversation with Will Holmgren on Twitter.
- [ ] Maybe investigate `pvanalytics.features.shading.fixed` and see if it'd be easy to adapt it for using PV power data (the function wants to use GHI). Or is there a simple way to infer GHI from PV power? Or maybe just try giving the function PV power and see if it works! Maybe resample from 5 minutely to 1 minutely.

I'm increasingly convinced that we need to model each individual PV system as accurately as possible (especially local shading and inverter clipping), for two main reasons:

- When we train our ML model to infer PV power from satellite imagery, local effects like shading and inverter clipping are an unwelcome source of "noise".
- The same local effects add noise when we use PV power data from nearby PV systems to help predict PV power at the PV system of interest.
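To show roughly what "irradiance, air temperature and wind speed to power" plus inverter clipping involves, here is a dependency-free sketch. It writes out the Faiman cell-temperature formula and the PVWatts DC formula by hand (pvlib provides these as `pvlib.temperature.faiman` and `pvlib.pvsystem.pvwatts_dc`). The function names, the coefficient values, the flat inverter efficiency and the example system size are assumptions for illustration, and the input is taken to be plane-of-array irradiance, so the transposition from NWP irradiance to the panel plane is not covered.

```python
def cell_temperature_faiman(poa_w_m2, air_temp_c, wind_speed_m_s, u0=25.0, u1=6.84):
    """Faiman model: cell temperature rises with irradiance, falls with wind."""
    return air_temp_c + poa_w_m2 / (u0 + u1 * wind_speed_m_s)


def pvwatts_dc_power(poa_w_m2, cell_temp_c, pdc0_w, gamma_pdc=-0.004, temp_ref_c=25.0):
    """PVWatts DC model: linear in irradiance with a temperature derating."""
    return poa_w_m2 / 1000.0 * pdc0_w * (1.0 + gamma_pdc * (cell_temp_c - temp_ref_c))


def clipped_ac_power(dc_power_w, inverter_limit_w, efficiency=0.96):
    """Apply a flat (assumed) inverter efficiency, then clip at the AC limit."""
    return min(dc_power_w * efficiency, inverter_limit_w)


# Hypothetical 4 kW DC array behind a 3 kW inverter on a hot, still, sunny day.
cell_temp_c = cell_temperature_faiman(poa_w_m2=1000.0, air_temp_c=25.0, wind_speed_m_s=0.0)
dc_power_w = pvwatts_dc_power(poa_w_m2=1000.0, cell_temp_c=cell_temp_c, pdc0_w=4000.0)
ac_power_w = clipped_ac_power(dc_power_w, inverter_limit_w=3000.0)
```

In this example the inverter limit, not the irradiance, sets the output, which is the kind of system-specific effect that would otherwise show up as noise in the PV power signal.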

JackKelly commented 2 years ago

Over the weekend, I decided that, actually, I should do some more experiments in power_perceiver before deciding whether to implement PVPhysicsPredictionDataSource in nowcasting_dataset.

Further notes will be continued in https://github.com/openclimatefix/power_perceiver/issues/10

openclimatefix / nowcasting_dataset

Implement `PVPhysicsPredictionDataSource` #615
