openclimatefix / power_perceiver

Machine learning experiments using the Perceiver IO model to forecast the electricity system (starting with solar)
MIT License

[ML Ideal] Staged PV prediction, where each stage predicts the residual of the previous stage #7

Open JackKelly opened 2 years ago

JackKelly commented 2 years ago

Detailed Description

We could imagine predicting PV in several stages:

(initial experiments for models 1 and 2 will be tracked in issue #10)

  1. A "base" physics-based model which predicts PV yield for a single PV system at a time using only a single "pixel" of NWP parameters at the location of the PV system.
    • Create two "physics-based" predictions:
      • One based on NWPs (which is the prediction that subsequent steps will try to improve)
      • One based on clearsky irradiance (subsequent steps will compute actual_pv / physical_pv_prediction_using_clearsky_after_compensating_for_shading to get a feel for the variance in PV power attributable to "weather", rather than local factors such as shading). A minimal clearsky sketch, using pvlib, appears after this list.
    • Some nice things about using a physics-based base model:
      • it should be able to handle rare-but-important events like solar eclipses and snow-covered PV panels
      • it'll make use of the metadata
      • it means the downstream ML models don't have to re-learn the physics that we already have great models for.
        • This will be implemented in https://github.com/openclimatefix/nowcasting_dataset/issues/615. UPDATE: Maybe, while we're still figuring out whether this will work, compute models 1 and 2 directly from the intermediate PV and NWP data (i.e. don't bother implementing a nowcasting_dataset DataSource until we're sure this is useful). Maybe reshape the NWP Zarr so each chunk holds all timesteps for a single pixel and a single channel?
  2. An ML model which gets the same inputs (NWP, PV system metadata, the elevation and azimuth of the Sun, etc.) as the base physics model and predicts the residuals of the base model.
    • Maybe this ML model would also see several days of recent history of the PV system's power output, to help calibrate the predictions.
    • This model will be run on the "NWP base model" and the "clearsky base model" to give us physical_pv_prediction_using_clearsky_after_compensating_for_shading (which will be used by model 3).
    • This ML model would try to predict "local properties" of each PV system: especially shading & soiling:
      • The simplest version would just be to clip the PV power forecast for each timestep at the max power for that time of day from the last few weeks. UPDATE: After plotting the data, I'm sceptical that such a simple approach will work.
      • A more sophisticated version might use an ML model which gets, as its input, a "shading plot" for that PV system. UPDATE: This is kind of interesting, but I think that, instead, we should first try:
      • "Just" train an ML model to predict the residual of the "base" model, given the base model's output, and NWP params at that location, and angle and azimuth of the Sun at that location. Hopefully this model will learn the specific local features of this PV system. Several ways to do this:
        • Try a boosted regression tree, which operates on a single timestep at a time (see the residual-tree sketch after this list). Maybe also try giving it the last few days of data, to help it calibrate to recent changes (such as soiling of the PV panels). Or maybe just give the model data from the same time of day over the last few days, to minimise the size of the input.
        • Try a neural net, in two steps: 1) a model that predicts a single timestep of PV power, given no history; run this across the entire history and across the predictions of the future. 2) An RNN or TFT (Temporal Fusion Transformer) that, for the recent history, sees those predictions and the actual output, and uses that to calibrate the predictions.
        • Could try training a different ML model per PV system. But probably better to use a single ML model, which receives a rich embedding of the PV system ID (and maybe the geographical location?). If there's time, then try both approaches and compare them.
        • Try probabilistic output (a mixture density network? a minimal sketch appears after this list). The params specifying the Gaussian mixture model could be fed into model 3.
  3. An ML model which uses the last hour or two of PV power data from nearby PV systems. In particular, this model could see: the recent history of PV power for many PV systems; model 1's prediction for those systems over the recent history; and actual / physical_pv_prediction_using_clearsky_after_compensating_for_shading, so the model can see the deviation attributable to "weather" for each PV system (which should be useful for predicting future PV). This model would predict PV power for all the PV systems in the region of interest, perhaps using a Perceiver IO.
    • Instead of feeding the output of the base model into the ML model, maybe use the base model to pre-train the ML model. If using one ML model per PV system, pre-train using a physics model with the same metadata params. If using one big ML model, then pre-train for each actual PV system, using the correct tilt and azimuth of that PV system. But tell the model whether the inputs are real or synthetic.
  4. An ML model which sees satellite imagery, and actual / physical_pv_prediction_using_clearsky_after_compensating_for_shading for the recent history for each PV system in the region of interest, and model 2's predictions. So the model sees when a dark cloud causes a large reduction in power compared to a model based only on clearsky irradiance, and hopefully can associate that reduction in PV power with a specific cloud. UPDATE: Actually, maybe start with model 3, and then add satellite imagery into model 3 (i.e. a single model would be responsible for calibrating the PV prediction using nearby PV systems and using satellite imagery).
  5. Finally, an MLP which predicts total PV power for each GSP (grid supply point) region from the PV systems within that GSP.
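
To make step 1's clearsky baseline concrete, here is a minimal sketch using pvlib. The latitude, longitude, tilt, azimuth and capacity values are illustrative placeholders, not real system metadata, and the irradiance-to-power scaling is deliberately crude:

```python
import pandas as pd
import pvlib

# Illustrative PV-system metadata (placeholders, not real values):
latitude, longitude = 51.5, -0.1
surface_tilt, surface_azimuth = 35, 180  # degrees; 180 = south-facing
capacity_kwp = 4.0

times = pd.date_range("2021-06-01", periods=288, freq="5min", tz="Europe/London")
location = pvlib.location.Location(latitude, longitude)

clearsky = location.get_clearsky(times)     # ghi, dni, dhi in W/m^2
solpos = location.get_solarposition(times)  # zenith, azimuth, elevation, ...

# Transpose clearsky irradiance onto the plane of the array:
poa = pvlib.irradiance.get_total_irradiance(
    surface_tilt=surface_tilt,
    surface_azimuth=surface_azimuth,
    solar_zenith=solpos["apparent_zenith"],
    solar_azimuth=solpos["azimuth"],
    dni=clearsky["dni"],
    ghi=clearsky["ghi"],
    dhi=clearsky["dhi"],
)

# Deliberately crude power estimate: assume the system produces
# capacity_kwp at 1000 W/m^2 of plane-of-array irradiance.
clearsky_pv_kw = capacity_kwp * poa["poa_global"] / 1000.0

# Later stages would look at actual_pv / clearsky_pv_kw (after
# compensating for shading) to isolate the "weather" signal.
```

An equivalent run driven by NWP irradiance rather than clearsky irradiance would give the baseline that the subsequent stages try to improve.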
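
And a sketch of the simplest residual model from step 2: a boosted regression tree operating on one timestep at a time. The features and the toy data are hypothetical; the point is only that the tree is fit on actual minus base prediction, and the final forecast adds that correction back onto the base model:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical features, one row per (PV system, timestep):
base_pred = rng.uniform(0, 5, n)       # base physics model's prediction (kW)
cloud_cover = rng.uniform(0, 1, n)     # NWP param at the system's pixel
sun_elevation = rng.uniform(0, 60, n)  # degrees
X = np.column_stack([base_pred, cloud_cover, sun_elevation])

# Toy "actual" PV: the base prediction degraded by clouds, plus noise.
y = base_pred * (1 - 0.7 * cloud_cover) + rng.normal(0, 0.1, n)

# Fit the tree on the residual (actual - base prediction), not on y itself.
model = HistGradientBoostingRegressor(max_iter=300)
model.fit(X, y - base_pred)

# Final forecast = base model + learned residual correction.
y_hat = base_pred + model.predict(X)
```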
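
For the probabilistic-output idea in step 2, a minimal mixture density network head in PyTorch could look like the following. Layer sizes and the component count are arbitrary choices; the mixture params it outputs are what could be fed into model 3:

```python
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    """Maps a feature vector to the params of a K-component Gaussian
    mixture over PV power: mixing logits, means, and log-std-devs."""

    def __init__(self, in_features: int, n_components: int = 4):
        super().__init__()
        self.linear = nn.Linear(in_features, 3 * n_components)

    def forward(self, x):
        logits, mu, log_sigma = self.linear(x).chunk(3, dim=-1)
        return logits, mu, log_sigma.clamp(-5, 5)  # clamp for stability

def mdn_nll(logits, mu, log_sigma, y):
    """Negative log-likelihood of targets y under the predicted mixture."""
    component = torch.distributions.Normal(mu, log_sigma.exp())
    log_prob = component.log_prob(y.unsqueeze(-1))          # (batch, K)
    log_mix = torch.log_softmax(logits, dim=-1) + log_prob  # weight components
    return -torch.logsumexp(log_mix, dim=-1).mean()

# Toy usage: 16 input features, batch of 8.
head = MDNHead(in_features=16)
x, y = torch.randn(8, 16), torch.rand(8)
loss = mdn_nll(*head(x), y)
loss.backward()
```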

There are lots of ways of slicing this up. For example:

  1. Model n only predicts the residual of the output of model n-1 (both chaining options are sketched after this list).
  2. Model n receives the output of model n-1, and predicts PV power directly (not the residual).
  3. Each component model runs independently and feeds a "meta model".
  4. How many skip-connections exist (for example, maybe model 3 should also receive the full NWP "image").
  5. Maybe pre-train each model independently. Maybe that's it. Or maybe fine-tune with all steps connected together. Or maybe don't pre-train; maybe train from scratch with all models connected together?
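
To make options 1 and 2 concrete, here is a small sketch with hypothetical model objects, each assumed to expose a predict(features, prev_prediction) method (that interface is an illustration, not anything implemented yet):

```python
def chain_residual(base_prediction, models, features):
    """Option 1: model n predicts the residual of model n-1's output."""
    prediction = base_prediction
    for model in models:
        prediction = prediction + model.predict(features, prediction)
    return prediction

def chain_direct(base_prediction, models, features):
    """Option 2: model n sees model n-1's output but predicts PV power
    directly (not the residual)."""
    prediction = base_prediction
    for model in models:
        prediction = model.predict(features, prediction)
    return prediction
```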

Cutting the model up explicitly into these pieces will also help us understand how much value each component brings.

Context

"Predicting residuals" and "combining physics-based models with ML models" have found a lot of success elsewhere.