Predict GSP-level PV power

Background: "GSP" means "grid supply point". National Grid divide Britain up into ~350 non-overlapping regions, and need PV forecasts for each of these regions. They will also probably expect our GSP-level PV forecasts to be calibrated to Sheffield Solar's PV Live "ground truth" for PV generation at each GSP.

Several options:

1) ML predicts PV power for each of the ~1 million PV systems in Britain. We then sum spatially over each GSP region and calibrate to Sheffield Solar's PV Live, perhaps using a Temporal Fusion Transformer. We'll need the location and capacity of each of those million PV systems. It's not entirely clear how to produce probabilistic GSP-level forecasts, other than by Monte Carlo sampling from the PV-level probability distributions. Maybe we could deliver two forecasts: one that's calibrated to PV Live, and one that isn't?

2) ML directly predicts PV power for each GSP. We almost certainly still want to train on individual PV systems (as well as GSPs). Maybe it's as simple as including the GSPs as separate "PV system IDs" and feeding them into the "PV system ID" embedding. Or maybe we have two embeddings, to make it really clear to the net that individual PV systems and GSPs are two separate concepts (although a small GSP might not be much larger than a large, multi-field PV farm, so the concepts aren't that different). This is nice because it directly outputs exactly what we need (GSP-level PV, nicely calibrated to PV Live) and makes it trivial to produce probabilistic forecasts. But do the large GSPs fit within the ML's square of satellite imagery? (If not, maybe drop the spatial resolution and enlarge the spatial extent?)

3) Predict PV yield over a regular grid (4 km spacing?) covering Britain. Then split those predictions per GSP and feed them into a simple fully connected net (one per GSP) which predicts GSP-level PV according to PV Live.

4) OCF predicts PV power for the ~20,000 PassivSystem PV systems that Sheffield Solar use as input to PV Live. Sheffield Solar then use their PV Live algorithm to upscale from those 20,000 PV systems to the total output per GSP. As an added bonus, OCF could try to predict 5-minutely data for the ~20,000 PV systems, which actually only report half-hourly data, once per day.
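A minimal sketch of the Monte Carlo idea from option 1, to make the aggregation step concrete. It assumes each per-system forecast is a Gaussian (mean and standard deviation in kW) and that systems are sampled independently; both are illustrative assumptions, and because real forecast errors are spatially correlated, independent sampling will probably understate the GSP-level spread. The function name `gsp_quantiles` and the tuple layout are hypothetical, not part of any existing OCF code.

```python
import random
from collections import defaultdict


def gsp_quantiles(system_forecasts, n_samples=2000, quantiles=(0.1, 0.5, 0.9), seed=0):
    """Estimate GSP-level quantiles by Monte Carlo sampling.

    system_forecasts: iterable of (gsp_id, mean_kw, std_kw), one per PV system.
    ASSUMPTION: independent Gaussian forecast per system (illustrative only).
    """
    rng = random.Random(seed)

    # Group the per-system distributions by the GSP they fall in.
    by_gsp = defaultdict(list)
    for gsp_id, mean_kw, std_kw in system_forecasts:
        by_gsp[gsp_id].append((mean_kw, std_kw))

    result = {}
    for gsp_id, systems in by_gsp.items():
        # Each draw samples every system once and sums over the GSP.
        # Power cannot be negative, so samples are clipped at zero.
        draws = sorted(
            sum(max(0.0, rng.gauss(mean, std)) for mean, std in systems)
            for _ in range(n_samples)
        )
        # Read the empirical quantiles off the sorted totals.
        result[gsp_id] = {
            q: draws[min(n_samples - 1, int(q * n_samples))] for q in quantiles
        }
    return result


forecasts = [("A", 3.0, 0.5), ("A", 2.0, 0.3), ("B", 10.0, 1.0)]
quantiles_by_gsp = gsp_quantiles(forecasts)
```

With a million systems this loop would be far too slow in pure Python; a vectorised version, and some model of the correlation between neighbouring systems, would be needed in practice.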

We could provide a 'mask' to the net which shows the spatial extent of the GSP (and shows a point for small PV systems, and the geometry for larger PV farms? Although few, if any, PV farms are larger than a pixel of satellite imagery).
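As a rough illustration of the mask idea, the sketch below rasterises a GSP boundary onto the satellite image grid and marks a single pixel for a small PV system. It assumes the GSP polygon and PV location have already been converted to pixel coordinates of the satellite crop, and that the polygon is simple (no holes); the function names are hypothetical.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon [(x0, y0), (x1, y1), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def gsp_mask(polygon, height, width):
    """Binary mask: 1 where the pixel centre lies inside the GSP polygon."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


def pv_point_mask(x, y, height, width):
    """Binary mask with a single 1 at the pixel containing a small PV system."""
    mask = [[0] * width for _ in range(height)]
    row, col = math.floor(y), math.floor(x)
    if 0 <= row < height and 0 <= col < width:
        mask[row][col] = 1
    return mask


# Toy example: a square "GSP" covering the middle of a 4x4 pixel crop.
square_gsp = [(1, 1), (3, 1), (3, 3), (1, 3)]
mask = gsp_mask(square_gsp, height=4, width=4)
pv = pv_point_mask(2.7, 0.2, height=4, width=4)
```

The resulting masks could be stacked as extra input channels alongside the satellite imagery, though whether that helps the net is an open question.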

To do this stuff, we need some more data:

[ ] Sheffield Solar's historical GSP-level PV Live: https://github.com/openclimatefix/nowcasting_dataset/issues/88
[ ] Recent satellite imagery and PV data: https://github.com/openclimatefix/nowcasting_dataset/issues/95
