Open JackKelly opened 2 years ago
Hey Jack, I've been looking over this as part of my project. It's interesting to see this discussion while I'm trying to do something similar. Earlier you said:
I'm perhaps becoming sceptical of the value of predicting full satellite images
Can I ask what the motivation was for forecasting full satellite images in the first place, assuming the eventual goal is always to forecast PV?
Hi @bndxn! The main motivation for predicting satellite imagery was to allow us to pre-train the "satellite predictor" on any rectangle of satellite imagery, even locations with no PV systems (eg over the ocean!). This way, we hoped to get the model to learn cloud dynamics from as much data as possible. The idea is that predicting the movement of clouds is hard, so we want to train the model on as much data as possible.
I still very much believe in the general principle that we want to pre-train part of the ML model on as much satellite imagery as possible! But I now suspect that predicting pixel-by-pixel images is just too onerous. So it might be better to learn an encoder whose latent representation is maximally informative of the latent representation of future satellite imagery.
Thanks for getting back to me. Cool, that makes a lot of sense. I'm guessing that when there's no PV data, the label/target is the future imagery of the same rectangle, is that right?
I can see how this approach would help generate good predictions of cloud patterns, and I can see how big cloudy patches would have a direct link to PV. Are there some nuances in clouds that this approach would help for, but that don't matter very much from a PV perspective? For example, maybe the satellite imagery gets very good at predicting lots of fine thin cloud vs thin ripply cloud (I am not a cloud expert!) but actually from a PV yield view these two are the same - does that make sense?
-- Ben
That's a good example!
My hunch is that it's really hard to predict clouds on a pixel-by-pixel basis, so we're perhaps hurting the model by asking it to do something that's (almost) impossible! We might be better off predicting high-level, abstract features of the satellite imagery, so we can still learn how clouds evolve from all the available satellite data.
Cool, thanks, this is really helpful! Is this something that can be captured well with a probabilistic forecast? In this case, does the chain go something like this:
Yes, that looks right!
Although I'm proposing that we don't do Step 3. i.e. the model would never create an explicit pixel-wise prediction of satellite imagery. Instead the "satellite encoder" would be trained to produce latent representations that are maximally informative for future PV (and the future latent state of the satellite encoder) :slightly_smiling_face:
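To make the "no Step 3" idea concrete, here's a minimal shape-level sketch. All names and dimensions are hypothetical, and the random linear maps are stand-ins for real learned networks; the point is only that nothing in the pipeline ever decodes back to pixels:

```python
import numpy as np

rng = np.random.default_rng(0)

DIM_LATENT = 32
N_PIXELS = 64 * 64  # a flattened satellite rectangle (hypothetical size)

# Stand-in "networks": random linear maps (a real model would be learned).
W_encoder = rng.normal(size=(N_PIXELS, DIM_LATENT)) / np.sqrt(N_PIXELS)
W_future = rng.normal(size=(DIM_LATENT, DIM_LATENT)) / np.sqrt(DIM_LATENT)
w_pv = rng.normal(size=DIM_LATENT) / np.sqrt(DIM_LATENT)

def encode(image_flat):
    """Satellite encoder: pixels -> latent. No pixel decoder exists anywhere."""
    return image_flat @ W_encoder

def predict_future_latent(z_now):
    """Predict the encoder's latent at t+1 directly in latent space (skips Step 3)."""
    return z_now @ W_future

def predict_pv(z):
    """PV head: latent -> scalar PV yield estimate."""
    return z @ w_pv

image_t = rng.normal(size=N_PIXELS)
z_t = encode(image_t)
z_t1_hat = predict_future_latent(z_t)
pv_t1_hat = predict_pv(z_t1_hat)
```

The encoder is trained so that `z_t1_hat` matches the encoder's actual latent at t+1 (and is useful for PV), rather than reconstructing imagery.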
Please do shout if you're interested in trying any (or all!) of this!
PyTorch implementation of Contrastive Predictive Coding: https://github.com/rschwarz15/CPCV2-PyTorch
(Hat tip to @jacobbieker! Thanks Jacob!)
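For reference, the heart of CPC is the InfoNCE objective: score each predicted future latent against the true one, using the other samples in the batch as negatives. A toy numpy version (batch size, dimensions and temperature chosen arbitrarily for illustration):

```python
import numpy as np

def info_nce_loss(z_pred, z_true, temperature=0.1):
    """InfoNCE: each row's positive is its own true future latent;
    every other row in the batch serves as a negative."""
    # L2-normalise so the scores are cosine similarities.
    z_pred = z_pred / np.linalg.norm(z_pred, axis=1, keepdims=True)
    z_true = z_true / np.linalg.norm(z_true, axis=1, keepdims=True)
    logits = z_pred @ z_true.T / temperature        # (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))           # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 32))
loss_perfect = info_nce_loss(z, z)                  # predictions match exactly
loss_random = info_nce_loss(z, rng.normal(size=(8, 32)))
```

A perfect predictor drives the loss towards zero; a random one sits near log(batch size).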
(Actually, my new favourite idea is described in openclimatefix/psuedo-pv-labeller#1! I plan to try openclimatefix/psuedo-pv-labeller#1 before I try contrastive learning. openclimatefix/psuedo-pv-labeller#1 feels like it has many of the same advantages, without some of the complexity of contrastive learning)
Cool, that sounds really interesting. At the moment I'm working with a subset of the data and doing a ConvLSTM to predict PV directly going forwards.
My plan is to compare this with a simplified version of OCF's CNN in production (maybe only taking sat images and the solar altitude as inputs). The idea is to measure how important the explicit representation of the time series in ConvLSTMs is, compared to a 3D CNN.
If I can wrangle that in the next few weeks, then maybe I'll include a comparison with an attention model. If I finish all of that before September, then yeah sure! This will all provide helpful context anyway.
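To make that comparison concrete: the two architectures mainly differ in how they treat the time axis. A ConvLSTM rolls over frames one step at a time, carrying a hidden state, while a 3D CNN convolves over time like any other spatial axis. A toy numpy illustration of the 3D-convolution view (all shapes are made up):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

# A short sequence of single-channel satellite frames: (time, height, width).
frames = rng.normal(size=(6, 16, 16))

# One 3x3x3 kernel convolved over (time, H, W) jointly. A ConvLSTM would
# instead consume the 6 frames sequentially, updating a hidden state.
kernel = rng.normal(size=(3, 3, 3))
windows = sliding_window_view(frames, (3, 3, 3))     # (4, 14, 14, 3, 3, 3)
feature_map = np.einsum('thwijk,ijk->thw', windows, kernel)
```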
Sounds awesome! Please do let us know how it goes! Very exciting stuff!
https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
Motivations:
Implementation:
Pre-training the video encoder:
Use the video encoder to predict PV:
To predict national PV (and demand?) use several encoders in parallel, in non-overlapping patches, to "see" the entire country? Feed these all into a single "future decoder". Perhaps need to use a hierarchical Perceiver.
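A shape-level sketch of the patching idea, assuming a hypothetical national image split into non-overlapping tiles, each run through the same encoder (a dummy summary function here) before everything feeds one shared decoder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical national satellite image and patch size.
national = rng.normal(size=(128, 128))
PATCH = 32

def encode_patch(patch):
    """Dummy per-patch encoder: a 4-number summary stands in for a latent."""
    return np.array([patch.mean(), patch.std(), patch.min(), patch.max()])

# Split into non-overlapping patches and encode each one.
n = national.shape[0] // PATCH                                # 4 tiles per side
tiles = national.reshape(n, PATCH, n, PATCH).swapaxes(1, 2)   # (4, 4, 32, 32)
latents = np.stack([encode_patch(t) for t in tiles.reshape(-1, PATCH, PATCH)])

# All patch latents feed a single shared "future decoder" (a dot product here).
w_decoder = rng.normal(size=latents.size)
national_pv_hat = latents.ravel() @ w_decoder
```

A hierarchical Perceiver would replace the final dot product with cross-attention over the patch latents, which scales better as the number of patches grows.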
Why not just directly predict future satellite imagery? Because I think it's too onerous to predict individual pixels, and we don't need individual pixels. We just need to extract a representation from the satellite that is informative of future PV.