openclimatefix / power_perceiver

Machine learning experiments using the Perceiver IO model to forecast the electricity system (starting with solar)
MIT License

Try Flamingo / contrastive learning approach to solar PV forecasting #155

Open JackKelly opened 2 years ago

JackKelly commented 2 years ago

https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model

Motivations

Implementation:

Pre-training the video encoder:

Use the video encoder to predict PV:

To predict national PV (and demand?) use several encoders in parallel, in non-overlapping patches, to "see" the entire country? Feed these all into a single "future decoder". Perhaps need to use a hierarchical Perceiver.

Why not just directly predict future satellite imagery? Because I think it's too onerous to predict individual pixels, and we don't need individual pixels. We just need to extract a representation from the satellite that is informative of future PV.
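A minimal sketch of the patch-wise idea above, using NumPy stand-ins. All shapes (a 16-patch tiling, 64×64-pixel patches, 32-dim latents) and the linear "encoder"/"decoder" are hypothetical placeholders for the real Perceiver-style modules:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 16 non-overlapping 64x64-pixel patches covering
# the country, each encoded to a 32-dim latent vector.
N_PATCHES, PATCH_PIX, LATENT_DIM = 16, 64 * 64, 32

def encode_patch(patch: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Stand-in for a per-patch video encoder: a single linear map + tanh."""
    return np.tanh(patch @ weights)

# One shared encoder applied to every patch in parallel.
enc_w = rng.normal(scale=0.01, size=(PATCH_PIX, LATENT_DIM))
patches = rng.normal(size=(N_PATCHES, PATCH_PIX))  # one timestep of imagery

latents = encode_patch(patches, enc_w)             # shape (16, 32)

# "Future decoder": pools all patch latents and maps them to a single
# national PV estimate (a linear head here, purely for illustration).
dec_w = rng.normal(scale=0.1, size=(LATENT_DIM,))
national_pv = float(latents.mean(axis=0) @ dec_w)  # scalar forecast
```

The key structural point is that the encoder weights are shared across patches, so the model "sees" the whole country without any single encoder having to ingest the full image.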

bndxn commented 2 years ago

Hey Jack, I've been looking over this as part of my project. It's interesting to see this discussion while I'm trying to do something similar. Earlier you said:

I'm perhaps becoming sceptical of the value of predicting full satellite images

Can I ask what the motivation was for forecasting full satellite images in the first place, assuming the eventual goal is always to forecast PV?

JackKelly commented 2 years ago

Hi @bndxn! The main motivation for predicting satellite imagery was to allow us to pre-train the "satellite predictor" on any rectangle of satellite imagery, even locations with no PV systems (eg over the ocean!). This way, we hoped to get the model to learn cloud dynamics from as much data as possible. The idea is that predicting the movement of clouds is hard, so we want to train the model on as much data as possible.

I still very much believe in the general principle that we want to pre-train part of the ML model on as much satellite imagery as possible! But I now suspect that predicting pixel-by-pixel images is just too onerous. So it might be better to learn an encoder whose latent representation is maximally informative of the latent representation of future satellite imagery.

bndxn commented 2 years ago

Thanks for getting back to me. Cool, that makes a lot of sense. I'm guessing that, when there's no PV data, the label/target is the future imagery of the same rectangle, is that right?

I can see how this approach would help generate good predictions of cloud patterns, and I can see how big cloudy patches would have a direct link to PV. Are there some nuances in clouds that this approach would help for, but that don't matter very much from a PV perspective? For example, maybe the satellite imagery gets very good at predicting lots of fine thin cloud vs thin ripply cloud (I am not a cloud expert!) but actually from a PV yield view these two are the same - does that make sense?

-- Ben

JackKelly commented 2 years ago

That's a good example!

My hunch is that it's really hard to predict clouds on a pixel-by-pixel basis. So we're perhaps hurting the model by asking it to do something that's (almost) impossible! We might be better off predicting high-level, abstract features of the satellite imagery, so we can still learn how clouds evolve from all the available satellite data.

bndxn commented 2 years ago

Cool, thanks, this is really helpful! Is this something that can be captured well with a probabilistic forecast? In this case, does the chain go something like this:

  1. Input satellite images and other data
  2. Create an encoding of the cloud patterns
  3. Generate probabilistic forecasts of the image
  4. Use the probabilistic forecast to make a distribution of PV yield forecasts?
JackKelly commented 2 years ago

Yes, that looks right!

Although I'm proposing that we don't do Step 3. i.e. the model would never create an explicit pixel-wise prediction of satellite imagery. Instead the "satellite encoder" would be trained to produce latent representations that are maximally informative for future PV (and the future latent state of the satellite encoder) :slightly_smiling_face:
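To make the "no Step 3" version concrete, here's a toy sketch of what the two training signals might look like. The shapes, the linear "encoder", and the 0.42 PV target are all made up for illustration; the real model would be a Perceiver-style network, but the loss structure is the point:

```python
import numpy as np

rng = np.random.default_rng(1)

LATENT_DIM = 16

def encoder(image: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stand-in satellite encoder: one linear layer with tanh."""
    return np.tanh(image @ w)

enc_w = rng.normal(scale=0.05, size=(256, LATENT_DIM))   # shared encoder
pred_w = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM))  # latent -> future latent
pv_w = rng.normal(scale=0.1, size=(LATENT_DIM,))         # latent -> PV yield

img_now, img_future = rng.normal(size=(2, 256))  # two flattened toy frames
pv_future = 0.42                                 # toy PV target

z_now = encoder(img_now, enc_w)
z_future = encoder(img_future, enc_w)

# Two losses, neither of which ever reconstructs individual pixels:
latent_loss = float(np.mean((z_now @ pred_w - z_future) ** 2))  # predict future latent
pv_loss = float((z_now @ pv_w - pv_future) ** 2)                # predict future PV

total_loss = latent_loss + pv_loss
```

The model is only ever asked to predict the *latent* state of future imagery plus the PV number itself, so the expensive pixel-wise decoder disappears entirely.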

Please do shout if you're interested in trying any (or all!) of this!

JackKelly commented 2 years ago

PyTorch implementation of Contrastive Predictive Coding: https://github.com/rschwarz15/CPCV2-PyTorch
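For reference, the core of CPC is the InfoNCE loss: each predicted latent should score highest against its own future latent, with the other items in the batch acting as negatives. A minimal NumPy sketch (batch size and latent dim are arbitrary; the linked repo uses a full PyTorch version):

```python
import numpy as np

def info_nce(z_pred: np.ndarray, z_true: np.ndarray) -> float:
    """InfoNCE loss over a batch: positives sit on the diagonal of the
    (batch x batch) similarity matrix; everything else is a negative."""
    logits = z_pred @ z_true.T                    # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # cross-entropy on diagonal

rng = np.random.default_rng(0)
z_true = rng.normal(size=(8, 16))

aligned_loss = info_nce(z_true * 10, z_true)      # predictions match targets
random_loss = info_nce(rng.normal(size=(8, 16)), z_true)  # predictions don't
```

With well-aligned predictions the loss falls towards zero; with random predictions it sits near log(batch_size), which is what makes it a useful training signal without ever decoding pixels.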

(Hat tip to @jacobbieker! Thanks Jacob!)

JackKelly commented 2 years ago

(Actually, my new favourite idea is described in openclimatefix/psuedo-pv-labeller#1! I plan to try openclimatefix/psuedo-pv-labeller#1 before I try contrastive learning. openclimatefix/psuedo-pv-labeller#1 feels like it has many of the same advantages, without some of the complexity of contrastive learning)

bndxn commented 2 years ago

Cool, that sounds really interesting. At the moment I'm working with a subset of the data and using a ConvLSTM to predict PV directly.

My plan is to compare this with a simplified version of OCF's CNN in production (maybe only taking satellite images and the solar altitude as inputs). The idea is to measure explicitly how much the explicit representation of time series in ConvLSTMs matters, compared to a 3D CNN.

If I can wrangle that in the next few weeks, then maybe I'll include a comparison with an attention model. If I finish all of that before September, then yeah sure! This will all provide helpful context anyway.

JackKelly commented 2 years ago

Sounds awesome! Please do let us know how it goes! Very exciting stuff!