openclimatefix / power_perceiver

Machine learning experiments using the Perceiver IO model to forecast the electricity system (starting with solar)
MIT License

[ML Idea] Graph neural networks #34

Open JackKelly opened 2 years ago

JackKelly commented 2 years ago

This post isn't a well-formed idea. Rather, it's a collection of hand-wavy ideas.

The train of thought begins with a concern about how expensive some forecasting models are: MetNet 2 needs 128 TPUv3 cores. Self-attention is also really expensive because every pixel attends to every other pixel. This is true even for the Perceiver IO architecture.

In sharp contrast, in the physical world, a "parcel" of air in the atmosphere can only interact with its neighbours. There aren't "teleconnections" across space or time.

If we knew the full state of each parcel of air then we could use a graph neural network, which is far more efficient, as recently demonstrated very convincingly by Ryan Keisler in his February 2022 paper "Forecasting Global Weather with Graph Neural Networks". Incidentally, Jacob is re-implementing Keisler's model in PyTorch. Keisler's model is tiny by modern standards and trains on a single A100.
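The local-interaction structure that makes GNNs efficient can be sketched in miniature. This is a toy numpy sketch (no learned weights, not Keisler's actual model): one round of mean-aggregation message passing, where each node updates using only its own feature and its neighbours' features.

```python
import numpy as np

def message_passing_step(node_feats, edges):
    """One round of mean-aggregation message passing: each node's new
    feature depends only on itself and its graph neighbours - the local
    interaction structure a GNN exploits. Toy sketch, no learned weights."""
    n, d = node_feats.shape
    agg = np.zeros((n, d))
    deg = np.zeros(n)
    for i, j in edges:  # undirected: pass messages both ways
        agg[i] += node_feats[j]; deg[i] += 1
        agg[j] += node_feats[i]; deg[j] += 1
    deg = np.maximum(deg, 1)[:, None]  # avoid division by zero for isolated nodes
    return np.tanh(node_feats + agg / deg)

feats = np.random.default_rng(0).normal(size=(5, 3))
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # a small chain graph
out = message_passing_step(feats, edges)
print(out.shape)  # (5, 3)
```

A real model would replace the `tanh(self + mean)` update with learned MLPs, as Keisler does, but the cost structure is the same: work scales with the number of edges, not with (number of nodes)².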

But a single satellite image in no way represents the full state of each column of air represented by each pixel. For starters, a single image doesn't show direction of travel. That's easily fixed by using a pair of consecutive frames and computing the optical flow. A deeper problem is that there's a bunch of state that is simply invisible in a single timestep of satellite data.
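To illustrate the "pair of consecutive frames" point: a crude stand-in for optical flow is a brute-force search for the integer shift that best maps one frame onto the next. This is a toy sketch (real pipelines would use something like OpenCV's dense optical flow), but it shows how motion invisible in one frame falls out of two.

```python
import numpy as np

def estimate_shift(frame0, frame1, max_shift=5):
    """Estimate the dominant (dy, dx) cloud motion between two frames by
    brute-force search over integer shifts - a crude stand-in for optical
    flow. Toy sketch, not the repo's actual code."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(frame0, dy, axis=0), dx, axis=1)
            err = np.mean((shifted - frame1) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

# A bright "cloud" that moves 2 pixels east between frames:
frame0 = np.zeros((16, 16))
frame0[6:10, 4:8] = 1.0
frame1 = np.roll(frame0, 2, axis=1)
print(estimate_shift(frame0, frame1))  # (0, 2)
```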

Consider two similar clouds moving east. At t0, we observe that cloud 2 evaporates as it approaches the eastern edge of the image. Our task is to predict that cloud 1 will also evaporate when it reaches the same location at the next timestep:

image

The problem is, of course, that at t0, we see no evidence that the eastern edge of the atmosphere is dry. (Well, the satellite's water vapour channels might tell us something. But let's ignore that for now and assume we're only using the visible channel!) Our model must remember - in its latent state - that the eastern edge is likely to be dry (which is inferred from the observation that cloud 2 evaporated when it got to the eastern edge).

So, there are at least two (related) reasons why we can't simply use Keisler's graph neural network, and swap out the NWP reanalysis dataset for satellite imagery:

  1. The full state of a parcel of air cannot be inferred from a single satellite image.
  2. So the latent state needs to be propagated over time.
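The second point can be sketched as a minimal recurrent update (toy numpy, hypothetical sizes, random weights standing in for learned ones): the new latent depends on both the previous latent and the current observation, so information inferred from earlier frames can survive to inform later predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: each observation is a flattened satellite frame,
# and the latent state is what must carry "invisible" information
# (e.g. "the eastern edge is dry") forwards in time.
obs_dim, latent_dim = 64, 32
W_h = rng.normal(scale=0.1, size=(latent_dim, latent_dim))  # latent -> latent
W_x = rng.normal(scale=0.1, size=(latent_dim, obs_dim))     # obs -> latent

def step(h, x):
    """One recurrent update: the new latent mixes the previous latent with
    the current observation, so state inferred at t0 (cloud 2 evaporating)
    is still available when predicting cloud 1's fate later."""
    return np.tanh(W_h @ h + W_x @ x)

h = np.zeros(latent_dim)
for t in range(4):  # four consecutive satellite frames
    x = rng.normal(size=obs_dim)
    h = step(h, x)
print(h.shape)  # (32,)
```

In practice this role would be played by a ConvLSTM, a Perceiver with a persistent latent array, or similar; the sketch only shows the shape of the dependency.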

So you could maybe imagine an adaptation of Keisler's graph neural network that looks something like this:

image

First we encode multiple satellite images (and NWPs). The encoding must maintain the spatial structure of the image (this is necessary so we can construct a graph where nearby nodes on the graph are nearby in physical space). And the encoding must capture state that's invisible in a single satellite image but can be inferred from a sequence of images (e.g. movement of the clouds, and to infer properties of the atmosphere by observing how clouds evolve).

Perhaps the encoder could be a ConvLSTM. Or a Perceiver IO. Or a Hierarchical Perceiver (see issue #14). Or a 3D CNN (convolving over space and time), perhaps with dilated convolutions as per MetNet 2.
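Whatever encoder is chosen, the "nearby nodes on the graph are nearby in physical space" requirement amounts to building a grid graph over the encoder's spatial output. A minimal sketch (toy code, 4-neighbour connectivity assumed):

```python
import numpy as np

def grid_edges(h, w):
    """Edge list for an h x w grid of latent nodes: each node connects only
    to its 4 spatial neighbours, mirroring the physical intuition that a
    parcel of air interacts only with adjacent parcels."""
    idx = lambda r, c: r * w + c
    edges = []
    for r in range(h):
        for c in range(w):
            if r + 1 < h:
                edges.append((idx(r, c), idx(r + 1, c)))  # edge to the south
            if c + 1 < w:
                edges.append((idx(r, c), idx(r, c + 1)))  # edge to the east
    return edges

# e.g. an encoder output of spatial shape (H', W') = (4, 4) becomes
# 16 graph nodes joined by 24 undirected neighbour edges:
edges = grid_edges(4, 4)
print(len(edges))  # 24
```

Keisler's model uses an icosahedral mesh rather than a flat grid, since it covers the whole globe; for a rectangular satellite crop a grid graph like this would be the natural analogue.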

But if we need an expensive encoder, is there any point in using a GNN to roll the state forwards in time? GNNs are very elegant. But what matters is actual performance rather than aesthetics! Maybe we should just use a Hierarchical Perceiver end-to-end (as we're planning to do in #14). Transformers are graph neural networks operating on a fully-connected graph: every element treats the entire input as its "neighbourhood".

Perhaps a solution is to use a Hierarchical Perceiver, where each component Perceiver in the "decoder" only attends over a small region of space, and so behaves like a graph neural network that only considers the local neighbourhood:

image
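The "Hierarchical Perceiver as GNN" idea can be shown in miniature as block-local self-attention: split the latent image into rectangles and run attention independently inside each one. Toy sketch (single head, no learned projections, block size assumed):

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def block_local_attention(x, block=4):
    """Self-attention applied independently within each (block x block)
    spatial rectangle, so each position only attends to its local
    neighbourhood. x has shape (H, W, D); H and W are assumed to be
    multiples of `block`. Toy sketch, no learned projections."""
    H, W, D = x.shape
    out = np.empty_like(x)
    for r in range(0, H, block):
        for c in range(0, W, block):
            patch = x[r:r + block, c:c + block].reshape(-1, D)
            attn = softmax(patch @ patch.T / np.sqrt(D))
            out[r:r + block, c:c + block] = (attn @ patch).reshape(
                block, block, D)
    return out

x = np.random.default_rng(0).normal(size=(8, 8, 16))
print(block_local_attention(x).shape)  # (8, 8, 16)
```

This makes the clumsiness concrete: positions on either side of a rectangle boundary never attend to each other, whereas a "proper" GNN decoder would give every node its true spatial neighbours.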

I should probably try the architecture in the diagram above. But, if time allows, then it'd be great to also try a "proper" graph neural net decoder, fed by a Hierarchical Perceiver encoder. This is still interesting because the "Hierarchical Perceiver as GNN" just clumsily divides the image up into rectangles, whilst a "proper" GNN would more rigorously consider the "true" neighbours.

Further reading: