JackKelly opened this issue 3 years ago
Why not just let the sequences attend to the full input (and latent) like in the Informer implementation https://github.com/zhouhaoyi/Informer2020
My bad, this issue is indeed trying exactly this type of arch!
Could you add a line saying what the output is, please? What exactly is "Output for timestep 0"? Is it a single prediction for a fixed lookahead amount? Or a whole timeseries of predictions from now until now+delta?
Sure! Here's a "zoomed out" diagram showing the encoder & decoder (but with just two timesteps of history (t-1 and t0) and two timesteps of predictions (t1 and t2)). The idea is that it'll create predictions for every timestep in a single forward pass. For example, if we were predicting 2 hours ahead at 5-minute intervals, the decoder would output 24 timesteps at once.
Some more details:
This model just predicts PV power for a single PV system at a time. Each timestep would specify a probability distribution over normalised PV power (PV power in kW divided by max(PV power)). I'm planning to use a mixture density network (I know MDNs aren't very popular, but I've found they work quite well :) ).
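Just to make the MDN idea concrete, here's a rough sketch of the kind of output head I have in mind (not code from this repo; `MixtureDensityHead`, the dimensions and the number of components are all made up for illustration). It maps each decoder output vector to the parameters of a Gaussian mixture over normalised PV power:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixtureDensityHead(nn.Module):
    """Map each decoder output vector to a Gaussian mixture over normalised PV power."""

    def __init__(self, d_model: int, n_components: int = 5):
        super().__init__()
        # One mixture weight, mean, and standard deviation per component.
        self.to_params = nn.Linear(d_model, n_components * 3)

    def forward(self, decoder_output: torch.Tensor) -> torch.distributions.Distribution:
        # decoder_output: (batch, n_timesteps, d_model)
        logits, means, raw_stds = self.to_params(decoder_output).chunk(3, dim=-1)
        stds = F.softplus(raw_stds) + 1e-6  # keep the standard deviations positive
        mixture = torch.distributions.Categorical(logits=logits)
        components = torch.distributions.Normal(means, stds)
        return torch.distributions.MixtureSameFamily(mixture, components)
```

Training would then just minimise `-head(decoder_output).log_prob(normalised_pv_power).mean()`.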
Each timestep gets a different byte array with M rows and C columns.
Perhaps each input "mode" (PV or satellite or air quality, etc.) needs to also be identified by a learned embedding, which would be concatenated onto each row. Or maybe that's not necessary. Or maybe we can just use a simple one-hot encoding!
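e.g. something like this, maybe (just a sketch; the modality IDs, dimensions and function name are invented): each row of a modality's byte array gets that modality's learned embedding concatenated onto it. A one-hot encoding would just replace the `nn.Embedding` lookup with `F.one_hot`.

```python
import torch
import torch.nn as nn

N_MODALITIES = 3   # e.g. PV, satellite, air quality (illustrative)
EMBED_DIM = 8      # illustrative

modality_embedding = nn.Embedding(N_MODALITIES, EMBED_DIM)

def add_modality_embedding(byte_array: torch.Tensor, modality_id: int) -> torch.Tensor:
    """byte_array: (M rows, C columns) for one modality at one timestep."""
    m_rows = byte_array.shape[0]
    embed = modality_embedding(torch.tensor(modality_id)).expand(m_rows, -1)
    return torch.cat([byte_array, embed], dim=-1)  # (M, C + EMBED_DIM)
```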
Could be all-zeros; or learnt; or maybe the embedding of the solar PV system we're making the forecast for
See https://github.com/openclimatefix/predict_pv_yield/issues/68 for latest ideas about using Perceiver IO for solar PV nowcasting.
TBH, I'm slightly de-prioritising the idea of saving the separate latent arrays from each cross-attend. Instead, from the Perceiver IO paper, it feels like we're probably better off feeding in all our data as one big chunk. But I'll leave this issue open because I would definitely like to give this a go, if we get time. It's just probably not a priority.
Should be fairly simple to implement by just modifying the `for` loop at the bottom of `Perceiver.forward()`. Note that `x` is the latents.
Here's a quick diagram. My additions are in black.
(I've removed the "weight sharing" from the diagram, but weight sharing would absolutely still be part of this)
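In code, the modification might look roughly like this (a very rough sketch, assuming a perceiver-pytorch-style `forward()` where `self.layers` is a list of `(cross_attn, cross_ff, self_attns)` blocks and `self.latents` is an `(n_latents, latent_dim)` parameter; with weight sharing the same blocks simply get reused once per timestep):

```python
def forward_rnn(self, timestep_inputs):
    """timestep_inputs: list of per-timestep byte arrays, each (batch, n_rows, n_channels)."""
    batch_size = timestep_inputs[0].shape[0]
    x = self.latents.unsqueeze(0).expand(batch_size, -1, -1)  # initial latent array
    latents_per_timestep = []
    for data in timestep_inputs:  # one cross-attend (+ self-attends) per timestep
        for cross_attn, cross_ff, self_attns in self.layers:
            x = cross_attn(x, context=data) + x
            x = cross_ff(x) + x
            for self_attn, self_ff in self_attns:
                x = self_attn(x) + x
                x = self_ff(x) + x
        latents_per_timestep.append(x)
    return latents_per_timestep  # the latent array after each timestep
```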
The paper talks about using different timestep inputs. But I don't think the paper talks about using different outputs for each timestep. Maybe that's a bad idea :)
Related to: https://github.com/openclimatefix/predict_pv_yield/issues/35
Background
The ultimate aim is to predict solar electricity generation for a single solar PV system, over the next few hours, every five minutes. The inputs to the model will include 5-minutely satellite data, real-time power from thousands of solar PV systems, etc.
Whilst satellite imagery is probably great for telling us that "there's a big dark cloud coming in 25 minutes", satellite imagery probably doesn't tell us exactly how much sunlight will get through that big dark cloud, so we need to blend satellite data with measurements from the ground. IMHO, this is where ML can really shine: combining multiple data sources, many of which will be quite low quality.
The input to the "Perceiver RNN" would include:
(I'm really excited about The Perceiver because our data inputs are "multi-modal", and The Perceiver works really well for multi-modal perception!)
So, maybe we'd actually have two "Perceiver RNNs" (i.e. weights would be shared within the encoder and within the decoder, but the encoder and decoder would have different weights):
One problem with this is that information can only flow in one direction: forwards. So it might be interesting to add a self-attention block which functions over the time dimension:
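Something like this, perhaps (just a sketch; `nn.MultiheadAttention` is only a stand-in for whatever self-attention block we'd actually use): stack the per-timestep latents along a new time axis and let each latent attend to itself at other timesteps.

```python
import torch
import torch.nn as nn

def temporal_self_attention(latents_per_timestep, attn: nn.MultiheadAttention):
    """latents_per_timestep: list of (batch, n_latents, latent_dim) tensors, one per timestep."""
    x = torch.stack(latents_per_timestep, dim=1)  # (batch, n_timesteps, n_latents, latent_dim)
    b, t, n, d = x.shape
    # Fold the latent index into the batch so attention runs over the time dimension only.
    x = x.permute(0, 2, 1, 3).reshape(b * n, t, d)
    out, _ = attn(x, x, x, need_weights=False)  # self-attention over time
    out = out.reshape(b, n, t, d).permute(0, 2, 1, 3)
    return [out[:, i] for i in range(t)]  # back to a list of per-timestep latents

# e.g. attn = nn.MultiheadAttention(embed_dim=latent_dim, num_heads=4, batch_first=True)
```

(Full, non-causal attention over time would let information flow backwards as well as forwards over the history; a causal mask could be added for the decoder timesteps if that turned out to matter.)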
Maybe the initial latent is the embedding for the solar PV system of interest.
At each input timestep, the PV input would be a concatenation of the power, the embedding of the PV system ID, and the geospatial location.
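i.e. per PV system per timestep, something like this (sketch only; all names and sizes are invented):

```python
import torch
import torch.nn as nn

pv_system_embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=16)  # illustrative sizes

def pv_input_row(power: float, pv_system_id: int, x_coord: float, y_coord: float) -> torch.Tensor:
    """One row of the PV part of the input byte array for one timestep."""
    system_embed = pv_system_embedding(torch.tensor(pv_system_id))
    return torch.cat([torch.tensor([power, x_coord, y_coord]), system_embed])
```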