Pass through both models without any `for` loops

Tidy up the timings. Run the "satellite transformer" at 5-minute intervals (instead of 30-minute), predict 2 hours into the future, and use 1 hour of history. The tricky thing is that the 5-minutely data starts at random times past the hour. So, simplify by only using a single timestep of satellite imagery at a time. That might hurt the performance of the "satellite transformer" model, but maybe the "time transformer" will figure it out?
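
For reference, a quick sketch of the timestep counts those timings imply. The constant names are made up for illustration, and it assumes the "now" timestep counts as the last history step (if it is counted separately, the total goes up by one):

```python
from datetime import timedelta

# Timings from the description above; the names are hypothetical.
SAMPLE_PERIOD = timedelta(minutes=5)
HISTORY_DURATION = timedelta(hours=1)
FORECAST_DURATION = timedelta(hours=2)

# Dividing one timedelta by another with // yields an int.
num_history_steps = HISTORY_DURATION // SAMPLE_PERIOD
num_forecast_steps = FORECAST_DURATION // SAMPLE_PERIOD

# Assumption: t0 is included in the history steps.
num_timesteps = num_history_steps + num_forecast_steps

print(num_history_steps, num_forecast_steps, num_timesteps)
```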

[x] Change the "satellite transformer" model to only use a single timestep of satellite imagery at a time. See how well that does.
- [ ] Also reduce d_model to 48 and see how well it does.
[x] Rename the models to "SatelliteTransformer" and "TimeTransformer".
[x] #55
[x] Get rid of all the nasty `start_idx_5_min` stuff and `get_multi_timestep_prediction`. Probably don't need a separate model to train the SatelliteTransformer, either.
[x] Reshape the batch so each timestep is seen as a different example.
[x] Pass this through the "satellite-to-PV-power" model.
[x] Reshape again so that each timestep and each query is seen as a new input element to the "time transformer". Concatenate with:
- [x] The GSP queries and
- [x] The historical PV data. Use the full "history" of the PV (1 hour?).
- [ ] In 50% of the training batches, mask the historical PV data (but don't mask during validation!)
- [ ] As a separate experiment, try giving the "time transformer" historical data for every PV system. Also try giving every PV system to the "satellite transformer", although, to do that, we'll probably also have to use Perceiver IO for the "time transformer"
[x] Compute the MSE objectives over all the output timesteps.
[x] Only use the forecast timesteps for the NMAE metric.
[x] Update the timeseries plotting
[x] #56
[x] Increase d_model to 128?
[x] Concatenate the actual PV power generation to the PV query just before it goes into the `time_transformer`, instead of providing it as a separate set of input elements to the `time_transformer`.
[x] Compute total_nmae_loss and use that as the objective function
[x] Add a marker to say "this element is historical PV". Just using zeros is a bad idea.
[x] Back to MSE objective
[x] Try going back to just 6 timesteps during training. It looks like concatenating historical PV might have helped training?
[x] Try LR=1e-4 again
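
The reshaping steps in the checklist above can be sketched roughly as follows. This is a minimal NumPy illustration of the shapes only, not the code in this repo: `satellite_model` is a toy stand-in for the SatelliteTransformer, all sizes are made up, the GSP queries are left out, and it assumes PV power is already normalised to [0, 1] so that a plain MAE serves as the NMAE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
batch, n_time, n_hist = 4, 36, 12   # 12 history + 24 forecast timesteps
chan, height, width = 1, 8, 8
n_pv, d_model = 3, 16               # PV queries per example, embedding size


def satellite_model(images, n_queries, d_model):
    """Toy stand-in: maps each image to one embedding per query."""
    mean = images.reshape(images.shape[0], -1).mean(axis=1)
    shape = (images.shape[0], n_queries, d_model)
    return np.broadcast_to(mean[:, None, None], shape).copy()


satellite = rng.random((batch, n_time, chan, height, width))
pv_power = rng.random((batch, n_time, n_pv))

# 1. Fold time into the batch axis: each timestep becomes its own example.
flat = satellite.reshape(batch * n_time, chan, height, width)

# 2. A single forward pass covers every timestep (no `for` loop).
emb = satellite_model(flat, n_pv, d_model)  # (batch * n_time, n_pv, d_model)

# 3. Unfold: each (timestep, query) pair becomes one input element.
elements = emb.reshape(batch, n_time * n_pv, d_model)

# 4. Append historical PV plus an explicit "this is historical PV" marker,
#    so that a zero PV value is not confused with "no data".
is_history = np.arange(n_time) < n_hist
hist_pv = np.where(is_history[None, :, None], pv_power, 0.0)
marker = np.broadcast_to(is_history[None, :, None], pv_power.shape).astype(float)
extra = np.stack([hist_pv, marker], axis=-1).reshape(batch, n_time * n_pv, 2)
time_transformer_input = np.concatenate([elements, extra], axis=-1)

# 5. MSE objective over all timesteps; NMAE metric over forecast steps only.
predicted = rng.random((batch, n_time, n_pv))  # placeholder model output
mse_loss = np.mean((predicted - pv_power) ** 2)
nmae = np.mean(np.abs(predicted[:, n_hist:] - pv_power[:, n_hist:]))

print(time_transformer_input.shape)
```

Because the reshapes are row-major, element `t * n_pv + p` of each example corresponds to timestep `t` and query `p`, which is the same ordering a per-timestep loop followed by stacking would give.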

openclimatefix / power_perceiver

Pass through both models without any `for` loops #54