Simple model: Just a transformer encoder that sees all the "byte array" and "query" concatenated together.
[x] try different activation functions (GELU, swish, see this discussion)
[x] First model: Concat 4 timesteps (t0, t-15mins, t-30mins, t-45mins) into the "patch" dimension. Check that it still performs well at inferring PV yield at t0. (Maybe engineer it so we can randomly move t0 around a bit.)
[x] predict multiple PV timesteps into the future in each "PV patch".
[x] use different queries for each timestep in the future
[x] add in recent history of PV (maybe each "PV element" is all timesteps of a single PV system. Each PV element would also double as the PV query).
[x] randomly move the start_idx around (being careful to move both the satellite imagery and the PV)
[x] predict satellite imagery. (The nice thing here is that the satellite input acts as the query for future satellite data.) Don't obsess too much about getting the output pretty.
[x] lower learning rate
[x] more heads
[ ] plot predictions of imagery.
[ ] Mixture density network
[ ] predict one timestep of imagery at a time, maybe with a separate set of query elements.
[ ] Smaller network.
[x] use two different ff nets to map from native query to d_model
[ ] or use random array to pad queries
[x] Residual over entire network
[ ] Weight near-term forecasts more
[ ] predict GSP PV (an additional query?)
[ ] Try swapping out the transformer encoders for Perceiver IOs?
[ ] Try using one "transformer encoder stack" per timestep and per satellite channel. Then merge the timesteps using Perceiver IO?
[ ] #43
[ ] try using optical flow? Or predict optical flow field? Or provide optical flow field as input?
[ ] Absolute spatial location
[ ] nan_to_num for the topographical data
[ ] #39
[ ] Try giving it all the PV systems in the batch (at least in the input).
[ ] Early stopping
[ ] Predict the delta of the pvlib prediction given clearsky (then given NWP air temp and wind speed) (although this might actually hurt performance on GSP predictions)
[ ] Different learning rates
[ ] NWP temperature and wind speed.
[x] Convert to a Python script
[ ] Use smallest possible vocabulary size for PV system embedding (get list of PV systems used in practice. Then map from PV row number to the index in that list. I think I wrote a script back in December to do this)
[ ] Use hydra for automated hyperparam sweeps on multiple GPUs
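
A rough sketch of the "smallest possible vocabulary size" item above: collect the PV row numbers that actually occur, then map each one to a dense index so the embedding table only needs one row per system in use. This is not the December script mentioned in that item; the function name `build_compact_index` and the example row numbers are made up for illustration.

```python
def build_compact_index(pv_row_numbers):
    """Map each PV row number that actually occurs to a dense index 0..N-1.

    `pv_row_numbers` is any iterable of integer row numbers (hypothetical input;
    in practice this would come from the PV systems seen in the training data).
    """
    unique_rows = sorted(set(pv_row_numbers))
    return {row: idx for idx, row in enumerate(unique_rows)}


# Hypothetical row numbers, with repeats, standing in for the systems used in practice.
rows_in_use = [1024, 7, 530, 7, 1024]

row_to_index = build_compact_index(rows_in_use)

# The embedding vocabulary only needs one entry per distinct system.
vocab_size = len(row_to_index)

# Row numbers translated to compact indices, ready to feed to an embedding layer.
compact_ids = [row_to_index[row] for row in rows_in_use]

print(vocab_size, compact_ids)
```

Sorting the unique row numbers keeps the mapping deterministic between runs, which matters if the mapping is rebuilt at inference time rather than saved alongside the model.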