Open JackKelly opened 2 years ago
Sounds really great, and good to get it planned out.
I would love to get https://github.com/openclimatefix/nowcasting_dataset/pull/562 into the new datasets, but unfortunately I've run out of time before my holiday.
Detailed Description
A large part of my hope for the ML research we're doing in 2022 is to train across multiple "types" of prepared dataset. For example:
Context
To train our models to predict future satellite imagery, we probably want to use the entire geographical extent of the satellite imagery.
But we also want to predict PV in the UK, Italy and Malta.
So we might want each batch to contain a mix of examples: some from the UK (as is the case now), and some from anywhere in the geo extent of the satellite imagery (including over oceans) without any PV.
At the moment, nowcasting_dataset can't do this "mixture".
The simplest way to do this might actually be to leave nowcasting_dataset mostly alone, and produce multiple different sets of batches (one set over the UK; the other set without PV data, and from the entire geo extent of the imagery). Then power_perceiver will load multiple batches at once. This has the advantage that we can quickly experiment with dynamically changing the ratio of "UK" to "non-UK" imagery as training progresses.
But this simpler approach still requires that we update nowcasting_dataset a bit (e.g. to randomly sample locations from the entire geo extent of the satellite imagery).
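To make the "dynamically changing the ratio" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the linear schedule, the default fractions, and the function names `n_non_uk_examples` and `mix_batch` are hypothetical, and plain Python lists stand in for real batches.

```python
def n_non_uk_examples(step, total_steps, batch_size=32, start_frac=0.5, end_frac=0.0):
    """Number of non-UK examples to put in the batch at a given training step.

    Hypothetical linear schedule: the non-UK fraction moves from start_frac
    to end_frac as training progresses.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    frac = start_frac + (end_frac - start_frac) * progress
    return round(batch_size * frac)


def mix_batch(uk_batch, non_uk_batch, n_non_uk):
    """Take n_non_uk examples from the non-UK batch and fill the rest from the UK batch.

    Assumes both inputs are lists of examples of the same length.
    """
    n_uk = len(uk_batch) - n_non_uk
    return non_uk_batch[:n_non_uk] + uk_batch[:n_uk]


# Toy usage: strings stand in for real examples.
uk = ["uk"] * 32
non_uk = ["non_uk"] * 32
mixed = mix_batch(uk, non_uk, n_non_uk_examples(step=50, total_steps=100))
```

Because the two sets of batches are prepared separately on disk, changing the schedule would only touch the loading code, not nowcasting_dataset.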
Possible Implementation
Maybe implement a thin adaptor which holds multiple power_perceiver.NowcastingDataset instances, and itself inherits from torch.utils.data.Dataset. This thin adaptor would sample randomly from the upstream power_perceiver.NowcastingDataset instances and stack the Tensors. So for example, if we're combining "just satellite" data and "satellite + PV + GSP + NWP" then, say, the first 16 examples in each batch would be "just satellite", and the first 16 examples for PV, GSP, and NWP would be zeros (and would be masked out before they go into the Perceiver).
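A rough sketch of what such an adaptor could look like is below. It is not the power_perceiver implementation: `ToyDataset` is a hypothetical stand-in for power_perceiver.NowcastingDataset, and the sketch assumes each upstream item is a pre-prepared batch stored as a dict of Tensors keyed by modality name. `MixedDataset`, the `*_mask` keys, and all shapes are made up for illustration.

```python
import torch
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for power_perceiver.NowcastingDataset.

    Each item is one pre-prepared batch: a dict mapping modality name to a
    Tensor whose first dimension indexes the examples.
    """

    def __init__(self, n_batches, batch_size, shapes):
        self.n_batches = n_batches
        self.batch_size = batch_size
        self.shapes = shapes  # modality name -> per-example shape

    def __len__(self):
        return self.n_batches

    def __getitem__(self, idx):
        return {k: torch.ones(self.batch_size, *s) for k, s in self.shapes.items()}


class MixedDataset(Dataset):
    """Thin adaptor that stacks examples from several upstream datasets."""

    def __init__(self, datasets, n_examples_per_dataset):
        assert len(datasets) == len(n_examples_per_dataset)
        self.datasets = datasets
        self.n_examples = n_examples_per_dataset
        # Remember one example per modality so we know the shape and dtype
        # to use when a dataset lacks that modality.
        self.template = {}
        for ds in datasets:
            for key, tensor in ds[0].items():
                self.template.setdefault(key, tensor[0])

    def __len__(self):
        return min(len(ds) for ds in self.datasets)

    def __getitem__(self, idx):
        # idx is ignored: each upstream batch is drawn at random.
        parts = {key: [] for key in self.template}
        masks = {key: [] for key in self.template}
        for ds, n in zip(self.datasets, self.n_examples):
            batch = ds[torch.randint(len(ds), (1,)).item()]
            for key, tmpl in self.template.items():
                if key in batch:
                    # Assumes the upstream batch holds at least n examples.
                    parts[key].append(batch[key][:n])
                    masks[key].append(torch.ones(n, dtype=torch.bool))
                else:
                    # Missing modality: zeros, flagged False in the mask.
                    parts[key].append(tmpl.new_zeros((n, *tmpl.shape)))
                    masks[key].append(torch.zeros(n, dtype=torch.bool))
        out = {key: torch.cat(parts[key]) for key in parts}
        out.update({f"{key}_mask": torch.cat(masks[key]) for key in masks})
        return out


# Toy usage: 16 "just satellite" examples followed by 16 full examples.
sat_only = ToyDataset(4, 32, {"satellite": (12, 8, 8)})
full = ToyDataset(4, 32, {"satellite": (12, 8, 8), "pv": (12,), "gsp": (4,), "nwp": (4, 8, 8)})
mixed = MixedDataset([sat_only, full], n_examples_per_dataset=[16, 16])
batch = mixed[0]
```

Since each item is already a full batch, this would presumably be wrapped in `torch.utils.data.DataLoader(mixed, batch_size=None)` so that the loader does not add a second batch dimension. The boolean masks are one possible way to tell the model which examples to mask out; returning them alongside the data is a design choice of this sketch.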