openclimatefix / power_perceiver

Machine learning experiments using the Perceiver IO model to forecast the electricity system (starting with solar)
MIT License

Only have one of each RawDataSource in RawDataset #78

Status: Open. JackKelly opened this issue 2 years ago.

JackKelly commented 2 years ago

Go back to using the same RawDataSource instance across all combos.

Let's say we have:

And the temporal extents are:

sat: ----------------------------
pv:                   ------------
nwp:                 --------------------------------
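A combo can only draw t0 periods from times where all of its member data sources have data. The sketch below assumes that, and computes that overlap as the intersection of the members' extents. The integer extents in `EXTENTS` are hypothetical values that loosely mirror the diagram above, and `intersect_extents` is an illustrative helper, not part of power_perceiver.

```python
from typing import Dict, Iterable, Optional, Tuple

# Half-open (start, end) interval in hypothetical integer time units.
Period = Tuple[int, int]

# Hypothetical extents loosely mirroring the diagram: sat starts first,
# pv starts later and ends a little after sat, nwp extends furthest.
EXTENTS: Dict[str, Period] = {"sat": (0, 28), "pv": (18, 30), "nwp": (17, 49)}


def intersect_extents(
    names: Iterable[str], extents: Dict[str, Period] = EXTENTS
) -> Optional[Period]:
    """Return the period covered by every named data source, or None if they don't overlap."""
    names = list(names)
    start = max(extents[name][0] for name in names)  # latest start
    end = min(extents[name][1] for name in names)  # earliest end
    return (start, end) if start < end else None


print(intersect_extents(["sat", "pv"]))  # (18, 28)
print(intersect_extents(["pv", "nwp"]))  # (18, 30)
```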

For each data source that needs to load a subset of its data:

Find all the combos that each data source needing to load data is a member of, e.g.:
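The membership lookup amounts to inverting the combo → data sources mapping. The sketch below does that inversion. The combos in `COMBOS` are hypothetical, because the issue's own list of combos isn't reproduced here, and `combos_per_source` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical combos, each listing the data sources it uses.
COMBOS: Dict[str, List[str]] = {
    "combo1": ["sat", "pv"],
    "combo2": ["sat", "nwp"],
    "combo3": ["pv", "nwp"],
}


def combos_per_source(
    combos: Dict[str, List[str]], sources_to_load: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each data source that needs to load data to the combos it belongs to."""
    sources_to_load = set(sources_to_load)
    membership: Dict[str, List[str]] = defaultdict(list)
    for combo_name, members in combos.items():
        for source in members:
            if source in sources_to_load:
                membership[source].append(combo_name)
    return dict(membership)


print(combos_per_source(COMBOS, ["sat", "nwp"]))
```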

In a while loop, sample t0 periods in round-robin fashion across the combos: first from combo 1, then combo 2, then combo 3, then combo 1 again, and so on.
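The round-robin order can be sketched with `itertools.cycle`. This assumes each combo already has a list of candidate t0 periods and that one period is drawn uniformly at random per turn. Both the data layout and the name `round_robin_t0_periods` are hypothetical.

```python
import itertools
import random
from typing import Dict, Iterator, List, Tuple

Period = Tuple[int, int]  # half-open (start, end), hypothetical time units


def round_robin_t0_periods(
    t0_periods_per_combo: Dict[str, List[Period]], rng: random.Random
) -> Iterator[Tuple[str, Period]]:
    """Yield (combo_name, t0_period) forever, visiting combos in a fixed cyclic order."""
    # Sorting the names gives a deterministic combo1, combo2, combo3, combo1, ... order.
    for combo_name in itertools.cycle(sorted(t0_periods_per_combo)):
        yield combo_name, rng.choice(t0_periods_per_combo[combo_name])


periods = {"combo1": [(0, 5), (1, 6)], "combo2": [(10, 15)], "combo3": [(20, 25)]}
gen = round_robin_t0_periods(periods, random.Random(42))
print([name for name, _ in itertools.islice(gen, 4)])
# ['combo1', 'combo2', 'combo3', 'combo1']
```

The generator has no stop condition of its own. The caller decides when to stop pulling periods from it.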

On each iteration, for each data source that needs to load data, compute the union of the periods sampled so far for the combos that data source is a member of. Separately, store the sampled t0 periods for each combo for this epoch, so that examples can later be sampled from them. Stop the while loop as soon as any data source's union reaches the desired duration.
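Below is a minimal, self-contained sketch of the loop. It assumes half-open integer periods, merges overlapping or touching periods to form each union, and measures a union's duration as the sum of its merged lengths. `sample_epoch`, `union_of_periods` and the `max_iterations` guard are all hypothetical. The guard only keeps the sketch from spinning forever when the desired duration is unreachable.

```python
import itertools
import random
from typing import Dict, List, Tuple

Period = Tuple[int, int]  # half-open (start, end)


def union_of_periods(periods: List[Period]) -> List[Period]:
    """Merge overlapping or touching periods into a sorted list of disjoint periods."""
    merged: List[Period] = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_duration(periods: List[Period]) -> int:
    return sum(end - start for start, end in union_of_periods(periods))


def sample_epoch(
    t0_periods_per_combo: Dict[str, List[Period]],
    combos_per_source: Dict[str, List[str]],
    desired_duration: int,
    rng: random.Random,
    max_iterations: int = 10_000,
):
    sampled_per_combo: Dict[str, List[Period]] = {name: [] for name in t0_periods_per_combo}
    sampled_per_source: Dict[str, List[Period]] = {src: [] for src in combos_per_source}
    combo_cycle = itertools.cycle(sorted(t0_periods_per_combo))  # round-robin order
    iterations = 0
    while iterations < max_iterations:
        iterations += 1
        combo_name = next(combo_cycle)
        period = rng.choice(t0_periods_per_combo[combo_name])
        sampled_per_combo[combo_name].append(period)  # kept for making examples later
        for source, combos in combos_per_source.items():
            if combo_name in combos:
                sampled_per_source[source].append(period)
        # Stop as soon as ANY data source's union reaches the desired duration.
        if any(total_duration(p) >= desired_duration for p in sampled_per_source.values()):
            break
    unions = {src: union_of_periods(p) for src, p in sampled_per_source.items()}
    return sampled_per_combo, unions
```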

Finally, for each data source that needs to load data, load the union of its sampled periods into RAM.
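The load step touches only the timesteps that fall inside a data source's union. The sketch assumes the on-disk data is represented by a sorted list of integer timestamps plus a parallel list of values, a deliberately simplified stand-in for a real dataset. It uses `bisect` to find each half-open period's slice. `load_union_into_ram` is a hypothetical helper.

```python
import bisect
from typing import Dict, List, Sequence, Tuple

Period = Tuple[int, int]  # half-open (start, end)


def load_union_into_ram(
    timestamps: Sequence[int], values: Sequence[float], union: List[Period]
) -> Dict[int, float]:
    """Copy only the (timestamp, value) pairs inside the union into an in-memory dict."""
    in_ram: Dict[int, float] = {}
    for start, end in union:
        lo = bisect.bisect_left(timestamps, start)  # first index with timestamp >= start
        hi = bisect.bisect_left(timestamps, end)  # first index with timestamp >= end
        for i in range(lo, hi):
            in_ram[timestamps[i]] = values[i]
    return in_ram


timestamps = list(range(10))
values = [float(t) for t in timestamps]
print(sorted(load_union_into_ram(timestamps, values, [(0, 2), (5, 7)])))  # [0, 1, 5, 6]
```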

The advantage of this approach is that, if the data sources have similar temporal extents, each epoch gets as much diversity as possible across all the data sources.