[x] When loading satellite data, round the timestamps to 5 minutes (otherwise selecting time slices might break). This also needs to be re-implemented in nowcasting_dataset!
[x] Implement a Location NamedTuple, with x and y. Use that instead of x_center_osgb etc.
[x] Implement logic for loading n days of data off disk. Use the intersection of contiguous time periods (from all DataSources) when deciding which days to load. How should that be passed from the dataset to the satellite DataSource?
At init, the dataset asks each DataSource for its entire list of contiguous time periods.
At the start of each epoch, the dataset optionally calls its subset_contiguous_time_periods method, which returns, say, about 32 x 12 hours of periods (perhaps by looping round, picking a period without replacement, until the total duration is above the threshold).
The dataset then passes this subset_of_contiguous_time_periods into each DataSource.load_subset_into_ram method before each epoch.
Then, during training, the dataset randomly samples t0 datetimes from the subset_of_contiguous_time_periods.
[x] Implement time_periods_to_datetimes_per_combo, which must loop over each data_source_combo name. Can re-use code from nowcasting_dataset.time.
[x] SpatialDataSource.get_osgb_location_for_example(). I think this may not need to be done in RawSatelliteDataSource, or at least part of it can be done in SpatialDataSource. NWPs have 1D OSGB coords, but we could share the logic of not going over the edge. Actually, maybe the whole thing can live in SpatialDataSource, because I think this might work: selected_pixel = self._data_in_mem.isel({self._y_dim_name: y_idx, self._x_dim_name: x_idx}); location = Location(x=selected_pixel[self._x_dim_name], y=selected_pixel[self._y_dim_name])
[x] implement the new dataset
[x] Need to call DataSource.per_worker_init on each ds
[x] implement _get_example: handle xr_batch_processors, to_numpy, and np_batch_processors. See the PreparedDataset.
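The per-epoch flow described above (intersect the contiguous periods from all DataSources, pick a random subset until a duration threshold is reached, then sample t0 from that subset) could be sketched roughly as follows. This is a minimal stdlib-only sketch, not the project's implementation: the `TimePeriod` type, the `intersect_periods` and `sample_t0` helpers, the function signatures, and the 5-minute sampling grid are all assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta
from typing import List, NamedTuple


class TimePeriod(NamedTuple):
    """Hypothetical container for one contiguous period."""
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def intersect_periods(a: List[TimePeriod], b: List[TimePeriod]) -> List[TimePeriod]:
    """Intersect two sorted lists of non-overlapping periods."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i].start, b[j].start)
        end = min(a[i].end, b[j].end)
        if start < end:
            out.append(TimePeriod(start, end))
        # Advance whichever period finishes first.
        if a[i].end < b[j].end:
            i += 1
        else:
            j += 1
    return out


def subset_contiguous_time_periods(
    periods: List[TimePeriod], min_total: timedelta, rng: random.Random
) -> List[TimePeriod]:
    """Pick periods without replacement until the total duration reaches min_total."""
    shuffled = list(periods)
    rng.shuffle(shuffled)
    chosen = []
    total = timedelta(0)
    for period in shuffled:
        if total >= min_total:
            break
        chosen.append(period)
        total += period.duration
    return sorted(chosen)


def sample_t0(
    periods: List[TimePeriod], rng: random.Random, freq: timedelta = timedelta(minutes=5)
) -> datetime:
    """Sample one t0 on the freq grid, weighting each period by its number of steps."""
    n_steps = [p.duration // freq + 1 for p in periods]
    idx = rng.choices(range(len(periods)), weights=n_steps, k=1)[0]
    return periods[idx].start + rng.randrange(n_steps[idx]) * freq


# Example: ten days of daytime satellite data, five days of NWP data.
rng = random.Random(0)
day = datetime(2021, 6, 1)
satellite = [
    TimePeriod(day + timedelta(days=d, hours=4), day + timedelta(days=d, hours=20))
    for d in range(10)
]
nwp = [TimePeriod(day, day + timedelta(days=5))]
common = intersect_periods(satellite, nwp)
subset = subset_contiguous_time_periods(common, timedelta(hours=36), rng)
t0 = sample_t0(subset, rng)
```

In this sketch the subset would be handed to each DataSource's load_subset_into_ram before the epoch starts. How that method consumes the periods is left out because the list above does not specify it.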
RawSatelliteDataSource: load_subset_into_ram
RawSatelliteDataSource.datetime_index (and deselect nighttime)
subset_contiguous_time_periods in dataset.
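As a rough illustration of the Location NamedTuple and of the shared "don't go over the edge" logic mentioned for SpatialDataSource, here is a minimal stdlib-only sketch. It uses plain coordinate sequences in place of the in-memory xarray data, and it assumes "the edge" refers to a square region of interest around the chosen pixel. The names `roi_size`, `random_index_away_from_edge`, and `get_location_for_example` are hypothetical.

```python
import random
from typing import NamedTuple, Sequence


class Location(NamedTuple):
    """x and y coordinates of a location (e.g. OSGB metres)."""
    x: float
    y: float


def random_index_away_from_edge(n_pixels: int, roi_size: int, rng: random.Random) -> int:
    """Pick a centre index such that a window of roi_size pixels stays in bounds.

    The window is assumed to cover indices [idx - half, idx - half + roi_size).
    """
    if roi_size > n_pixels:
        raise ValueError("Region of interest is larger than the data.")
    half = roi_size // 2
    return rng.randrange(half, n_pixels - roi_size + half + 1)


def get_location_for_example(
    x_coords: Sequence[float],
    y_coords: Sequence[float],
    roi_size: int,
    rng: random.Random,
) -> Location:
    """Select a random pixel away from the edges and return its coordinates."""
    x_idx = random_index_away_from_edge(len(x_coords), roi_size, rng)
    y_idx = random_index_away_from_edge(len(y_coords), roi_size, rng)
    return Location(x=x_coords[x_idx], y=y_coords[y_idx])


# Example with a 20 x 10 pixel grid and a 4-pixel region of interest.
rng = random.Random(42)
x_coords = [1000.0 * i for i in range(20)]
y_coords = [9000.0 - 1000.0 * i for i in range(10)]  # y often decreases with row index
location = get_location_for_example(x_coords, y_coords, roi_size=4, rng=rng)
```

Because the index helper only depends on the axis length, the same logic could serve both satellite and NWP sources, provided each exposes 1D x and y coordinates.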