I'm running a large icenet_dataset_create job to cache the tfrecords. The available data runs up to 25/12/2023, so the end date is configured to match. When the process scripts are run with that end date, an invalid SIC selection occurs:
Traceback (most recent call last):
  File "/rds/user/USER/hpc-work/icenet/icenet/icenet/data/loaders/dask.py", line 408, in generate_sample
    sample_output = var_ds.siconca_abs.sel(time=forecast_dts)
  File "/home/USER/.conda/envs/icenet/lib/python3.9/site-packages/xarray/core/dataarray.py", line 1536, in sel
    ds = self._to_temp_dataset().sel(
  File "/home/USER/.conda/envs/icenet/lib/python3.9/site-packages/xarray/core/dataset.py", line 2573, in sel
    query_results = map_index_queries(
  File "/home/USER/.conda/envs/icenet/lib/python3.9/site-packages/xarray/core/indexing.py", line 188, in map_index_queries
    results.append(index.sel(labels, **options))
  File "/home/USER/.conda/envs/icenet/lib/python3.9/site-packages/xarray/core/indexes.py", line 489, in sel
    raise KeyError(f"not all values found in index {coord_name!r}")
KeyError: "not all values found in index 'time'"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/USER/.conda/envs/icenet/bin/icenet_dataset_create", line 33, in <module>
    sys.exit(load_entry_point('icenet', 'console_scripts', 'icenet_dataset_create')())
  File "/rds/user/USER/hpc-work/icenet/icenet/icenet/data/loader.py", line 126, in create
    dl.generate()
  File "/rds/user/USER/hpc-work/icenet/icenet/icenet/data/loaders/dask.py", line 78, in generate
    self.client_generate(client,
  File "/rds/user/USER/hpc-work/icenet/icenet/icenet/data/loaders/dask.py", line 218, in client_generate
    in client.gather(futures):
  File "/home/USER/.conda/envs/icenet/lib/python3.9/site-packages/distributed/client.py", line 2372, in gather
    return self.sync(
  File "/rds/user/USER/hpc-work/icenet/icenet/icenet/data/loaders/dask.py", line 340, in generate_and_write
    x, y, sample_weights = generate_sample(date, var_ds, var_files,
  File "/rds/user/USER/hpc-work/icenet/icenet/icenet/data/loaders/dask.py", line 414, in generate_sample
    raise RuntimeError(sic_ex)
RuntimeError: "not all values found in index 'time'"
The failure appears to be in the ground truth selection (var_ds.siconca_abs.sel(time=forecast_dts)), which suggests generate_sample is selecting forecast dates beyond the range of the available training data. The icenet_process commands do not limit training date ranges based on the number of forecast days, so we need to ensure the forecast window is accounted for when creating samples.
This is probably only being observed now because this training configuration introduces data at the END of the full data window: the test and validation sets are pre-2023.
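As a rough illustration of the kind of check that might be needed, the sketch below drops sample dates whose forecast window runs past the available ground truth. It is not the icenet implementation: the helper names forecast_window and dates_with_full_window are hypothetical, the 7-day window is an arbitrary example value, and it assumes the window starts on the sample date itself (shift by one day if it starts the day after).

```python
import datetime as dt


def forecast_window(sample_date, n_forecast_days):
    """Dates whose ground truth a sample needs.

    Assumption: the window is the sample date plus the following
    n_forecast_days - 1 days.
    """
    return [sample_date + dt.timedelta(days=n) for n in range(n_forecast_days)]


def dates_with_full_window(sample_dates, available_dates, n_forecast_days):
    """Split sample dates into those with complete ground truth and the rest."""
    available = set(available_dates)
    usable, dropped = [], []
    for date in sample_dates:
        window = forecast_window(date, n_forecast_days)
        if all(d in available for d in window):
            usable.append(date)
        else:
            dropped.append(date)
    return usable, dropped


# Illustration: data ends 25/12/2023, hypothetical 7-day forecast window.
data_end = dt.date(2023, 12, 25)
available = [data_end - dt.timedelta(days=n) for n in range(60)]
requested = [data_end - dt.timedelta(days=n) for n in range(10)]

usable, dropped = dates_with_full_window(requested, available, n_forecast_days=7)
print(max(usable))   # 2023-12-19
print(len(dropped))  # 6
```

With these example values the last usable sample date is 19/12/2023; the six dates from 20/12/2023 onwards would each need ground truth after 25/12/2023 and are the ones that would trigger the KeyError above.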