In `prepare_ml_data.py`, maybe don't allow `create_files_specifying_spatial_and_temporal_locations_of_each_example` to be run if a subset of `data_sources` is passed in at the command line

openclimatefix / nowcasting_dataset

Prepare batches of data for training machine learning solar electricity nowcasting data

MIT License


The issue is that, if the files specifying the spatial and temporal locations of each example are computed with fewer DataSources than are later used to create batches, then we're likely to attempt to sample from locations that don't exist in at least one DataSource.

Maybe the mechanism should be:

By default, if `prepare_ml_data.py` is called with at least one `--data_source` command-line argument, and the files specifying locations don't exist, then throw an error. But allow users to override this behaviour with a `--force_creation_of_locations` flag, or something like that?
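A minimal sketch of what that guard might look like, assuming the script parses its arguments with `argparse`. The flag names follow the proposal above; `parse_args`, `check_locations_can_be_created` and the `locations_exist` parameter are hypothetical names used only for illustration, not existing code in `prepare_ml_data.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse the two command-line options discussed in this issue."""
    parser = argparse.ArgumentParser()
    # Repeatable option: each use adds one DataSource name to the list.
    parser.add_argument("--data_source", action="append", default=[])
    # Proposed escape hatch; the flag name is only a suggestion.
    parser.add_argument("--force_creation_of_locations", action="store_true")
    return parser.parse_args(argv)


def check_locations_can_be_created(data_sources, locations_exist, force):
    """Raise if locations would be computed from a subset of DataSources.

    data_sources: names passed via --data_source (empty means "use all").
    locations_exist: whether the location files are already on disk.
    force: value of --force_creation_of_locations.
    """
    if data_sources and not locations_exist and not force:
        raise RuntimeError(
            "The files specifying the locations of each example do not exist, "
            "and only a subset of DataSources was requested: "
            f"{data_sources}. Run without --data_source first, or pass "
            "--force_creation_of_locations to override."
        )
```

The check would run before `create_files_specifying_spatial_and_temporal_locations_of_each_example` is called, so that a run restricted to a subset of DataSources fails early instead of writing location files that other DataSources can't satisfy.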


In `prepare_ml_data.py`, maybe don't allow `create_files_specifying_spatial_and_temporal_locations_of_each_example` to be run if a subset of `data_sources` is passed in at the command line #323