openclimatefix / ocf_datapipes

OCF's DataPipe-based dataloader for training and inference
MIT License

Test how fast index-based selection is compared with label-based selection for Zarr data #179

Closed jacobbieker closed 1 year ago

jacobbieker commented 1 year ago

Selecting data by label may be significantly slower than selecting by index, especially for large, multi-year datasets, because label-based indexing goes through Pandas. The overhead could add up during training, so it may be worth building a lookup table (or something similar) that enables index-based selection and stays synchronized across all modalities.
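
A rough sketch of what such a lookup could look like, using only the Python standard library. This is an illustration, not code from ocf_datapipes: the helper names (`build_time_axis`, `label_slice_to_index_slice`) and the two example modalities with their cadences are assumptions made up for the example. The idea is to resolve a timestamp range to integer positions once per modality, so the actual data access can use positional slices.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def build_time_axis(start, step, count):
    """Return a sorted list of timestamps for one (hypothetical) modality."""
    return [start + i * step for i in range(count)]


def label_slice_to_index_slice(time_axis, t_start, t_end):
    """Convert an inclusive label range into a half-open integer slice.

    Uses binary search on the sorted time axis, so each lookup is
    O(log n) regardless of how many years the dataset covers.
    """
    lo = bisect_left(time_axis, t_start)   # first position >= t_start
    hi = bisect_right(time_axis, t_end)    # first position > t_end
    return slice(lo, hi)


# Two hypothetical modalities with different cadences (assumed values).
satellite_times = build_time_axis(datetime(2020, 1, 1), timedelta(minutes=5), 288)
nwp_times = build_time_axis(datetime(2020, 1, 1), timedelta(hours=1), 24)

# The same label range maps to a different positional slice per modality.
t0, t1 = datetime(2020, 1, 1, 6), datetime(2020, 1, 1, 7)
sat_slice = label_slice_to_index_slice(satellite_times, t0, t1)
nwp_slice = label_slice_to_index_slice(nwp_times, t0, t1)
```

The resulting slices could then be passed to a positional indexer (for example xarray's `isel`) instead of repeating a label lookup for every sample.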

jacobbieker commented 1 year ago

@dfulu ran a test on 2010 to 2023 satellite data comparing index slicing with label slicing. Index slicing took 0.9 ms and label slicing took 1.1 ms, so this is probably not worth pursuing. Closing.