Selecting data by label may be significantly slower than index-based selection, especially for large, multi-year datasets, because label indexing goes through Pandas. The overhead could add up during training, so it might be worth building a lookup table (or something similar) so that index-based selection can be used, with the indices kept synchronized across all modalities.
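A minimal sketch of what such a lookup table could look like, assuming each modality exposes a sorted array of timestamps. The function name `build_time_lookup`, the modality names, and the 5-minute and 30-minute cadences are hypothetical and only there for illustration. It uses plain NumPy to resolve every sample time to an integer position once, up front, so that later selection can be purely positional:

```python
import numpy as np


def build_time_lookup(reference_times, modality_times):
    """Map each reference timestamp to its integer position in every modality.

    Returns {modality_name: int array}, holding -1 where a modality has no
    exactly matching timestamp. Each time array is assumed sorted ascending
    and non-empty.
    """
    lookup = {}
    for name, times in modality_times.items():
        # Position at which each reference time would be inserted.
        pos = np.searchsorted(times, reference_times)
        # Clip so the equality check below never reads out of bounds.
        clipped = np.minimum(pos, len(times) - 1)
        exact = times[clipped] == reference_times
        lookup[name] = np.where(exact, clipped, -1)
    return lookup


# Hypothetical time axes: 5-minutely satellite, 30-minutely NWP.
start = np.datetime64("2023-01-01T00:00")
satellite_times = start + np.arange(0, 120, 5).astype("timedelta64[m]")
nwp_times = start + np.arange(0, 120, 30).astype("timedelta64[m]")

# Use the NWP timestamps as the sample times shared by all modalities.
lookup = build_time_lookup(
    nwp_times, {"satellite": satellite_times, "nwp": nwp_times}
)

# Stand-in data array; selection is now a plain integer index.
satellite_data = np.arange(len(satellite_times)) * 10
third_sample = satellite_data[lookup["satellite"][2]]
```

With xarray-backed data, the same integer positions could be passed to `.isel` instead of calling `.sel` with labels on every sample. Whether that is worth the extra bookkeeping depends on how large the measured gap between the two actually is.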
@dfulu ran a test on 2010 to 2023 satellite data comparing index and label slicing. Index slicing took 0.9 ms and label slicing took 1.1 ms, so this is probably not worth pursuing. Closing.