openclimatefix / ocf_datapipes

OCF's DataPipe based dataloader for training and inference
MIT License
13 stars, 11 forks

improve diff selection #319

Closed dfulu closed 4 months ago

dfulu commented 4 months ago

Previously, when there were accumulated variables that needed to be diffed, I called compute on only the diffed variables. The other, non-diffed variables were left uncomputed at that point and computed later. This meant loading the same chunks of data twice.

This pull request removes that compute step, so that when compute is run on the output DataArray, each chunk is loaded only once. This also allows us to slice in time or space in either order without much penalty. In the old version, slicing in time first would mean loading unnecessary data.
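To illustrate the idea, here is a minimal sketch of a lazy diff, not the code in this PR. It assumes a dask-backed xarray DataArray with a hypothetical `channel` dimension holding the variable names and a `step` dimension to diff along; the function name `diff_accumulated_lazily` and the variable names are made up for the example.

```python
import numpy as np
import xarray as xr


def diff_accumulated_lazily(da, accum_vars, dim="step"):
    """Diff the accumulated channels along `dim` without calling compute.

    Hypothetical helper: `channel` and `step` are assumed dimension names.
    """
    all_vars = list(da["channel"].values)
    other_vars = [v for v in all_vars if v not in accum_vars]

    # diff stays lazy on dask-backed data; label="lower" keeps the earlier
    # coordinate of each pair, so the last entry along `dim` is dropped.
    diffed = da.sel(channel=accum_vars).diff(dim=dim, label="lower")

    # Trim the non-accumulated channels to the same coordinates along `dim`.
    kept = da.sel(channel=other_vars).isel({dim: slice(0, -1)})

    # Recombine and restore the original channel order, still without computing.
    return xr.concat([kept, diffed], dim="channel").sel(channel=all_vars)


# Small example: "dswrf" is treated as accumulated, "t" is not.
data = np.arange(24, dtype=float).reshape(2, 4, 3)
da = xr.DataArray(
    data,
    dims=("channel", "step", "x"),
    coords={"channel": ["dswrf", "t"], "step": np.arange(4)},
).chunk({"step": 2})

lazy = diff_accumulated_lazily(da, ["dswrf"])
# Slicing can happen before or after the diff; nothing is loaded until here.
result = lazy.isel(x=slice(0, 2)).compute()
```

Because no intermediate compute is triggered, the single `compute()` at the end evaluates one task graph covering both the diffed and the non-diffed channels.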

I ran a unit speed test on this function locally, and it loads data about twice as fast as the previous version.