Detailed Description

To speed up training, it could be useful to write some examples or batches to disk so they can be loaded back quickly. There are built-in torchdata datapipes for caching the outputs of other datapipes, so we could cache only certain stages of the pipeline, or just the final torch Tensors.
The OnDiskCacheHolder and EndOnDiskCacheHolder datapipes look like the relevant building blocks: the first checks whether a cached file already exists for an input and skips the intermediate steps if it does, and the second writes newly computed results out to disk.
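The caching idea can be sketched without torchdata at all. The snippet below is a minimal stand-in, not the real implementation: pickle stands in for torch.save/torch.load, and produce_batch is a hypothetical placeholder for the slow Zarr-backed pipeline; in the real datapipe chain, OnDiskCacheHolder and EndOnDiskCacheHolder would handle the existence check and the write step.

```python
import os
import pickle
import tempfile

# Minimal sketch of on-disk caching of pipeline outputs.
# produce_batch is a hypothetical stand-in for the expensive
# Zarr-backed datapipe; pickle stands in for torch.save/torch.load.
CACHE_DIR = tempfile.mkdtemp()

def produce_batch(idx):
    # Hypothetical expensive step, e.g. reading and transforming Zarr data.
    return {"idx": idx, "values": [idx * 10, idx * 10 + 1]}

def cached_batch(idx):
    path = os.path.join(CACHE_DIR, f"batch_{idx}.pkl")
    if os.path.exists(path):
        # Cache hit: load straight from disk, skipping the expensive step.
        with open(path, "rb") as f:
            return pickle.load(f)
    # Cache miss: compute the batch, then persist it for next time.
    batch = produce_batch(idx)
    with open(path, "wb") as f:
        pickle.dump(batch, f)
    return batch

first = cached_batch(0)   # computed and written to disk
second = cached_batch(0)  # read back from disk
```

The second call never touches the expensive pipeline, which is where the speed-up comes from.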
Context
We still have an issue with the loading speed of Zarr-based data, and caching could help in a few ways. One would be to generate a dataset to initially train models on, and then switch to training from the full pipeline for fine-tuning, even though it is slower. Another would be to mix loading examples off disk with loading them from the raw data.
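The mixing option above can also be sketched in plain Python. Everything here is illustrative rather than part of any library (load_from_disk, load_from_raw, and the disk_fraction knob are hypothetical names): a seeded random draw decides, per example, whether to take the fast cached path or exercise the full raw-data pipeline.

```python
import random

# Hypothetical stand-ins for the two loading paths.
def load_from_disk(idx):
    return ("disk", idx)

def load_from_raw(idx):
    return ("raw", idx)

def mixed_loader(indices, disk_fraction=0.5, seed=0):
    # For each example, draw once to choose the fast (disk) or
    # slow (raw pipeline) path; disk_fraction tunes the balance.
    rng = random.Random(seed)
    for idx in indices:
        if rng.random() < disk_fraction:
            yield load_from_disk(idx)
        else:
            yield load_from_raw(idx)

batches = list(mixed_loader(range(8), disk_fraction=0.5))
```

Lowering disk_fraction shifts more of the load back onto the raw pipeline, which could be useful once the model moves to fine-tuning on fresh data.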
Possible Implementation