openclimatefix / nwp-consumer

Microservice for consuming NWP data.

Rework caching system #121

Closed: devsjc closed this issue 4 months ago

devsjc commented 4 months ago

Running the consumer to process ICON data on a Kubernetes cluster produced errors (see https://github.com/openclimatefix/dagster-dags/issues/51). These errors were caused by linkage issues when moving data from the "temporary directory" to the local filesystem, even though both paths had the same parent path. For instance:

[Errno 18] Invalid cross-device link: '/tmp/nwpc/icon_global_icosahedral_single-level_2024021900_006_ALB_RAD.grib2' -> '/tmp/raw/2024/02/19/0000/icon_global_icosahedral_single-level_2024021900_006_ALB_RAD.grib2'
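For context, `Errno 18` (`EXDEV`) is what the OS returns when a rename or hard link is attempted across two different mounts, which can happen even when both paths sit under `/tmp`. A minimal sketch of a move that tolerates this, assuming a hypothetical helper `move_across_devices` (not part of the consumer's codebase):

```python
import errno
import os
import shutil
from pathlib import Path


def move_across_devices(src: Path, dst: Path) -> Path:
    """Move src to dst, falling back to copy-and-delete on EXDEV.

    Hypothetical helper for illustration only.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Fast path: an atomic rename, only possible within one filesystem.
        os.rename(src, dst)
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise
        # src and dst are on different mounts: copy the bytes, then remove src.
        shutil.copy2(src, dst)
        src.unlink()
    return dst
```

The standard library's `shutil.move` applies a similar fallback itself, so it may be enough to use that in place of a bare rename.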

This points to a larger issue with the consumer: at the moment, when running in consumer mode, the raw data must be saved to a temporary directory and then moved to a local directory before it can be used further, even when storing the raw data is not desired.

As such, rework the temporary directory system into a proper cache, so that downloaded data can remain on the filesystem untouched for further processing when saving the raw data is not desired:

old: download raw files to cache -> move cache to local -> move local back to cache -> convert to zarr -> move to store

new: download raw files to cache -> convert to zarr -> move to store
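A rough sketch of the new flow, under stated assumptions: `run_consumer`, `download`, `convert`, and the directory arguments are hypothetical names for illustration, not the consumer's actual API, and keeping raw files is modelled here as an optional copy out of the cache.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def run_consumer(
    sources: Iterable[str],
    download: Callable[[str, Path], Path],
    convert: Callable[[List[Path], Path], Path],
    cache_dir: Path,
    store_dir: Path,
    raw_dir: Optional[Path] = None,
) -> Path:
    """Download to cache -> convert -> move to store (hypothetical sketch)."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    store_dir.mkdir(parents=True, exist_ok=True)

    # 1. Download raw files straight into the cache; they stay there untouched.
    cached = [download(src, cache_dir) for src in sources]

    # Optional: keep a copy of the raw files only when that is wanted.
    if raw_dir is not None:
        raw_dir.mkdir(parents=True, exist_ok=True)
        for f in cached:
            shutil.copy2(f, raw_dir / f.name)

    # 2. Convert directly from the cached files.
    converted = convert(cached, cache_dir)

    # 3. Move the converted output to the store (shutil.move handles
    #    both files and directories, and cross-device destinations).
    return Path(shutil.move(str(converted), str(store_dir / converted.name)))
```

With `raw_dir=None`, no raw file is moved at all, which removes the two intermediate moves from the old flow.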

The cache has to exist because many of the data storage locations have inflexible APIs, but that doesn't mean we can't use it more effectively.