Errors were encountered when running the consumer to process ICON data on a Kubernetes cluster (see https://github.com/openclimatefix/dagster-dags/issues/51). These errors were caused by linkage issues when copying data from the "temporary directory" to the local filesystem, even though both shared the same parent path. For instance:
This points to a larger issue with the consumer: currently, when running in consumer mode, the raw data must be saved to a temporary directory and then moved to a local directory before it can be used further, even if storing it is not desired.
As such, rework the temporary directory system into a more fully fledged cache, which lets downloaded data remain on the filesystem untouched for further processing when saving the raw data is not desired:

- **old:** download raw files to cache -> move cache to local -> move local back to cache -> convert to zarr -> move to store
- **new:** download raw files to cache -> convert to zarr -> move to store
The cache has to exist because many of the data storage locations have inflexible APIs, but that doesn't mean we can't use it more effectively.
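The new flow can be sketched roughly as below. This is a minimal illustration, not the consumer's actual code: `download_raw_files`, `convert_to_zarr`, `process` and the `save_raw` flag are hypothetical names, the download and Zarr conversion are simulated with plain file writes, and a throwaway `tempfile.TemporaryDirectory` stands in for the cache. The point it shows is that raw files stay in the cache and are converted in place, with only the final dataset moved out to the store (and raw files copied out only when they are explicitly wanted).

```python
import shutil
import tempfile
from pathlib import Path


def download_raw_files(cache_dir: Path, names: list[str]) -> list[Path]:
    """Hypothetical stand-in: simulate fetching raw files into the cache."""
    paths = []
    for name in names:
        path = cache_dir / name
        path.write_bytes(b"raw grib bytes for " + name.encode())
        paths.append(path)
    return paths


def convert_to_zarr(raw_paths: list[Path], cache_dir: Path) -> Path:
    """Hypothetical stand-in: write a 'dataset.zarr' directory inside the cache."""
    out = cache_dir / "dataset.zarr"
    out.mkdir()
    for path in raw_paths:
        (out / (path.stem + ".chunk")).write_bytes(path.read_bytes())
    return out


def process(names: list[str], store_dir: Path, save_raw: bool = False) -> Path:
    """download raw files to cache -> convert to zarr -> move to store."""
    store_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp)
        raw = download_raw_files(cache_dir, names)
        if save_raw:
            # Copy (not move) so the cached raw files stay untouched for conversion.
            raw_store = store_dir / "raw"
            raw_store.mkdir(parents=True, exist_ok=True)
            for path in raw:
                shutil.copy2(path, raw_store / path.name)
        zarr_path = convert_to_zarr(raw, cache_dir)
        # shutil.move uses os.rename when possible and otherwise copies then
        # deletes, e.g. when cache and store are on different filesystems.
        dest = store_dir / zarr_path.name
        shutil.move(str(zarr_path), str(dest))
    # The cache (and any raw files left in it) is cleaned up on exiting the block.
    return dest
```

With `save_raw=False`, the raw files never leave the cache, which is the case the old cache -> local -> cache round trip handled poorly.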