pangeo-data / rechunker

Disk-to-disk chunk transformation for chunked arrays.
https://rechunker.readthedocs.io/
MIT License

Rechunk to an existing store #148

Open jbeezley opened 10 months ago

jbeezley commented 10 months ago

I have an existing data pipeline in which data arrives incrementally. Whenever new data lands in the source store, the pipeline performs a naive rechunking to a zarr store. Rechunker has a much better algorithm that I would like to use, but it cannot target an existing store.

This problem seems related to https://github.com/pangeo-data/rechunker/issues/8. For my use case, however, a simpler implementation would be to optionally skip the call at https://github.com/pangeo-data/rechunker/blob/master/rechunker/api.py#L599 and open the dataset instead.

I would be willing to implement this via an optional kwarg, but I wanted to check whether such a change would be accepted and whether there are any issues I'm not considering. Clearly, there could be problems if the dimensions or variables of the destination are not compatible. I could check that after opening, or just let the exceptions from zarr pass through. Thoughts?
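
To make the "check after opening" option concrete, here is a rough sketch of what such a validation step might look like. It is not rechunker code: the function name `check_target_compatible` and its arguments are hypothetical, and it assumes only that the source and the already-opened target expose `.shape` and `.dtype` (and `.chunks` on the target), as zarr arrays do.

```python
from types import SimpleNamespace


def check_target_compatible(source, target, target_chunks):
    """Raise ValueError if an existing target array cannot receive the source.

    Hypothetical helper, not part of rechunker. `source` and `target` are
    duck-typed: anything with `.shape` and `.dtype` works, and `target`
    must also have `.chunks`.
    """
    problems = []
    if tuple(target.shape) != tuple(source.shape):
        problems.append(
            f"shape {tuple(target.shape)} != source shape {tuple(source.shape)}"
        )
    if target.dtype != source.dtype:
        problems.append(f"dtype {target.dtype} != source dtype {source.dtype}")
    if tuple(target.chunks) != tuple(target_chunks):
        problems.append(
            f"chunks {tuple(target.chunks)} != requested chunks {tuple(target_chunks)}"
        )
    if problems:
        # Report every mismatch at once rather than failing on the first one.
        raise ValueError("existing target is incompatible: " + "; ".join(problems))


# Stand-in objects so the sketch runs without a real store.
source = SimpleNamespace(shape=(100, 10), dtype="float32")
existing = SimpleNamespace(shape=(100, 10), dtype="float32", chunks=(10, 10))
check_target_compatible(source, existing, target_chunks=(10, 10))  # passes silently
```

In practice the target would presumably come from something like `zarr.open_array(store, mode="r+")`, which requires the array to exist already. For an incrementally growing source, the strict shape comparison would likely need to be relaxed along the growing dimension; the alternative of letting zarr's own exceptions pass through avoids this extra code, at the cost of less specific error messages.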