pangeo-data / rechunker

Disk-to-disk chunk transformation for chunked arrays.
https://rechunker.readthedocs.io/
MIT License

Array_plan.execute() not working (?) #101

Closed aladinor closed 2 years ago

aladinor commented 2 years ago

Hi all,

I am fairly new to Zarr. I am trying to rechunk a Zarr group that contains multiple Zarr arrays (line [15] in the Jupyter notebook linked below). I am following the tutorial on the rechunker website, but either something is missing or I am misunderstanding what the rechunker library is for.

I have a folder that contains around 78 GB of data, as shown in line [7] of the Jupyter notebook:

https://nbviewer.jupyter.org/github/aladinor/camp2ex_proj/blob/master/notebooks/rewrite_zarr.ipynb

After following the source_group approach from the rechunker tutorial, I ended up with an empty folder and empty subfolders, as shown in lines [17], [18], and [19]. Does rechunker write the rechunked data to the target folder, or does it just write a metadata file?

My guess is that array_plan.execute() just creates (or keeps using) a delayed object without performing any writes. Is that possible? Or does rechunker simply not write any files?

Thanks in advance for your help!

aladinor commented 2 years ago

Hi all, I think I found the error. When I define the target_chunks dictionary (line 10 in the Jupyter notebook), I have to give the actual desired chunk size for each dimension. I had left -1, assuming it would span the whole dimension, as it does when chunking xarray datasets. I changed these entries to the actual desired values as follows:

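# Explicit chunk length for each dimension name (leaving -1 did not work here).
chunk_sizes = {'time': 2000, 'range': 456, 'cross_track': 25, 'bin_size': 1, 'vector': 3}

target_chunks = {}
for name in source_group.array_keys():
    # Dimension names are stored in the array's _ARRAY_DIMENSIONS attribute.
    dim_names = source_group[name].attrs.asdict()['_ARRAY_DIMENSIONS']
    # Dimensions without an entry in chunk_sizes are skipped, as before.
    target_chunks[name] = {d: chunk_sizes[d] for d in dim_names if d in chunk_sizes}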

After making these changes it worked, so I will close this issue. Thanks!
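
For anyone who still wants the -1 shorthand, here is a minimal sketch of a helper that expands it into explicit sizes before the dictionary is handed to rechunker. The name resolve_target_chunks is hypothetical and is not part of rechunker or zarr. The sketch assumes that the dimension names come from _ARRAY_DIMENSIONS and the lengths from the array's shape, and the shape in the example is made up for illustration.

```python
def resolve_target_chunks(dims, shape, requested):
    """Return {dimension name: explicit chunk length}.

    dims      -- dimension names, e.g. from attrs['_ARRAY_DIMENSIONS']
    shape     -- array shape, in the same order as dims
    requested -- desired chunk lengths; -1 or a missing entry means
                 "use the whole dimension" (an assumption of this sketch)
    """
    if len(dims) != len(shape):
        raise ValueError("dims and shape must have the same length")

    chunks = {}
    for name, size in zip(dims, shape):
        wanted = requested.get(name, -1)
        # Expand -1 to the full length and never exceed the dimension size.
        chunks[name] = size if wanted == -1 else min(wanted, size)
    return chunks


# Illustrative values only; real shapes would come from source_group[name].shape.
example = resolve_target_chunks(
    dims=("time", "range", "vector"),
    shape=(10000, 456, 3),
    requested={"time": 2000, "range": -1},
)
print(example)
```

Inside the loop above, this could be called once per array with source_group[name].shape, so that every value placed in target_chunks is an explicit positive integer.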