pangeo-data / rechunker

Disk-to-disk chunk transformation for chunked arrays.
https://rechunker.readthedocs.io/
MIT License

Tutorial notebook failures #104

Open jbusecke opened 2 years ago

jbusecke commented 2 years ago

I am working on the tutorial to add some docs for #93.

I noticed several cells in the notebook that do not execute cleanly:

This cell

future = array_plan.persist()
progress(future)

is failing with the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/7c/cchjc_ys3z5_33vyp640xycm0000gn/T/ipykernel_22645/3962758384.py in <module>
----> 1 future = array_plan.persist()
      2 progress(future)

AttributeError: 'Rechunked' object has no attribute 'persist'

Is this outdated functionality, or am I making some other mistake?
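A possible explanation, stated as an assumption rather than a confirmed diagnosis: the traceback shows that `rechunk` returned a `Rechunked` object, and as far as I know such a plan is run with `execute()` instead of `persist()`. The sketch below is a small compatibility helper that tries `execute()` first and falls back to `persist()`. The name `run_plan` is hypothetical and not part of rechunker.

```python
def run_plan(plan, **kwargs):
    """Run a rechunking plan using whichever method the object exposes.

    Assumption: newer plan objects provide ``execute()``, while older
    dask-based plans provide ``persist()``. ``run_plan`` is a hypothetical
    helper for illustration, not a rechunker function.
    """
    if hasattr(plan, "execute"):
        # Preferred path: run the plan directly.
        return plan.execute(**kwargs)
    if hasattr(plan, "persist"):
        # Fallback for objects that still behave like dask collections.
        return plan.persist(**kwargs)
    raise TypeError(
        f"{type(plan).__name__!r} object has neither 'execute' nor 'persist'"
    )
```

In the tutorial this would be called as `run_plan(array_plan)`. If the plan does provide `execute()`, calling `array_plan.execute()` directly is simpler, and the `progress(future)` line would need a different way of reporting progress.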

This cell

target_chunks = {
    'air': {'time': 2920, 'lat': 25, 'lon': 1},
    'time': None, # don't rechunk this array
    'lon': None,
    'lat': None,
}
max_mem = '1MB'

target_store = 'group_rechunked.zarr'
temp_store = 'group_rechunked-tmp.zarr'

array_plan = rechunk(source_group, target_chunks, max_mem, target_store, temp_store=temp_store)
array_plan

would benefit from explicitly removing the existing stores first (as is done earlier in the tutorial), like this:

target_chunks = {
    'air': {'time': 2920, 'lat': 25, 'lon': 1},
    'time': None, # don't rechunk this array
    'lon': None,
    'lat': None,
}
max_mem = '1MB'

target_store = 'group_rechunked.zarr'
temp_store = 'group_rechunked-tmp.zarr'

# need to remove the existing stores or it won't work
!rm -rf group_rechunked.zarr group_rechunked-tmp.zarr

array_plan = rechunk(source_group, target_chunks, max_mem, target_store, temp_store=temp_store)
array_plan

I could fix the latter as part of #93, if that is OK.
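The `!rm -rf` line only works inside IPython/Jupyter on a system with a POSIX shell. A portable alternative, sketched here with the standard library only, removes the stores from Python. The helper name `remove_stores` is hypothetical; the store names are the ones used in the cell above, and the sketch assumes they are local directories.

```python
import shutil
from pathlib import Path


def remove_stores(*stores):
    """Delete each store directory that exists; return the paths removed.

    Assumes the stores are local directories (as with the .zarr paths in
    the tutorial). Missing stores are skipped silently.
    """
    removed = []
    for store in stores:
        path = Path(store)
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(str(path))
    return removed


target_store = "group_rechunked.zarr"
temp_store = "group_rechunked-tmp.zarr"

# Clear leftovers from a previous run before calling rechunk().
remove_stores(target_store, temp_store)
```

Because missing directories are skipped, the cell can be re-run any number of times without raising on a clean checkout.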