xcube-dev / xcube

xcube is a Python package for generating and exploiting data cubes powered by xarray, dask, and zarr.
https://xcube.readthedocs.io/
MIT License

URGENT CubesCombiner #457

Open maximlamare opened 3 years ago

maximlamare commented 3 years ago

Something has changed in EDC with an xcube function. I am updating the example Jupyter Notebook for the Africa cube contest and have realised that it no longer executes.

For SPOT data, xcube does not allow passing a list of dates instead of a date range, and it does not account for dates without data in BYOC collections. To get around these limitations, I built two cubes and merged them with the following code (found by trial and error):

from xcube.core.gen2.combiner import CubesCombiner

SPOT_t1 = open_cube(cube_config_t1, **sh_credentials)
SPOT_t2 = open_cube(cube_config_t2, **sh_credentials)

# Initialise merger with an existing cube
cc = CubesCombiner(SPOT_t1)

# Merge the two cubes
SPOT_cube = cc.process_cubes([SPOT_t1, SPOT_t2])

But now I get the following error:

ValueError: Chunks do not add up to shape. Got chunks=((1,), (591,), (350,)), shape=(2, 591, 350)

despite the cube being the same size in all dimensions.

How can I get around this quickly, in order to merge the two cubes? I put URGENT in the title because the contest is ongoing. Thanks in advance for your help.

forman commented 3 years ago

@maximlamare Hi Maxim, the CubesCombiner must be initialized with a CubeConfig object, not an xarray dataset.

forman commented 3 years ago

Note also that the classes you use do not belong to the public xcube API. Currently CubesCombiner is not much more than xr.merge(cubes).
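As a rough illustration of what "not much more than xr.merge(cubes)" can mean in practice, here is a minimal sketch that merges two single-time-step cubes with plain xarray. The helper `make_cube`, the variable name `B1`, the dates, and the grid are hypothetical stand-ins for the SPOT cubes, not taken from the notebook. The `join` and `compat` arguments are spelled out so the behaviour does not depend on xarray's defaults.

```python
import numpy as np
import xarray as xr


def make_cube(date: str) -> xr.Dataset:
    """Build a tiny single-time-step cube (hypothetical stand-in for SPOT_t1 / SPOT_t2)."""
    data = np.random.rand(1, 3, 2)
    return xr.Dataset(
        {"B1": (("time", "lat", "lon"), data)},
        coords={
            "time": np.array([date], dtype="datetime64[ns]"),
            "lat": [10.0, 10.1, 10.2],
            "lon": [20.0, 20.1],
        },
    )


cube_t1 = make_cube("2020-01-05")
cube_t2 = make_cube("2020-03-17")

# Outer join: the time axis becomes the union of both cubes' time steps.
# "no_conflicts": gaps (NaN) in one cube are filled from the other cube.
merged = xr.merge([cube_t1, cube_t2], join="outer", compat="no_conflicts")

# For cubes that differ only in time, concatenation gives the same values.
concatenated = xr.concat([cube_t1, cube_t2], dim="time")

print(merged["B1"].shape)
```

Whether `xr.merge` or `xr.concat` is the better fit depends on the cubes; for two cubes on an identical spatial grid with disjoint time steps, both should yield a cube with two time steps.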

Can you please provide your full stack traceback here?

maximlamare commented 3 years ago

Thanks for the quick response. If I initialise the CubesCombiner with the CubeConfig object, I get the following full stack trace:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-9bdd37bde6f1> in <module>
      3 
      4 # Merge the two cubes
----> 5 SPOT_cube = cc.process_cubes([SPOT_t1, SPOT_t2])

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/xcube/core/gen2/combiner.py in process_cubes(self, cubes)
     51 
     52             # Force cube to have chunks compatible with Zarr.
---> 53             result_cube = self._rechunk_cube(result_cube)
     54 
     55             progress.worked(1)

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/xcube/core/gen2/combiner.py in _rechunk_cube(self, cube)
     61 
     62     def _rechunk_cube(self, cube: xr.Dataset):
---> 63         cube_rechunker = CubeRechunker(self._cube_config.chunks or {})
     64         return cube_rechunker.process_cube(cube)

AttributeError: 'CubeConfig' object has no attribute 'chunks'

And when I initialise it with the cube itself (now I know this is wrong, but I'm putting it here since it worked in the previous version, see EDC Marketplace), I get:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-fc753b1a16d4> in <module>
      3 
      4 # Merge the two cubes
----> 5 SPOT_cube = cc.process_cubes([SPOT_t1, SPOT_t2])

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/xcube/core/gen2/combiner.py in process_cubes(self, cubes)
     51 
     52             # Force cube to have chunks compatible with Zarr.
---> 53             result_cube = self._rechunk_cube(result_cube)
     54 
     55             progress.worked(1)

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/xcube/core/gen2/combiner.py in _rechunk_cube(self, cube)
     62     def _rechunk_cube(self, cube: xr.Dataset):
     63         cube_rechunker = CubeRechunker(self._cube_config.chunks or {})
---> 64         return cube_rechunker.process_cube(cube)

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/xcube/core/gen2/rechunker.py in process_cube(self, cube)
     46         # Data variables SHALL BE chunked according to dim sizes in dim_chunks
     47         chunked_cube = chunked_cube.assign(
---> 48             variables={var_name: var.chunk({var.dims[axis]: dim_chunks.get(var.dims[axis],
     49                                                                            _default_chunk_size(var.chunks, axis))
     50                                             for axis in range(var.ndim)})

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/xcube/core/gen2/rechunker.py in <dictcomp>(.0)
     46         # Data variables SHALL BE chunked according to dim sizes in dim_chunks
     47         chunked_cube = chunked_cube.assign(
---> 48             variables={var_name: var.chunk({var.dims[axis]: dim_chunks.get(var.dims[axis],
     49                                                                            _default_chunk_size(var.chunks, axis))
     50                                             for axis in range(var.ndim)})

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/xarray/core/dataarray.py in chunk(self, chunks, name_prefix, token, lock)
   1055             chunks = dict(zip(self.dims, chunks))
   1056 
-> 1057         ds = self._to_temp_dataset().chunk(
   1058             chunks, name_prefix=name_prefix, token=token, lock=lock
   1059         )

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/xarray/core/dataset.py in chunk(self, chunks, name_prefix, token, lock)
   2035             )
   2036 
-> 2037         variables = {
   2038             k: _maybe_chunk(k, v, chunks, token, lock, name_prefix)
   2039             for k, v in self.variables.items()

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/xarray/core/dataset.py in <dictcomp>(.0)
   2036 
   2037         variables = {
-> 2038             k: _maybe_chunk(k, v, chunks, token, lock, name_prefix)
   2039             for k, v in self.variables.items()
   2040         }

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/xarray/core/dataset.py in _maybe_chunk(name, var, chunks, token, lock, name_prefix, overwrite_encoded_chunks)
    432         token2 = tokenize(name, token if token else var._data, chunks)
    433         name2 = f"{name_prefix}{name}-{token2}"
--> 434         var = var.chunk(chunks, name=name2, lock=lock)
    435 
    436         if overwrite_encoded_chunks and var.chunks is not None:

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/xarray/core/variable.py in chunk(self, chunks, name, lock)
   1058         data = self._data
   1059         if is_duck_dask_array(data):
-> 1060             data = data.rechunk(chunks)
   1061         else:
   1062             if isinstance(data, indexing.ExplicitlyIndexed):

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/dask/array/core.py in rechunk(self, chunks, threshold, block_size_limit, balance)
   2467         from . import rechunk  # avoid circular import
   2468 
-> 2469         return rechunk(self, chunks, threshold, block_size_limit, balance)
   2470 
   2471     @property

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/dask/array/rechunk.py in rechunk(x, chunks, threshold, block_size_limit, balance)
    251     if isinstance(chunks, (tuple, list)):
    252         chunks = tuple(lc if lc is not None else rc for lc, rc in zip(chunks, x.chunks))
--> 253     chunks = normalize_chunks(
    254         chunks, x.shape, limit=block_size_limit, dtype=x.dtype, previous_chunks=x.chunks
    255     )

/opt/conda/envs/eurodatacube-0.24.5/lib/python3.8/site-packages/dask/array/core.py in normalize_chunks(chunks, shape, limit, dtype, previous_chunks)
   2777             for c, s in zip(map(sum, chunks), shape)
   2778         ):
-> 2779             raise ValueError(
   2780                 "Chunks do not add up to shape. "
   2781                 "Got chunks=%s, shape=%s" % (chunks, shape)

ValueError: Chunks do not add up to shape. Got chunks=((1,), (591,), (350,)), shape=(2, 591, 350)
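One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: when the combiner is initialised with the dataset `SPOT_t1`, `self._cube_config.chunks` resolves to that dataset's own chunk tuples (one time step), which are then applied to the merged cube (two time steps). The sketch below reproduces the same dask error with a plain dask array of the reported shape; the names `merged` and `stale_chunks` are illustrative only.

```python
import dask.array as da

# Merged cube: two time steps, same spatial size as in the error message.
merged = da.zeros((2, 591, 350), chunks=(1, 591, 350))

# Explicit chunk tuples taken from a cube with a single time step.
# Along the first axis they sum to 1, but the merged array has length 2.
stale_chunks = ((1,), (591,), (350,))

error_message = None
try:
    merged.rechunk(stale_chunks)
except ValueError as error:
    error_message = str(error)

print(error_message)

# Giving a chunk *size* per axis instead of explicit tuples lets dask
# repeat the size along each axis, so the shape mismatch cannot occur.
fixed = merged.rechunk((1, 591, 350))
print(fixed.chunks)
```

The difference is that an integer chunk size of 1 expands to as many chunks as the axis needs, while the tuple `(1,)` describes exactly one chunk of length 1.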