pangeo-data / pangeo-cmip6-examples

Examples of analysis of CMIP6 data using xarray and dask
BSD 3-Clause "New" or "Revised" License

Xarray/Dask Exception in cmip6_precip_analysis #7

Open jhamman opened 5 years ago

jhamman commented 5 years ago

Xarray/dask are throwing a new error in the cmip6_precip_analysis notebook:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-89aa342a55bc> in <module>
      3     da = da.chunk({'lat': 1, 'lon': None, 'time': None})
      4     return xr_histogram(da, bins, ['lon', 'time'], density=False)
----> 5 pr_3hr_hist = ds.pr.groupby('time.year').apply(func)
      6 pr_3hr_hist

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/groupby.py in apply(self, func, shortcut, args, **kwargs)
    572         applied = (maybe_wrap_array(arr, func(arr, *args, **kwargs))
    573                    for arr in grouped)
--> 574         return self._combine(applied, shortcut=shortcut)
    575 
    576     def _combine(self, applied, restore_coord_dims=False, shortcut=False):

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/groupby.py in _combine(self, applied, restore_coord_dims, shortcut)
    576     def _combine(self, applied, restore_coord_dims=False, shortcut=False):
    577         """Recombine the applied objects like the original."""
--> 578         applied_example, applied = peek_at(applied)
    579         coord, dim, positions = self._infer_concat_args(applied_example)
    580         if shortcut:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/utils.py in peek_at(iterable)
    152     """
    153     gen = iter(iterable)
--> 154     peek = next(gen)
    155     return peek, itertools.chain([peek], gen)
    156 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/groupby.py in <genexpr>(.0)
    571             grouped = self._iter_grouped()
    572         applied = (maybe_wrap_array(arr, func(arr, *args, **kwargs))
--> 573                    for arr in grouped)
    574         return self._combine(applied, shortcut=shortcut)
    575 

<ipython-input-13-89aa342a55bc> in func(da)
      2 def func(da):
      3     da = da.chunk({'lat': 1, 'lon': None, 'time': None})
----> 4     return xr_histogram(da, bins, ['lon', 'time'], density=False)
      5 pr_3hr_hist = ds.pr.groupby('time.year').apply(func)
      6 pr_3hr_hist

<ipython-input-12-9c2fe48cd1a0> in xr_histogram(data, bins, dims, **kwargs)
     10                          output_dtypes=['f8'],
     11                          output_sizes={output_dim_name: len(bins_c)},
---> 12                          vectorize=True, dask='parallelized')
     13     res[output_dim_name] = output_dim_name, bins_c
     14     res[output_dim_name].attrs.update(data.attrs)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/computation.py in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, *args)
    967                                      join=join,
    968                                      exclude_dims=exclude_dims,
--> 969                                      keep_attrs=keep_attrs)
    970     elif any(isinstance(a, Variable) for a in args):
    971         return variables_vfunc(*args)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/computation.py in apply_dataarray_vfunc(func, signature, join, exclude_dims, keep_attrs, *args)
    215 
    216     data_vars = [getattr(a, 'variable', a) for a in args]
--> 217     result_var = func(*data_vars)
    218 
    219     if signature.num_outputs > 1:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/computation.py in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, output_sizes, keep_attrs, *args)
    562             raise ValueError('unknown setting for dask array handling in '
    563                              'apply_ufunc: {}'.format(dask))
--> 564     result_data = func(*input_data)
    565 
    566     if signature.num_outputs == 1:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/computation.py in func(*arrays)
    556                 return _apply_blockwise(
    557                     numpy_func, arrays, input_dims, output_dims,
--> 558                     signature, output_dtypes, output_sizes)
    559         elif dask == 'allowed':
    560             pass

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/computation.py in _apply_blockwise(func, args, input_dims, output_dims, signature, output_dtypes, output_sizes)
    658 
    659     return blockwise(func, out_ind, *blockwise_args, dtype=dtype,
--> 660                      concatenate=True, new_axes=output_sizes)
    661 
    662 

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/array/blockwise.py in blockwise(func, out_ind, name, token, dtype, adjust_chunks, new_axes, align_arrays, concatenate, meta, *args, **kwargs)
    231         from .utils import compute_meta
    232 
--> 233         meta = compute_meta(func, dtype, *args[::2], **kwargs)
    234     if meta is not None:
    235         return Array(graph, out, chunks, meta=meta)

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/array/utils.py in compute_meta(func, _dtype, *args, **kwargs)
    118         # with np.vectorize, such as dask.array.routines._isnonzero_vec().
    119         if isinstance(func, np.vectorize):
--> 120             meta = func(*args_meta)
    121         else:
    122             try:

/srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/lib/function_base.py in __call__(self, *args, **kwargs)
   2089             vargs.extend([kwargs[_n] for _n in names])
   2090 
-> 2091         return self._vectorize_call(func=func, args=vargs)
   2092 
   2093     def _get_ufunc_and_otypes(self, func, args):

/srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/lib/function_base.py in _vectorize_call(self, func, args)
   2155         """Vectorized call to `func` over positional `args`."""
   2156         if self.signature is not None:
-> 2157             res = self._vectorize_call_with_signature(func, args)
   2158         elif not args:
   2159             res = func()

/srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/lib/function_base.py in _vectorize_call_with_signature(self, func, args)
   2229                             for dims in output_core_dims
   2230                             for dim in dims):
-> 2231                 raise ValueError('cannot call `vectorize` with a signature '
   2232                                  'including new output dimensions on size 0 '
   2233                                  'inputs')

ValueError: cannot call `vectorize` with a signature including new output dimensions on size 0 inputs
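Reading the last few frames, the failure seems to happen before any real data is processed. Because the notebook passes vectorize=True, xarray hands dask an np.vectorize object. dask's compute_meta then calls it on zero-size "meta" arrays, and np.vectorize cannot infer the length of a new output dimension (here, the histogram bin axis) without calling the wrapped function at least once. Below is a minimal NumPy-only sketch of that behaviour. It assumes xr_histogram wraps a per-row histogram roughly like this; hist_1d and BIN_EDGES are hypothetical stand-ins, not the notebook's code.

```python
import numpy as np

# Hypothetical bin edges and per-row histogram, standing in for whatever
# xr_histogram wraps via apply_ufunc(..., vectorize=True).
BIN_EDGES = np.array([0.0, 1.0, 2.0, 3.0])


def hist_1d(x):
    """Histogram of a 1-D array; returns len(BIN_EDGES) - 1 counts."""
    counts, _ = np.histogram(x, bins=BIN_EDGES)
    return counts.astype("f8")


# '(n)->(m)': the output core dim m is new, i.e. not present in the input.
# otypes mirrors output_dtypes=['f8'] in the notebook's apply_ufunc call.
vhist = np.vectorize(hist_1d, signature="(n)->(m)", otypes=["f8"])

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 3.0, size=(4, 10))
print(vhist(data).shape)  # (4, 3): m was learned by actually calling hist_1d

# dask's compute_meta calls the vectorized function on zero-size arrays.
# With no element to call hist_1d on, np.vectorize cannot learn the size of m.
meta_like = np.empty((0, 0))
try:
    vhist(meta_like)
except ValueError as err:
    print(err)  # the same ValueError as at the bottom of the traceback
```

The thread doesn't say which dask/xarray versions avoid this code path; rabernat's suggestion in the next comment is to update the environment.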

rabernat commented 5 years ago

We need to update our packages. The first step would be switching this repo's Binder setup to the onbuild image.

rabernat commented 5 years ago

I am trying to update the environment on the cmip6-examples binder.

I am experiencing problems with the onbuild mechanism. Here is .binder/Dockerfile: https://github.com/pangeo-data/pangeo-cmip6-examples/blob/d9773fa8face088aba40d99b5feb3d672c6d3e6e/.binder/Dockerfile#L1

And here is .binder/environment.yml: https://github.com/pangeo-data/pangeo-cmip6-examples/blob/d9773fa8face088aba40d99b5feb3d672c6d3e6e/.binder/environment.yml#L1-L5

I really can't figure out what is wrong. Why is environment.yml being ignored?

This is what the repo2docker build log looks like:

Waiting for build to start...
Picked Git content provider.
Cloning into '/tmp/repo2dockerv1_8iq_i'...
HEAD is now at 70b3c35 Update environment.yml
Using DockerBuildPack builder
Step 1/1 : FROM pangeo/pangeo-notebook-onbuild:2019.09.06
# Executing 6 build triggers
 ---> Running in 9a4ddd3df353
Removing intermediate container 9a4ddd3df353
 ---> Running in 4cac61dd2c43
Removing intermediate container 4cac61dd2c43
 ---> Running in caa541d99a1c
Removing intermediate container caa541d99a1c
 ---> Running in 570d9495acad
Removing intermediate container 570d9495acad
 ---> 67b8b79fabf8
{"aux": {"ID": "sha256:67b8b79fabf8412d6bef107341211f51e961ec98509e8f67fb478ba91e93a82e"}}
[Warning] One or more build-args [NB_USER NB_UID] were not consumed
Successfully built 67b8b79fabf8
Successfully tagged gcr.io/pangeo-181919/prod-pangeo-2ddata-2dpangeo-2dcmip6-2dexamples-5a27c2:70b3c353c242042d31e95d411ddba3de04829787
Pushing image
Pushing image
Pushing image
...

jhamman commented 5 years ago

@rabernat I wonder if the dot-binder directory isn’t supported by onbuild. I'm pretty sure this is the problem:

https://github.com/pangeo-data/pangeo-stacks/blob/ae634bf963bc07f043fa8899e84d124b32146ade/onbuild/r2d_overlay.py#L27-L30
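The linked lines aren't reproduced here, so this is only an illustration of the suspected failure mode, not the actual r2d_overlay.py logic. repo2docker allows config files to live in a binder/ or .binder/ folder. If an overlay script only looks in the repository root and binder/, a file under .binder/ is never found and the build silently proceeds without it, which would match the log above. In the sketch below, find_binder_file and its search order are hypothetical.

```python
import os
import tempfile
from typing import Optional, Sequence

# Hypothetical search order: dot-folder, plain folder, then repository root.
CANDIDATE_DIRS = (".binder", "binder", "")


def find_binder_file(repo_dir: str, filename: str,
                     candidate_dirs: Sequence[str] = CANDIDATE_DIRS) -> Optional[str]:
    """Return the path of the first existing config file, or None."""
    for sub in candidate_dirs:
        # os.path.join(repo_dir, "", filename) resolves to repo_dir/filename
        path = os.path.join(repo_dir, sub, filename)
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        # Mimic this repo's layout: only .binder/environment.yml exists.
        os.makedirs(os.path.join(repo, ".binder"))
        open(os.path.join(repo, ".binder", "environment.yml"), "w").close()

        # Search includes .binder/: the file is found.
        print(find_binder_file(repo, "environment.yml"))
        # Search that skips .binder/: prints None, so the file is ignored.
        print(find_binder_file(repo, "environment.yml", ("binder", "")))
```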