numba / numba

NumPy aware dynamic Python compiler using LLVM
https://numba.pydata.org/
BSD 2-Clause "Simplified" License

numba ufunc without explicit signature vs. xarray #6678

Open crusaderky opened 3 years ago

crusaderky commented 3 years ago

numpy 1.20.0, numba 0.52.0, xarray 0.16.2, python 3.8

If I don't provide a list of signatures and instead rely on @vectorize'd functions being compiled on first call, they raise a ValueError if the first call receives an xarray.DataArray.

FYI @shoyer

import numba
import xarray

@numba.vectorize(nopython=True)
def double(x):
    return x * 2

a = xarray.DataArray(1)
double(a)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-396b55cc58ef> in <module>
      8     return x * 2
      9 
---> 10 double(xarray.DataArray(1))

~/miniconda3/lib/python3.8/site-packages/numba/np/ufunc/dufunc.py in _compile_for_args(self, *args, **kws)
    185         argtys = []
    186         for arg in args[:nin]:
--> 187             argty = typeof(arg)
    188             if isinstance(argty, types.Array):
    189                 argty = argty.dtype

~/miniconda3/lib/python3.8/site-packages/numba/core/typing/typeof.py in typeof(val, purpose)
     33         msg = _termcolor.errmsg(
     34             f"Cannot determine Numba type of {type(val)}")
---> 35         raise ValueError(msg)
     36     return ty
     37 

ValueError: Cannot determine Numba type of <class 'xarray.core.dataarray.DataArray'>

Workarounds

  1. Explicitly declare signatures
  2. Do a dummy call to a plain numpy array first
  3. a.copy(data=double(a.data))

stuartarchibald commented 3 years ago

Thanks for the report.

I think this works when signatures are provided because Numba then creates a real NumPy ufunc, and xarray implements the protocols that let it work with NumPy ufuncs. Without signatures, Numba uses its "dynamic ufunc" behaviour (jitting ufuncs on demand based on argument type), which relies on type inference to work out what the argument types are and how to handle them. I think for now a workaround is going to be needed.

Reworking Numba's ufunc behaviour is something that's been discussed quite a bit recently. Having this use case as another concrete example of something that could potentially "just work" is really helpful, thanks!

shoyer commented 3 years ago

My guess is that Numba needs some form of support for "duck typing" in order for dynamic ufuncs to work on xarray.DataArray objects. Assuming that dynamic ufuncs only need dtype and shape information matching the NumPy convention, this is something you can pull out from the corresponding attributes on xarray.DataArray objects.
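
A rough sketch of what such duck typing could look like, using only NumPy. The helper duck_dtype_and_shape and the Labelled class are hypothetical and exist only for this illustration. They are not part of Numba or xarray. Labelled stands in for a wrapper such as xarray.DataArray, which exposes the same dtype and shape attributes.

```python
import numpy as np

def duck_dtype_and_shape(obj):
    """Hypothetical helper: read dtype and shape from any array-like object."""
    dtype = getattr(obj, "dtype", None)
    shape = getattr(obj, "shape", None)
    if dtype is None or shape is None:
        # Fall back to NumPy's own conversion for scalars, lists, and so on.
        arr = np.asarray(obj)
        return arr.dtype, arr.shape
    return np.dtype(dtype), tuple(shape)

class Labelled:
    """Hypothetical stand-in for an array wrapper such as xarray.DataArray."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def dtype(self):
        return self._data.dtype

    @property
    def shape(self):
        return self._data.shape

wrapped_dtype, wrapped_shape = duck_dtype_and_shape(Labelled([1.0, 2.0, 3.0]))
scalar_dtype, scalar_shape = duck_dtype_and_shape(5)
```

Whether dtype and shape alone are enough for Numba's dynamic ufuncs is exactly the assumption stated above. This sketch does not settle that question.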