pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Rolling operations with numbagg produce invalid values after numpy.inf #8811

Open Ockenfuss opened 7 months ago

Ockenfuss commented 7 months ago

What is your issue?

If an array contains np.inf and a rolling operation is applied, every value after that position becomes nan when numbagg is used. Take the following example:

import xarray as xr
import numpy as np

xr.set_options(use_numbagg=False)
da = xr.DataArray([1, 2, 3, np.inf, 4, 5, 6, 7, 8, 9, 10], dims=['x'])
da.rolling(x=2).sum()

Output

<xarray.DataArray (x: 11)> Size: 88B
array([nan,  3.,  5., inf, inf,  9., 11., 13., 15., 17., 19.])
Dimensions without coordinates: x

With Numbagg:

xr.set_options(use_numbagg=True)
da = xr.DataArray([1, 2, 3, np.inf, 4, 5, 6, 7, 8, 9, 10], dims=['x'])
print(da.rolling(x=2).sum())

Output

<xarray.DataArray (x: 11)> Size: 88B
array([nan,  3.,  5., inf, inf, nan, nan, nan, nan, nan, nan])
Dimensions without coordinates: x

What did I expect?

I expected no user-visible changes in the output values if numbagg is activated.

Maybe this is not a bug but expected behaviour for numbagg. The following warning was raised by the second call:

.../Local/virtual_environments/xarray_performance/lib/python3.10/site-packages/numbagg/decorators.py:247: RuntimeWarning: invalid value encountered in move_sum
  return gufunc(*arr, window, min_count, axis=axis, **kwargs)

If this is expected, I think it would be good to have a page in the documentation that lists the downsides and limitations of the various tools for accelerating xarray. From the current installation docs, I assumed I just needed to install numbagg/bottleneck to make xarray faster without any change in the output values.

Environment

xarray==2024.2.0
numbagg==0.8.0
Package Versions:

```txt
anyio==4.3.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.12.3
bleach==6.1.0
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
comm==0.2.1
contourpy==1.2.0
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
exceptiongroup==1.2.0
executing==2.0.1
fastjsonschema==2.19.1
fonttools==4.49.0
fqdn==1.5.1
h11==0.14.0
httpcore==1.0.4
httpx==0.27.0
idna==3.6
ipykernel==6.29.3
ipython==8.22.2
ipywidgets==8.1.2
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.3
json5==0.9.22
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.9.0
jupyter-lsp==2.2.4
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyter_server==2.13.0
jupyter_server_terminals==0.5.2
jupyterlab==4.1.4
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.3
jupyterlab_widgets==3.0.10
kiwisolver==1.4.5
llvmlite==0.42.0
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mistune==3.0.2
nbclient==0.9.0
nbconvert==7.16.2
nbformat==5.9.2
nest-asyncio==1.6.0
notebook==7.1.1
notebook_shim==0.2.4
numba==0.59.0
numbagg==0.8.0
numpy==1.26.4
overrides==7.7.0
packaging==23.2
pandas==2.2.1
pandocfilters==1.5.1
parso==0.8.3
pexpect==4.9.0
pillow==10.2.0
platformdirs==4.2.0
prometheus_client==0.20.0
prompt-toolkit==3.0.43
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pycparser==2.21
Pygments==2.17.2
pyparsing==3.1.2
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
pytz==2024.1
PyYAML==6.0.1
pyzmq==25.1.2
qtconsole==5.5.1
QtPy==2.4.1
referencing==0.33.0
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.18.0
Send2Trash==1.8.2
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
stack-data==0.6.3
terminado==0.18.0
tinycss2==1.2.1
tomli==2.0.1
tornado==6.4
traitlets==5.14.1
types-python-dateutil==2.8.19.20240106
typing_extensions==4.10.0
tzdata==2024.1
uri-template==1.3.0
urllib3==2.2.1
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
widgetsnbextension==4.0.10
xarray==2024.2.0
```
max-sixty commented 7 months ago

Very much agree that we shouldn't have differences in output. We're quite well-tested for NaNs but admittedly not for inf. I'll look into fixing...

max-sixty commented 7 months ago

Though actually numbagg does the same as bottleneck. numbagg itself tests against bottleneck, including inf values.


import xarray as xr
import numpy as np

xr.set_options(use_numbagg=False, use_bottleneck=True)  # use bottleneck
da = xr.DataArray([1, 2, 3, np.inf, 4, 5, 6, 7, 8, 9, 10], dims=['x'])
da.rolling(x=2).sum()

<xarray.DataArray (x: 11)> Size: 88B
array([nan,  3.,  5., inf, inf, nan, nan, nan, nan, nan, nan])  # nans propagate
Dimensions without coordinates: x

I think that's because it's quite difficult to have a fast algorithm that handles the case above — inf can't really go into the accumulator, because it can never come out of the accumulator (np.inf - np.inf is np.nan). I guess we could have a separate accumulator for inf values if this were really required...
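To make that concrete, here is a minimal sketch of the two strategies (my own illustration of the idea, not numbagg's or bottleneck's actual code): a naive moving sum that recomputes each window, and an accumulator-based moving sum in which an inf can enter the running total but never leave it.

```python
import numpy as np

def naive_move_sum(values, window):
    # Recompute every window from scratch: O(n * window), but an inf only
    # affects the windows that actually contain it.
    out = np.full(len(values), np.nan)
    for i in range(window - 1, len(values)):
        out[i] = np.sum(values[i - window + 1 : i + 1])
    return out

def accumulator_move_sum(values, window):
    # Keep one running sum: O(n), but once inf has been added it can never
    # be subtracted out again (inf - inf is nan), so every later window
    # is poisoned.
    out = np.full(len(values), np.nan)
    acc = 0.0
    for i, value in enumerate(values):
        acc += value
        if i >= window:
            acc -= values[i - window]   # eventually becomes inf - inf -> nan
        if i >= window - 1:
            out[i] = acc
    return out

values = np.array([1, 2, 3, np.inf, 4, 5, 6, 7, 8, 9, 10], dtype=float)
print(naive_move_sum(values, 2))        # [nan  3.  5. inf inf  9. 11. ...]
print(accumulator_move_sum(values, 2))  # [nan  3.  5. inf inf nan nan ...]
```

The two printed arrays reproduce the difference between the two xarray outputs shown above (modulo min_count handling, which this sketch ignores).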


So I think the resolution here is to indeed add a note to the docs.

Ockenfuss commented 7 months ago

Thank you for the quick reply! The ideal solution would be full support for np.inf, but if there are technical challenges, I would opt for the first suggestion (mentioning limited support for inf values). For an end user like me, "bottleneck & numbagg have similar results, but are different to the naive native routine" is not too helpful. In the end, I just want to know if I can trust my results.

Actually, I think a page in the xarray docs named "Xarray Accelerators" or something would be useful, maybe in the advanced section. This page could clarify the following points for each accelerator (numbagg, bottleneck, maybe flox):

I guess every one of those modules has some drawbacks? Otherwise they could be installed as default dependencies. (Why would I choose a slow version if I could have a faster one, even if I am not concerned about performance? :) )

max-sixty commented 7 months ago

> I would opt for the first suggestion (mentioning limited support for inf values). For an end user like me, "bottleneck & numbagg have similar results, but are different to the naive native routine" is not too helpful.

Hah, agree, thanks

> In the end, I just want to know if I can trust my results.

👍


Great, I like that list.

Briefly — the main reason we don't have these as required dependencies is that we value a small number of dependencies. numbagg is also slow on the first run of a function, but I don't put much weight on that. (Or do others have different views? Possibly we get a better synthesis when someone tries to make one of these a required dependency... Don't think numbagg is quite there, but one day...)

FWIW the inf support is a real corner case, though one we can add to the small list of tradeoffs. Where we've found inaccuracies in the past, we've jumped on them really quickly. We now have some initial fuzzing in numbagg to check that results on random inputs match the older routines.
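As a practical aside for anyone wanting to spot-check their own workloads, here is a rough sketch of such a consistency check (my illustration, not numbagg's actual test suite), comparing the accelerated path against xarray's pure-numpy path on random data via `xr.set_options`:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
da = xr.DataArray(rng.normal(size=1000), dims=['x'])

# Reference result: disable both accelerators so the pure-numpy path is used.
with xr.set_options(use_numbagg=False, use_bottleneck=False):
    expected = da.rolling(x=5).mean()

# Accelerated result.
with xr.set_options(use_numbagg=True):
    actual = da.rolling(x=5).mean()

# assert_allclose treats matching nans as equal, so the leading nans pass.
np.testing.assert_allclose(actual.values, expected.values, rtol=1e-12)
```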

Ockenfuss commented 7 months ago

I just found another, quite weird, example where I get different values. Is it just on my machine? Can you reproduce this?

xr.set_options(use_numbagg=False)
n1 = 9.9e+36
n2 = 7e+36
arr = [1, 1, 1, n1, 1, n2, n1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
da = xr.DataArray(arr, dims=['x'])
da_roll = da.rolling(x=4, center=True, min_periods=1).mean()
print(da_roll)

gives me

<xarray.DataArray (x: 20)> Size: 160B
array([1.000e+00, 1.000e+00, 2.475e+36, 2.475e+36, 4.225e+36, 6.700e+36,
       4.225e+36, 4.225e+36, 2.475e+36, 1.000e+00, 1.000e+00, 1.000e+00,
       1.000e+00, 1.000e+00, 1.000e+00, 1.000e+00, 1.000e+00, 1.000e+00,
       1.000e+00, 1.000e+00])
Dimensions without coordinates: x

while, if I set use_numbagg=True, I obtain:

<xarray.DataArray (x: 20)> Size: 160B
array([1.00000000e+00, 1.00000000e+00, 2.47500000e+36, 2.47500000e+36,
       4.22500000e+36, 6.70000000e+36, 4.22500000e+36, 4.22500000e+36,
       2.47500000e+36, 2.95147905e+20, 2.95147905e+20, 2.95147905e+20,
       2.95147905e+20, 2.95147905e+20, 2.95147905e+20, 2.95147905e+20,
       2.95147905e+20, 2.95147905e+20, 2.95147905e+20, 3.93530540e+20])
Dimensions without coordinates: x

Is this some kind of numerical diffusion? It is similar to the inf problem, in the sense that very large numbers compromise all results to their right once rolling is applied.

max-sixty commented 7 months ago

Right, this is a numerical precision issue. bottleneck has the same problem, if that's installed:

xr.set_options(use_numbagg=False)
n1 = 9.9e+36
n2 = 7e+36
arr = [1, 1, 1, n1, 1, n2, n1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
da = xr.DataArray(arr, dims=['x'])
da_roll = da.rolling(x=4, center=True, min_periods=1).mean()
print(da_roll)
<xarray.DataArray (x: 20)> Size: 160B
array([1.00000000e+00, 1.00000000e+00, 2.47500000e+36, 2.47500000e+36,
       4.22500000e+36, 6.70000000e+36, 4.22500000e+36, 4.22500000e+36,
       2.47500000e+36, 2.95147905e+20, 2.95147905e+20, 2.95147905e+20,
       2.95147905e+20, 2.95147905e+20, 2.95147905e+20, 2.95147905e+20,
       2.95147905e+20, 2.95147905e+20, 2.95147905e+20, 3.93530540e+20])
Dimensions without coordinates: x

The naive routine does a full separate lookback calculation for every value, which has the advantage that accumulation problems can't occur. But it's very slow — O(n*w), where w is the window length...

So unfortunately this is just the nature of floating point numbers...
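A small float64 demonstration of that point (again my own illustration, not the actual library code): once a value around 1e37 is in the running sum, adding 1.0 changes nothing at all, so subtracting the large value back out later cannot restore the small contributions.

```python
import numpy as np

big = np.float64(9.9e36)
print(np.spacing(big))   # ~1.18e21: gap between adjacent float64 values here

acc = np.float64(0.0)
acc += big               # the accumulator now holds the large value
acc += 1.0               # 1.0 is far below the spacing, so acc is unchanged
acc -= big               # removing the large value leaves 0.0, not 1.0
print(acc)               # 0.0: the small value was absorbed and lost
```

Consistently with this, the stray values in the output above, 2.95147905e+20, are exactly 2^68, i.e. the float64 spacing near 9.9e36 (2^70) divided by the window length of 4: rounding residue of the large values rather than the original 1.0 entries.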

Ockenfuss commented 7 months ago

Ok, I see, makes sense regarding the continuously accumulating implementation in numbagg. Thanks for the clarification!