pydata / bottleneck

Fast NumPy array functions written in C

[BUG] move_mean gives negative values for array with no negative values in it #332

Open toddrjen opened 4 years ago

toddrjen commented 4 years ago

Describe the bug

When using move_mean on an array with no negative values, the result can contain negative values anyway. This shouldn't be possible, since the mean of non-negative values can never be negative.

To Reproduce

Here is an example that reproduces the problem:

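import numpy as np
import bottleneck as bn
from scipy.signal import windows  # scipy.signal must be imported explicitly

siglen = 10000
np.random.seed(1)
noise = np.random.randn(siglen)
noise *= windows.hann(siglen)  # taper the noise with a Hann window
noise2 = noise**2  # squared signal: every element is >= 0

# Moving mean over a window of 2 samples; the first element is NaN.
mmean = bn.move_mean(noise2, 2)
print(np.nanmin(mmean))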

The result is -3.306294042965365e-15, which is small but nevertheless negative.
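One plausible way a moving mean of non-negative inputs can dip below zero is rounding error in a running sum. The sketch below assumes the moving mean is updated incrementally (add the newest sample, subtract the oldest); this thread does not confirm that bottleneck works this way, and `running_move_mean` and `direct_move_mean` are hypothetical helpers written for illustration. The input is contrived so that the effect is exact and large rather than on the order of 1e-15.

```python
import math


def running_move_mean(values, window):
    """Moving mean maintained with a single running sum (illustrative only)."""
    out = []
    acc = 0.0
    for i, x in enumerate(values):
        if i < window:
            acc += x  # still filling the first window
        else:
            acc += x - values[i - window]  # add newest sample, drop oldest
        out.append(acc / window if i >= window - 1 else math.nan)
    return out


def direct_move_mean(values, window):
    """Reference: re-sum every window from scratch with exact summation."""
    return [
        math.fsum(values[i - window + 1 : i + 1]) / window
        if i >= window - 1
        else math.nan
        for i in range(len(values))
    ]


# All inputs are non-negative. The 1.0 is rounded away when it is added to
# 2**53, but it is still subtracted in full when it leaves the window.
data = [1.0, 2.0**53, 0.0, 0.0]
running = running_move_mean(data, 2)
direct = direct_move_mean(data, 2)
print(running[-1], direct[-1])  # prints: -0.5 0.0
```

The last window contains only zeros, yet the running-sum version reports -0.5 because the error made while the large value was in the window is never corrected.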

Environment: Python 3.7.3 on openSUSE, using openSUSE-compiled packages, with bottleneck 1.3.1 and numpy 1.17.4.

Expected behavior

There should be no negative values in the moving mean of an array with no negative values.

Additional context

Originally discovered with xarray and reported there (pydata/xarray#3855). This was determined to be a bottleneck issue, as the example shows.

This is causing trouble because I am trying to calculate the root mean square value, and negative numbers are causing NaN values when doing the square root.
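Until the underlying issue is fixed, a possible workaround for the RMS case is to clamp the mean of squares at zero before taking the square root. This is only a sketch, and it assumes the negative values are pure rounding noise (around 1e-15 here) rather than a sign of a real problem in the data. `rms_from_mean_square` is a hypothetical helper name, not part of bottleneck or xarray.

```python
import numpy as np


def rms_from_mean_square(mean_sq):
    """Square root of a moving mean of squares, tolerating tiny negatives.

    Assumes any negative entries are rounding noise and can be treated as 0.
    np.maximum propagates NaN, so the leading NaNs of a moving window survive.
    """
    clipped = np.maximum(np.asarray(mean_sq, dtype=float), 0.0)
    return np.sqrt(clipped)


# Stand-in for the output of a moving mean over squared samples.
example = np.array([np.nan, 4.0, -3.3e-15, 0.25])
rms = rms_from_mean_square(example)
print(rms)
```

The tiny negative entry becomes 0.0 instead of NaN, while the leading NaN that marks an incomplete window is left in place.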

qwhelan commented 4 years ago

@toddrjen Thanks for opening this issue. I can reproduce it locally.

The current implementations are less numerically stable than desired, so I believe this is consistent with other numerical issues currently in bottleneck. I'm currently focusing on revamping the reduce summation functions to use pairwise summation, and similar changes would likely be needed for move_mean().
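For readers unfamiliar with pairwise summation, here is a minimal pure-Python sketch of the idea: split the data in half, sum each half recursively, and add the two partial sums, so rounding error grows far more slowly than with a left-to-right loop. It is not bottleneck's code; `pairwise_sum` and its `block` parameter are hypothetical names chosen for this illustration.

```python
import math


def pairwise_sum(values, lo=0, hi=None, block=8):
    """Sum values[lo:hi] by recursively summing halves (illustrative only)."""
    if hi is None:
        hi = len(values)
    n = hi - lo
    if n <= block:
        # Small blocks are summed with a plain loop to limit recursion overhead.
        total = 0.0
        for i in range(lo, hi):
            total += values[i]
        return total
    mid = lo + n // 2
    return pairwise_sum(values, lo, mid, block) + pairwise_sum(values, mid, hi, block)


data = [0.1] * 100_000

naive = 0.0
for x in data:  # plain left-to-right accumulation
    naive += x

exact = math.fsum(data)  # correctly rounded reference sum
pairwise = pairwise_sum(data)

print("naive error:   ", abs(naive - exact))
print("pairwise error:", abs(pairwise - exact))
```

A moving-window function cannot reuse this directly, because it updates one sum as the window slides, which is presumably why move_mean() would need its own, similar change.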

I'll investigate this weekend whether there's a triage opportunity for this issue, but I suspect it requires a larger fix that is in progress.