Describe the bug
Starting somewhere between 10 million and 50 million elements, the bn.nanmean and bn.nanstd functions appear to experience a catastrophic loss of accuracy with float32 data.
To Reproduce
This code creates float32 arrays of increasing size and compares the results of the NumPy and Bottleneck versions of nanmean and nanstd:
```python
import numpy as np
import bottleneck as bn

print(f'{np.__version__=} {bn.__version__=}')

million = 10**6
for size in (million, 10 * million, 50 * million, 100 * million):
    # Uniform values in [0, 1), cast to float32.
    rand_data = np.random.random(size=size).astype(np.float32)
    print(f"{size}")
    print(" mean\t", np.nanmean(rand_data), bn.nanmean(rand_data))
    print(" std\t", np.nanstd(rand_data), bn.nanstd(rand_data))
```
When I run it, I get:
```
np.__version__='1.24.0' bn.__version__='1.4.1'
1000000
 mean    0.5003439 0.5003493428230286
 std     0.28887847 0.28882330656051636
10000000
 mean    0.49992886 0.49994951486587524
 std     0.28866056 0.28725674748420715
50000000
 mean    0.5000019 0.33554431796073914
 std     0.28868446 0.30973944067955017
100000000
 mean    0.4999724 0.16777215898036957
 std     0.2886786 0.38657501339912415
```
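For what it's worth, the bad means look like a single float32 accumulator saturating: 0.33554431796073914 is 2**24 / 50_000_000 and 0.16777215898036957 is 2**24 / 100_000_000, which is exactly what a float32 running sum produces once it reaches 2**24 and additions of values below 1 round away to nothing. I have not checked Bottleneck's source, so this is only a guess; the sketch below just demonstrates the saturation effect in pure NumPy:

```python
import numpy as np

# Sketch of the suspected failure mode (a guess, not verified against
# Bottleneck's implementation): at 2**24 the spacing between adjacent
# float32 values is 2, so adding anything below 1 is rounded away and
# a float32 running sum stops growing.
acc = np.float32(2**24)
print(acc + np.float32(0.6))   # 16777216.0 -- the addition is lost

# That saturation point reproduces the bad means reported above:
print(2**24 / 50_000_000)      # 0.33554432  (bn.nanmean at 50M elements)
print(2**24 / 100_000_000)     # 0.16777216  (bn.nanmean at 100M elements)
```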
Expected behavior
I expected the differences between NumPy and Bottleneck to be zero, or at least small relative to the size of the result.
Additional context
I encountered this while trying to track down https://github.com/astropy/astropy/issues/17185. https://github.com/astropy/astropy/issues/11492 may be related, but there the accuracy loss appeared smaller.
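If the cause is indeed a float32 accumulator, casting the input to float64 before the call should sidestep the problem at the cost of a temporary copy. A possible workaround sketch (my assumption about the cause, not a fix):

```python
import numpy as np
import bottleneck as bn

rand_data = np.random.random(size=100 * 10**6).astype(np.float32)

# Workaround sketch (assumes the bug is float32 accumulation): promote
# to float64 so the running sum has enough precision to absorb 1e8
# small additions, at the cost of a temporary float64 copy.
print(bn.nanmean(rand_data.astype(np.float64)))
print(bn.nanstd(rand_data.astype(np.float64)))
```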