lmfit / uncertainties

Transparent calculations with uncertainties on the quantities involved (aka "error propagation"); calculation of derivatives.
http://uncertainties.readthedocs.io/

Scalability of the uncertainty propagation in numpy arrays #57

Open rth opened 7 years ago

rth commented 7 years ago

Statement of the problem

Currently uncertainties supports numpy arrays by stacking uncertainties.ufloat objects inside a numpy.array( , dtype="object") array. This is certainly nice, as it allows uncertainty propagation to be used automatically with all existing numpy.array operators. However, it also poses significant performance limitations, because the error-propagation logic is needlessly re-executed for every element of the array.
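To make the representation concrete, here is a small illustration (mine, not from the original report) showing that a uarray is just an object array whose elements are individual ufloat objects:

import numpy as np
from uncertainties import unumpy as unp

ua = unp.uarray([1.0, 2.0], [0.1, 0.2])
print(ua.dtype)                 # object
print(type(ua[0]))              # an uncertainties Variable (i.e. a ufloat)
print(unp.nominal_values(ua))   # array([1., 2.])
print(unp.std_devs(ua))         # array([0.1, 0.2])

Every arithmetic operation therefore dispatches to Python-level ufloat methods element by element, which is where the overhead below comes from.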

Here is a quick benchmark for the current implementation (v3.0),

%load_ext memory_profiler

import uncertainties as un
from uncertainties import unumpy as unp
import numpy as np

N = 100000
x = np.random.rand(N)
x_err = np.random.rand(N)

ux = unp.uarray(x, x_err)

print('== Numpy ==')
%timeit x**2
%memit x**2
print('== Unumpy ==')
%timeit ux**2
%memit ux**2

# == Numpy ==
# The slowest run took 5.22 times longer than the fastest. This could mean that an intermediate result is being cached 
#10000 loops, best of 3: 57.7 µs per loop
# peak memory: 98.48 MiB, increment: 0.29 MiB
# == Unumpy ==
#1 loops, best of 3: 816 ms per loop
# peak memory: 132.66 MiB, increment: 29.13 MiB

N = 100
x = np.random.rand(N,N)
x_err = np.random.rand(N,N)

ux = unp.uarray(x, x_err)

print('== Numpy ==')
%timeit x.dot(x)
%memit x.dot(x)
print('== Unumpy == ')
%timeit ux.dot(x)
%memit ux.dot(x)

# == Numpy ==
#10000 loops, best of 3: 88.1 µs per loop
# peak memory: 78.93 MiB, increment: 0.07 MiB
# == Unumpy == 
#1 loops, best of 3: 14.7 s per loop
# peak memory: 543.95 MiB, increment: 435.94 MiB

While some decrease in performance is of course expected for error propagation, the current slowdown for simple array operations is several orders of magnitude: roughly 10,000x for the element-wise x**2 and over 100,000x for the dot product in the benchmark above, with a correspondingly large memory overhead.

I believe that making unumpy.uarray return a custom undarray object (rather than an np.array(..., dtype='object')), which would store the nominal values and standard deviations in two numpy arrays, e.g. undarray.n and undarray.s, and then implementing the error-propagation logic at the array level, would address both the memory usage and the performance issues.

The way masked arrays, which are also built from two numpy arrays, are implemented as a class inheriting from ndarray could serve as an example.
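A minimal sketch of the idea (illustrative only; the undarray name and the .n/.s attributes are part of the proposal above, not existing uncertainties API, and correlations between elements are ignored here):

import numpy as np

class undarray:
    """Sketch: nominal values and 1-sigma standard deviations as two float arrays."""
    def __init__(self, nominal, std_dev):
        self.n = np.asarray(nominal, dtype=float)
        self.s = np.asarray(std_dev, dtype=float)

    def __pow__(self, p):
        # First-order propagation: sigma(x**p) = |p * x**(p-1)| * sigma(x),
        # computed once for the whole array instead of once per element.
        return undarray(self.n ** p, np.abs(p * self.n ** (p - 1)) * self.s)

    def __add__(self, other):
        # Assumes independent elements: variances add.
        return undarray(self.n + other.n, np.hypot(self.s, other.s))

ux = undarray(np.random.rand(100000), np.random.rand(100000))
uy = ux ** 2   # vectorized propagation, no per-element Python objects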

Possible impact

This also ties in with issue #47: if the operators are defined as methods of this undarray object with a numpy-compatible API, existing code using numpy operators (e.g. np.sum(undarray)) might just work out of the box [needs confirmation].

Might affect issue https://github.com/lebigot/uncertainties/issues/53.

In order to keep backward compatibility, the current behavior could be kept with

 arr = numpy.array([ufloat(1, 0.1), ufloat(2, 0.002)])
 arr = unumpy.uarray(np.array([1, 1]), np.array([0.1, 0.002]))  # assumes dtype='object'

then switch to this new backend with

 arr = unumpy.uarray(np.array([1, 1]), np.array([0.1, 0.002]), dtype='uarray')

This would require significant work to make all the operators work, and at first only a subset of them might be supported, but I believe the performance improvement would go a long way toward making unumpy usable in medium-scale or production applications.

I would be happy to work on this. What do you think @lebigot ?

lebigot commented 7 years ago

I did not have time to read your whole post yet, but the idea of having a dedicated array type for optimization purposes deserves attention, as does the idea of storing the nominal value and the random part separately (not the standard deviation, which is not what is stored internally). This is definitely the way to obtain faster NumPy calculations.

It'd be great if you worked on this. But please call the new array class uarray, for consistency with the existing uncertainties.unumpy.umatrix class. Please keep me informed of your progress if you do, so that we don't work on the same thing in parallel (even though this is unlikely).

Please also note that, in addition to the issues you already cite, the discussion in issue #19 is relevant.

rth commented 7 years ago

Thank you for the feedback and additional suggestions, I'll look into it and let you know of the progress.

Edit: yes, the benchmark above uses the latest version (3.0).

beojan commented 6 years ago

Where did this end up going (and what were the complications)? It seems a fairly straightforward proposal from afar.

rth commented 6 years ago

> Where did this end up going (and what were the complications)? It seems a fairly straightforward proposal from afar.

If I remember correctly, the general idea was to subclass np.ndarray by adapting the code and unit tests from numpy's masked arrays. What I had not accounted for was that the corresponding implementation in numpy is about 6k lines of code with 3k lines of tests, written over years, so even after some work on stripping out the irrelevant parts it didn't go anywhere. Those changes can be found in the uarray branch of my fork, but I don't think there is anything currently useful there.

The documentation on subclassing numpy arrays is by now extensive enough, but IMO this is still non-trivial.
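For reference, the skeleton of the ndarray-subclass route looks roughly like the following (my own sketch based on the numpy subclassing docs; the class name is made up). Even this minimal version hints at why it is non-trivial: views and slices carry the extra attribute along but do not slice it, and every ufunc would still need explicit handling:

import numpy as np

class UArraySketch(np.ndarray):
    # Nominal values live in the ndarray itself; std devs ride along as an attribute.
    def __new__(cls, nominal, std_dev):
        obj = np.asarray(nominal, dtype=float).view(cls)
        obj.s = np.asarray(std_dev, dtype=float)
        return obj

    def __array_finalize__(self, obj):
        # Called for views, slices and templates; just carry the attribute over.
        # Note: the attribute is *not* sliced along with the data here.
        if obj is None:
            return
        self.s = getattr(obj, 's', None)

ua = UArraySketch([1.0, 2.0, 3.0], [0.1, 0.2, 0.3])
print(ua[0:2].s)   # still the full std-dev array -- one of the pitfalls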

In any case, I later discovered that there was some prior work in this direction in https://github.com/astropy/astropy/pull/3715 The author also offered to merge some of the features back to uncertainties. That might be a promising approach.

The current status is that I don't really have the bandwidth to work on this anymore, unfortunately.

civilinformer commented 4 years ago

This seems like it could be a great project. I am trying to calculate the pseudo-inverse, with uncertainties, of an array of shape (21931, 3). Using np.linalg.pinv (without uncertainties) this takes milliseconds. Using unumpy.ulinalg.pinv (with uncertainties) it does not complete within half an hour. The project as it stands is not very practical. Are there any plans to improve this?

lebigot commented 4 years ago

I guess you mean not practical for people who need the pseudo-inverse of a matrix with many (~60,000) coefficients? The uncertainties package is indeed not designed for such large matrices: when I wrote it, I had in mind simple expressions from physics, which typically involve only a few variables. Concretely, the pseudo-inverse is a matrix of ~60,000 coefficients, each represented by its dependence on another ~60,000 coefficients: we are talking about some 3.6 billion parameters, which is sizeable.

Maybe the auto-differentiation tools from PyTorch or TensorFlow could help with your specific problem: this would be the first thing I would look at. If they solve it, then it would indeed be an interesting project to massively speed up uncertainties by using them (maybe in a completely different package). I cannot plan to look at this any time soon, but it's an old idea that is worth checking.
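A rough sketch of this idea (not from this thread; the helper name is made up, and it assumes independent inputs and first-order propagation) would be to compute the Jacobian with PyTorch and apply linear error propagation at the array level:

import torch

def propagate(func, x, x_err):
    # First-order ("linear") error propagation through an arbitrary differentiable
    # function, assuming independent inputs with 1-sigma uncertainties x_err.
    x = torch.as_tensor(x, dtype=torch.float64)
    x_err = torch.as_tensor(x_err, dtype=torch.float64)
    y = func(x)
    # J has shape y.shape + x.shape; flatten it to a 2-D Jacobian matrix.
    J = torch.autograd.functional.jacobian(func, x).reshape(y.numel(), x.numel())
    # var(y_i) = sum_j (dy_i/dx_j)^2 * var(x_j)   (no input correlations)
    y_var = (J ** 2) @ (x_err.reshape(-1) ** 2)
    return y, y_var.sqrt().reshape(y.shape)

For the pseudo-inverse case above this would be roughly propagate(torch.linalg.pinv, A, A_err); the full Jacobian (outputs x inputs) still has to be materialized, though, so memory remains the limiting factor for very large arrays.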

PS: of course the other option would be to see what can be done with NumPy alone, as discussed above, but it's likely not to be as fast, and why reinvent the wheel when we may not have to?

civilinformer commented 4 years ago

The problem I am trying to solve is Ax = b, where A is only 3 columns wide, so x is a vector of length 3. There are only 3 unknowns, and as I said, this takes only a few milliseconds using np.linalg.pinv (without the uncertainties). As the author of this issue noted, the array case is not being handled efficiently.

But thanks for the suggestion; maybe if I think about the error propagation I can solve this using one of the automatic differentiation packages. I think JAX does this too.

civilinformer commented 4 years ago

Just want to add that the solution I came up with is to assume a distribution for the errors on the data, add noise to the data following that distribution, and then run the calculation (inversion and then dot product) a thousand times. Doing so, the whole calculation takes just 4 seconds, and out pops the full error distribution of the parameters.
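A minimal version of that recipe (my own reconstruction with assumed shapes and error sizes, not the commenter's actual code) looks like this:

import numpy as np

rng = np.random.default_rng(0)
A = rng.random((21931, 3))           # design matrix (shape from the thread)
b = rng.random(21931)                # observed data
b_err = 0.05 * np.ones_like(b)       # assumed 1-sigma errors on the data

A_pinv = np.linalg.pinv(A)           # A itself is not perturbed here, so invert once
n_trials = 1000
samples = np.empty((n_trials, 3))
for i in range(n_trials):
    b_noisy = b + rng.normal(0.0, b_err)
    samples[i] = A_pinv @ b_noisy    # solve the perturbed system

x_mean = samples.mean(axis=0)
x_std = samples.std(axis=0)          # Monte Carlo estimate of the parameter uncertainties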

beojan commented 4 years ago

In that case, you might want to look at mcerp, which does Monte Carlo error propagation (though it hasn't been updated in a while either).

veeshy commented 4 years ago

There is also https://github.com/SALib/SALib, which allows for more sophisticated UQ/SA: it handles the sampling and the outputs while you handle the model part.

And https://github.com/idaholab/raven if you want something more complex.