hgrecco / pint

Operate and manipulate physical quantities in Python
http://pint.readthedocs.org/

The Measurements Overhaul for 0.8 #350

Open hgrecco opened 8 years ago

hgrecco commented 8 years ago

This issue is to summarize the overhaul of Measurements targeted for Pint 0.8. The features/open questions are:

alexfeld commented 8 years ago

I figured out a better fix for the first problem listed, if you are interested. The problem comes from the call to the __new__ constructor. I added the following lines to the _Quantity.__new__ method, before anything else is done:

if isinstance(value, ufloat) and cls is not cls._REGISTRY.Measurement:
    # a ufloat magnitude carries an uncertainty, so the result must be
    # constructed as a Measurement rather than a plain Quantity
    cls = cls._REGISTRY.Measurement

The added if statement allows the constructor to use the proper cls type, which was the issue before. Otherwise, only the class of the object whose operator was called is passed to the constructor of the result, regardless of the type of value.

There was a constructor bug in the _Measurement class that I fixed as well:

def __new__(cls, value, error, units=MISSING):
    if units is MISSING:
        # new if statement: a bare ufloat already bundles the value and
        # error, so pass it straight through (this case was missed before)
        if isinstance(value, ufloat):
            inst = super(_Measurement, cls).__new__(cls, value, error)
            return inst

        # the old method: unpack a Quantity (magnitude/units) or an
        # uncertain value (nominal_value/std_dev)
        try:
            value, units = value.magnitude, value.units
        except AttributeError:
            try:
                value, error, units = value.nominal_value, value.std_dev, error
            except AttributeError:
                units = ''
    try:
        error = error.to(units).magnitude
    except AttributeError:
        pass

    if error < 0:
        raise ValueError(
            'The magnitude of the error cannot be negative: {}'.format(error))

    inst = super(_Measurement, cls).__new__(cls, ufloat(value, error), units)
    return inst
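To illustrate the branches above, a rough usage sketch (assuming a UnitRegistry named ureg with the uncertainties package installed):

from pint import UnitRegistry
ureg = UnitRegistry()

# explicit value, error, units
m1 = ureg.Measurement(2.0, 0.1, 'meter')
# units omitted: value and units are unpacked from the Quantities
m2 = ureg.Measurement(ureg.Quantity(2.0, 'meter'), ureg.Quantity(0.1, 'meter'))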

My apologies for not making this suggestion via a pull request. I ended up using my own uncertain math package instead of uncertainties, which required a few other changes to the pint package that aren't necessarily compatible with uncertainties.ufloat.

hgrecco commented 8 years ago

Thanks for the feedback. Can you elaborate on why you are dropping uncertainties in favor of another option? It might be useful feedback for the Measurement design.

alexfeld commented 8 years ago

Mostly I need a package that can handle non-linear functions. For example, here are the results for a nonlinear function of two random variables that are distributed according to a multi-variate normal.

x_mean = [1.0, 2.0]
x_covariance = [[1.0, 0.1], [0.1, 1.0]]

def f(x):
    return 1/(x[0]**2 + x[1]**2 + 1) + x[0]*x[1]**2
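As a point of reference, a minimal sketch (assuming numpy; not the exact code used for the comparison) of how the Monte Carlo baseline can be computed:

import numpy as np

x_mean = [1.0, 2.0]
x_covariance = [[1.0, 0.1], [0.1, 1.0]]

def f(x):
    return 1/(x[0]**2 + x[1]**2 + 1) + x[0]*x[1]**2

# sample the inputs and push each draw through f; the sample mean and
# standard deviation estimate the result's nominal value and uncertainty
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(x_mean, x_covariance, size=100_000)
values = f(samples.T)  # f vectorizes over numpy arrays
print(values.mean(), values.std())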

The results below compare five different methods of computing the function. The most important is the Monte Carlo plot in purple, since this can be considered the "true" value of the function. The blue is a linear theory method that utilizes the gradient of the function with respect to the two RVs. The next is a second-order version of the same method that utilizes the Hessian in addition to the gradient. These two methods use a "black box" evaluation, making use of an automatic differentiation package to compute the gradient and Hessian. The results using the uncertainties package are in red; as you can see they are linear, as noted in the uncertainties documentation. My implementation is in cyan and is almost exactly the same as the Monte Carlo results. It is also a transparent computation class, in the same way as Quantity.

[Figure: uq_methods_comparison, comparing the five methods described above]

Of course these results vary for different functions and distributions of the RVs, but the important point is that uncertainties in its current state cannot compute nonlinear functions to the degree of accuracy that I would like. Additionally, I didn't like that I needed to import uncertainties.umath for all of my math operations, since I'm working with a large code base. I was able to write methods for the class, such as sin, so that numpy.sin calls them. Since I have to use numpy anyway, this made my imports and function writing a lot easier. Thus numpy is required, but you were probably going to be using it anyway if you were doing anything with random variables. My class is also slower, although still on the order of milliseconds at worst; you really can't tell over one function call.

So all in all I wanted more functionality and more accuracy than uncertainties could offer. Hope that helps.

hgrecco commented 8 years ago

Thanks for the detailed information. It does help a lot! I was wondering if you would like to contribute your code to Pint. Instead of using another package to optionally provide support for Measurements, we would be making Measurements a first-class citizen of the package. What do you think?

alexfeld commented 8 years ago

I think that's a great idea. I need to get permission to release my code under an open source license first. It may take some time, but I don't anticipate it being an issue. The code does have one dependency, the autograd package (GitHub repo); it's also available via pip. It uses its own wrapping of numpy, just FYI. It hasn't presented any problems for me thus far.

hgrecco commented 8 years ago

Great! Let me know. I do not think depending on autograd is a problem. When you are ready, just organize your commits, push your changes and issue a PR.

CD3 commented 8 years ago

First, I apologize for the long post. I did not realize it would be this long when I started it...

I think it would be useful to separate the Measurement class from error propagation. A measurement is a quantity with an uncertainty, and there are many useful features that the Measurement class can provide. Here are just a few that come to mind:

- the upper bound (nominal value + uncertainty)
- the lower bound (nominal value - uncertainty)
- the measurement interval (upper - lower)

These are fairly simple calculations, but it would be nice if the Measurement class did them. There is currently an issue with non-multiplicative units that could be resolved if the Measurement class did this; for example, I can't do:

T = units.Measurement(20, 1, 'degC')
Tmax = T.value + T.error

because I can't add two temperatures. These features would be more useful if you wanted to add support for non-symmetric errors.
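For context, a minimal illustration of the non-multiplicative-unit restriction (assuming a stock UnitRegistry):

from pint import UnitRegistry
from pint.errors import OffsetUnitCalculusError

ureg = UnitRegistry()
T = ureg.Quantity(20, 'degC')
try:
    T + ureg.Quantity(1, 'degC')  # adding two absolute temperatures is ambiguous
except OffsetUnitCalculusError:
    print('cannot add two absolute temperatures')
print(T + ureg.Quantity(1, 'delta_degC'))  # 21 degC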

If we assume that the measurement has Gaussian statistics (which is the only assumption that makes sense when the uncertainty is specified as a single quantity), then these features would be useful:

- the 95% confidence interval for the measurement
- the X% confidence interval for the measurement
- are two measurements statistically different?
- automatically calculate a Measurement value and error from a list of Quantities
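A minimal sketch of the first two, assuming Gaussian statistics and scipy (the function name is hypothetical):

from scipy.stats import norm

def confidence_interval(value, error, level=0.95):
    # value +/- z * error, where z is the two-sided Gaussian quantile
    z = norm.ppf(0.5 + level / 2)  # about 1.96 for level=0.95
    return value - z * error, value + z * error

print(confidence_interval(20.0, 1.0))  # roughly (18.04, 21.96)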

All of these things would be useful, and are separate from error propagation. And now for error propagation...

I would argue that error propagation is like function interpolation. Most people that need to do it have some very specific requirements and end up using their own code for anything beyond very basic calculations because no available library does everything exactly the way that they need. For example, I also use my own error propagation module (https://github.com/CD3/pyErrorProp) because I teach an undergraduate level physics lab and I needed to be able to reproduce the uncertainties that our students would calculate by hand, and we do not use the derivative method. There are many cases where the derivative method simply does not work. For example, what if I want to propagate error through a function that I don't have an analytic expression for?

As @alexfeld points out, the "correct" way to propagate error is to use a Monte Carlo algorithm. But even then there are numerous options. The "real correct" way to propagate error would be to specify a complete probability distribution for all inputs to a calculation and compute the complete probability distribution for the result. But this is usually not possible, because the input probability distributions are not available (and it may take way too long to calculate). So, we have to make some assumptions. If we assume that all of the inputs are Gaussian, then we can use a Monte Carlo method to generate the result's probability distribution. But here again we will have to make an assumption, because the result's probability distribution may not be Gaussian or symmetric, so it's not possible to characterize it with just a nominal value and error.

So, my point is this. Any error propagation method you choose will likely only satisfy a minority of users. pint is a fantastic units library that does true dimensional analysis and is very good at going to and from strings. Having a Measurement class that extends this to uncertain quantities would be very useful. For error propagation, I would like to see an error propagation framework that is easy to extend or replace and supports propagating error through user defined functions.

Anyhow, that is my two cents. Again, sorry for the long post.

hgrecco commented 8 years ago

@CD3 Do not apologize. On the contrary, thanks for taking the time to write this. It is very useful. I do agree that it would be nice to support what you suggested.

My problem with not providing any kind of error propagation for a Measurement class is that even the most simple operations will fail. For example, if you have two measurements and want to calculate their product, you do m1 * m2 and it will fail.

So the question would be... how can we support the common case, make it clear that this is only valid in certain cases, but still make it easy to extend?

CD3 commented 8 years ago

Yes, I agree. There should be some simple default algorithm so that things like addition, subtraction, multiplication, and division work.

The simple method that I teach in physics laboratory courses is that the uncertainty in a calculation f that depends on a measurement x is just df = f(x + dx) - f(x). This will give the same result as the derivative based method for linear functions, but does a better job with non-linear functions. It is also much simpler because it does not require evaluating any derivatives.
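In code, that rule might look like this minimal sketch (names hypothetical):

def propagate(f, x, dx):
    # the uncertainty is estimated as the change in f over [x, x + dx]
    nominal = f(x)
    uncertainty = abs(f(x + dx) - nominal)
    return nominal, uncertainty

# e.g. the area of a square with side 2.0 +/- 0.1
print(propagate(lambda s: s**2, 2.0, 0.1))  # (4.0, 0.41)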

I only have experience with C++ operator overloading, so I don't know the details of how Python handles it, but from what I have read it seems like the Measurement class is responsible for providing some magic methods that Python will call when it runs into operators that involve a Measurement instance. So, perhaps the magic methods could be implemented to use a delegate class that implements the actual error propagation method, with a default implementation that could be partially or completely overridden by the user. The default implementation would serve as the reference for the error propagation interface.
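A rough sketch of that delegate idea (all names hypothetical, applying the rule above to multiplication):

class DefaultPropagator:
    def mul(self, v1, e1, v2, e2):
        value = v1 * v2
        # df = f(x + dx) - f(x), applied to f(a, b) = a * b
        error = abs((v1 + e1) * (v2 + e2) - value)
        return value, error

class Measurement:
    propagator = DefaultPropagator()  # user-replaceable backend

    def __init__(self, value, error):
        self.value, self.error = value, error

    def __mul__(self, other):
        v, e = self.propagator.mul(self.value, self.error,
                                   other.value, other.error)
        return Measurement(v, e)

m = Measurement(2.0, 0.1) * Measurement(3.0, 0.2)
print(m.value, m.error)  # 6.0 0.72 (approximately)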

I also don't know much about the details of numpy, but from what I can gather from reading your posts on the uncertainties issue tracker, it does something similar. The numpy sin function will try to call a .sin method on the argument, so that user-defined types can add support for numpy functions. If this is the case, then the simple error propagation method I mentioned above would be (relatively) easy to implement and would not depend on the derivative of the function being known or stored.

CD3 commented 8 years ago

Another thought I've had in the past is that it would be nice if the Measurement class supported the decimal module. It is possible to use decimal in the Quantity class, but everything gets converted to ufloat in the Measurement class.

My particular need is for rounding. There are rules for how to round a quantity with uncertainty, and I want to be able to reproduce the same rounded result as a hand calculation, but using a float causes some differences.

CD3 commented 8 years ago

It looks like the latest version of numpy (1.11) adds a new magic method named __numpy_ufunc__ (http://docs.scipy.org/doc/numpy-dev/reference/arrays.classes.html). If an instance has this method, it will be called and its return value will be returned to the user. This should make it possible for the Measurement class to build on top of the Quantity class's numpy support. Currently, __array_prepare__ has to return an ndarray (or a subclass), so you can't just reuse the Quantity class's implementation; with the new __numpy_ufunc__ method, that should be possible.
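(For the record, the protocol was ultimately released as __array_ufunc__ in NumPy 1.13.) A toy sketch of the override mechanism:

import numpy as np

class Measurement:
    def __init__(self, value, error):
        self.value, self.error = value, error

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # intercept np.sin and propagate the error to first order
        if ufunc is np.sin and method == '__call__':
            return Measurement(np.sin(self.value),
                               abs(np.cos(self.value)) * self.error)
        return NotImplemented

m = np.sin(Measurement(0.5, 0.01))
print(m.value, m.error)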

mikofski commented 8 years ago

If I understand this thread correctly, the idea is to make Measurements, or at least the error propagation part of it, something that can be specified by the user. If so, I agree; this would be a valuable approach. Somewhere the user could specify the error propagation backend that they wanted to use, and different error-propagation tools could provide their own backends. The backend API could specify the standard calls that __numpy_ufunc__ or another method would make to propagate the error. Something like this:

# Measurement.backends.error_propagation.base
def propagate_error(X, F, COV, **KW):
    """
    Base implementation of the Measurement class error propagation
    backend; implements the default method.

    :param X: a sequence of dependent variables
    :param F: function that takes X
    :param COV: a covariance matrix for X
    :param KW: backend specific keyword arguments
    :return: DF, a covariance matrix for the output of F(X)
    """

As it is, I also fall into the camp of writing our own package, UncertaintyWrapper, because we needed a much faster implementation than the automatic-differentiation or Monte Carlo approaches, we needed something that did not overload or force us to rewrite our code using specific implementations of ufuncs, and, most importantly, we needed something that could wrap extensions in C/C++. Currently our method works with the Pint wrapper, but it would be improved by PR #364.

Also, I think that use of the new explicit overloading method in numpy 1.11, __numpy_ufunc__, is the correct direction to go for overloading ufuncs. There may be issues with the approach that @alexfeld describes:

write methods for the class such as sin so that numpy.sin calls them

... which is according to these threads (numpy-7591, uncertainties-47) ...

an ad hoc hack inside several survivor ufuncs.

Also, this behavior is kinda broken and causes problems, so I hope we are able to deprecate it soon. I would recommend not trying to build new things on top of it.

A far better solution is the __numpy_ufunc__ approach.

alexfeld commented 7 years ago

@mikofski I did not know about the better numpy overloading method. Thank you for pointing that out!

My apologies for disappearing for a while; my time has been diverted to other projects. I have stopped work on this for the moment, but I will commit what I had so you can see what I was doing. I do not know if it will be useful, because it is basically a different version of uncertainties, just with my version of the same approximation that they are using.

rkingsbury commented 4 years ago

I'll add to this that Measurement objects do not seem to work with wrapped functions. If I have:

import numpy as np
from pint import UnitRegistry
ureg = UnitRegistry()
Q = ureg.Quantity('1 meter')

def f(x):
    return 1/x

wrapped_f = ureg.wraps("m", "m")(f)

wrapped_f(Q) returns a value in meters, as expected, but if I add uncertainty:

Q = Q.plus_minus(0.1)

wrapped_f(Q) doesn't seem to recognize the units on Q anymore. It raises a ValueError:

ValueError: A wrapped function using strict=True requires quantity or a string for all arguments with not None units. (error found for m, (1.00 +/- 0.10) meter)