hgrecco opened this issue 8 years ago
I figured out a better fix for the first problem listed, if you are interested. The problem comes in the call to the `__new__` constructor. I added the following lines to the `_Quantity.__new__` method, before anything else is done:

```python
if isinstance(value, ufloat) and cls is not cls._REGISTRY.Measurement:
    cls = cls._REGISTRY.Measurement
```

The added `if` statement allows the constructor to use the proper `cls` type, which was the issue before. Otherwise, only the class of the object whose operator was called is passed to the constructor of the result, regardless of the type of `value`.
There was a constructor bug in the `_Measurement` class that I fixed as well:

```python
def __new__(cls, value, error, units=MISSING):
    if units is MISSING:
        # new if statement -- this case was missed before
        if isinstance(value, ufloat):
            inst = super(_Measurement, cls).__new__(cls, value, error)
            return inst
        # the old method
        try:
            value, units = value.magnitude, value.units
        except AttributeError:
            try:
                value, error, units = value.nominal_value, value.std_dev, error
            except AttributeError:
                units = ''
    try:
        error = error.to(units).magnitude
    except AttributeError:
        pass
    inst = super(_Measurement, cls).__new__(cls, ufloat(value, error), units)
    if error < 0:
        raise ValueError('The magnitude of the error cannot be negative')
    return inst
```
My apologies for not making this suggestion via a pull request. I ended up using my own uncertain-math package instead of `uncertainties`, which required a few other changes to the `pint` package that aren't necessarily compatible with `uncertainties.ufloat`.
Thanks for the feedback. Can you elaborate on why you are dropping `uncertainties` in favor of another option? It might be useful feedback for the Measurement design.
Mostly I need a package that can handle non-linear functions. For example, here are the results for a nonlinear function of two random variables that are distributed according to a multivariate normal:

```python
x_mean = [1.0, 2.0]
x_covariance = [[1.0, 0.1], [0.1, 1.0]]

def f(x):
    return 1 / (x[0]**2 + x[1]**2 + 1) + x[0] * x[1]**2
```
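The Monte Carlo baseline discussed here can be reproduced with plain `numpy`. A minimal sketch (the sample count and seed are arbitrary choices, not from the original post):

```python
import numpy as np

x_mean = [1.0, 2.0]
x_covariance = [[1.0, 0.1], [0.1, 1.0]]

def f(x):
    return 1 / (x[0]**2 + x[1]**2 + 1) + x[0] * x[1]**2

# Draw samples from the multivariate normal and push them through f;
# the sample mean and standard deviation characterize the result.
rng = np.random.default_rng(seed=0)
samples = rng.multivariate_normal(x_mean, x_covariance, size=100_000)
values = f(samples.T)  # vectorized: samples.T[0] is x[0], samples.T[1] is x[1]
print(values.mean(), values.std())
```

Because `f` is nonlinear, the Monte Carlo mean differs noticeably from `f(x_mean)`, which is exactly the discrepancy the plots below illustrate.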
The results below compare five different methods for computing the function. The most important is the Monte Carlo plot in purple, since this can be considered the "true" value of the function.
The blue is a linear theory method that utilizes the gradient of the function with respect to the two RVs.
The next is a second order version of the same method that utilizes the Hessian in addition to the gradient. These two methods use a "black box" evaluation, making use of an automatic differentiation package to compute the gradient and Hessian.
The results using the `uncertainties` package are in red. As you can see, they are linear, as noted in the `uncertainties` documentation.
My implementation is in cyan, and is almost exactly the same as the Monte Carlo results. It is also a transparent computation class, in the same way as `Quantity`.
Of course these results vary for different functions and distributions of the RVs, but the important point is that `uncertainties`, in its current state, cannot compute nonlinear functions to the degree of accuracy that I would like. Additionally, I didn't like that I needed to import `uncertainties.umath` for all of my math operations, since I'm working with a large code base. I was able to write methods for the class, such as `sin`, so that `numpy.sin` calls them. Since I have to use `numpy` anyway, this made my imports and function writing a lot easier. Thus `numpy` is required, but you were probably going to be using it anyway if you were doing anything with random variables. My class is also slower, although still on the order of milliseconds at worst; you really can't tell over one function call.
Thanks for the detailed information. It does help a lot! I was wondering if you would like to contribute your code to Pint. Instead of using another package to optionally provide support for Measurements, we will be making Measurements a first class citizen of the package. What do you think?
I think that's a great idea. I need to get permission to release my code under an open-source license first. It may take some time, but I don't anticipate it being an issue. The code does have one dependency, the `autograd` package (GitHub repo); it's also available via `pip`. It uses its own wrapping of `numpy`, just FYI. It hasn't presented any problems for me thus far.
Great! Let me know. I do not think depending on autograd is a problem. When you are ready, just organize your commits, push your changes and issue a PR.
First, I apologize for the long post. I did not realize it would be this long when I started it...
I think it would be useful to separate the Measurement class from error propagation. A measurement is a quantity with an uncertainty, and there are many useful features that the Measurement class can provide. Here are just a few that come to mind:
- the upper bound (nominal value + uncertainty)
- the lower bound (nominal value - uncertainty)
- the measurement interval (upper - lower)
These are fairly simple calculations, but it would be nice if the Measurement class did them. There is currently an issue with non-multiplicative units that could be resolved if the Measurement class did this. For example, I can't do this:

```python
T = units.Measurement(20, 1, 'degC')
Tmax = T.value + T.error
```

because I can't add two temperatures. These features would be more useful if you wanted to add support for non-symmetric errors.
If we assume that the measurement has Gaussian statistics (which is the only assumption that makes sense when the uncertainty is specified as a single quantity), then these features would be useful:
- the 95% confidence interval for the measurement
- the X% confidence interval for the measurement
- are two measurements statistically different?
- automatically calculate a Measurement value and error from a list of Quantities
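As a rough illustration of the Gaussian-statistics features in this list, here is a sketch using only the standard library. The function names are hypothetical, not a proposed pint API (`NormalDist` requires Python 3.8+):

```python
from statistics import NormalDist

def confidence_interval(nominal, error, level=0.95):
    # Turn the confidence level into a multiple of the standard
    # uncertainty (k ~ 1.96 for 95%), assuming Gaussian statistics.
    k = NormalDist().inv_cdf(0.5 + level / 2)
    return (nominal - k * error, nominal + k * error)

def statistically_different(m1, e1, m2, e2, level=0.95):
    # Two measurements differ if the interval around their difference
    # excludes zero; errors add in quadrature for independent inputs.
    k = NormalDist().inv_cdf(0.5 + level / 2)
    return abs(m1 - m2) > k * (e1**2 + e2**2) ** 0.5

lo, hi = confidence_interval(20.0, 1.0)  # roughly (18.04, 21.96)
```

A real Measurement class would return these bounds as Quantities with the proper units attached.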
All of these things would be useful, and are separate from error propagation. And now for error propagation...
I would argue that error propagation is like function interpolation. Most people that need to do it have some very specific requirements and end up using their own code for anything beyond very basic calculations because no available library does everything exactly the way that they need. For example, I also use my own error propagation module (https://github.com/CD3/pyErrorProp) because I teach an undergraduate level physics lab and I needed to be able to reproduce the uncertainties that our students would calculate by hand, and we do not use the derivative method. There are many cases where the derivative method simply does not work. For example, what if I want to propagate error through a function that I don't have an analytic expression for?
As @alexfeld points out, the "correct" way to propagate error is to use a Monte Carlo algorithm. But even then there are numerous options. The "real correct" way to propagate error would be to specify a complete probability distribution for all inputs to a calculation and compute the complete probability distribution for the result. But this is usually not possible, because the input probability distributions are not available (and it may take way too long to calculate). So, we have to make some assumptions. If we assume that all of the inputs are Gaussian, then we can use a Monte Carlo method to generate the result's probability distribution. But here again, we will have to make an assumption, because the result's probability distribution may not be Gaussian or symmetric. So it's not possible to characterize it with just a nominal value and error.
So, my point is this. Any error propagation method you choose will likely only satisfy a minority of users. `pint` is a fantastic units library that does true dimensional analysis and is very good at going to and from strings. Having a Measurement class that extends this to uncertain quantities would be very useful. For error propagation, I would like to see an error propagation framework that is easy to extend or replace and supports propagating error through user-defined functions.
Anyhow, that is my two cents. Again, sorry for the long post.
@CD3 Do not apologize. On the contrary, thanks for taking the time to write this. It is very useful. I do agree that it will be nice to support what you suggested.
My problem with not providing any kind of error propagation for a Measurement class is that even the most simple operations will fail. For example, if you have two measurements and want to calculate their product, you do `m1 * m2` and it will fail.
So the question would be: how can we support the common case, make it clear that this is only valid in certain cases, but still make it easy to extend?
Yes, I agree. There should be some simple default algorithm so that things like addition, subtraction, multiplication, and division work.
The simple method that I teach in physics laboratory courses is that the uncertainty in a calculation `f` that depends on a measurement `x` is just `df = f(x + dx) - f(x)`. This will give the same result as the derivative-based method for linear functions, but does a better job with non-linear functions. It is also much simpler, because it does not require evaluating any derivatives.
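That rule fits in a few lines; a hypothetical helper, not pint code:

```python
def propagate(f, x, dx):
    # Uncertainty rule described above: df = f(x + dx) - f(x).
    nominal = f(x)
    df = f(x + dx) - nominal
    return nominal, abs(df)

# For f(x) = x**2 at x = 3 +/- 0.1, the derivative method gives
# 2 * 3 * 0.1 = 0.6, while this rule gives 3.1**2 - 3**2 = 0.61,
# which picks up part of the second-order (nonlinear) contribution.
nominal, df = propagate(lambda x: x**2, 3.0, 0.1)
```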
I only have experience with C++ operator overloading, so I don't know the details of how Python handles it, but from what I have read it seems like the `Measurement` class is responsible for providing some magic methods that Python will call when it runs into operators that involve a `Measurement` instance. So, perhaps the magic methods could be implemented to use a delegate class that implements the actual error propagation method, with a default implementation that could be partially or completely overridden by the user. The default implementation would serve as the reference for the error propagation interface.
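A minimal sketch of that delegate idea (the class names and the product rule are illustrative assumptions, not pint's actual design):

```python
class DefaultPropagator:
    """Replaceable backend holding the actual propagation rules."""
    def mul(self, v1, e1, v2, e2):
        value = v1 * v2
        # First-order rule for the product of independent measurements.
        error = abs(value) * ((e1 / v1) ** 2 + (e2 / v2) ** 2) ** 0.5
        return value, error

class Measurement:
    propagator = DefaultPropagator()  # user may swap in another backend

    def __init__(self, value, error):
        self.value, self.error = value, error

    def __mul__(self, other):
        # The magic method only delegates; the backend does the math.
        v, e = self.propagator.mul(self.value, self.error,
                                   other.value, other.error)
        return Measurement(v, e)

m = Measurement(2.0, 0.1) * Measurement(3.0, 0.2)
```

Replacing the propagation method then means assigning a different object to `Measurement.propagator`, without touching the operator plumbing.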
I also don't know much about the details of `numpy`, but from what I can gather from reading your posts on the `uncertainties` issue tracker, it does something similar. The numpy `sin` function will try to call a `.sin` method on the argument, so that user-defined types can add support for `numpy` functions. If this is the case, then it seems like the simple error propagation method I mentioned above would be (relatively) easy to implement, and would not depend on the derivative of the function being known or stored.
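A toy sketch of that dispatch mechanism (note that, as discussed later in this thread, numpy developers consider this object-method fallback fragile and discourage building on it):

```python
import numpy as np

class UncertainValue:
    """Toy uncertain number; np.sin on an object falls back to .sin()."""
    def __init__(self, nominal, error):
        self.nominal, self.error = nominal, error

    def sin(self):
        # Propagate with the simple rule df = f(x + dx) - f(x).
        n = np.sin(self.nominal)
        return UncertainValue(n, abs(np.sin(self.nominal + self.error) - n))

x = UncertainValue(0.0, 0.01)
y = np.sin(x)  # dispatches to x.sin() via the object-dtype fallback
```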
Another thought I've had in the past is that it would be nice if the `Measurement` class supported the `decimal` module. It is possible to use `decimal` in the `Quantity` class, but everything gets converted to `ufloat` in the `Measurement` class.
My particular need is for rounding. There are rules for how to round a quantity with uncertainty, and I want to be able to reproduce the same rounded result as a hand calculation, but using a `float` causes some differences.
It looks like the latest version of numpy (1.11) adds a new magic method named `__numpy_ufunc__` (http://docs.scipy.org/doc/numpy-dev/reference/arrays.classes.html). If an instance has this method, it will be called and its return value will be returned to the user. This should make it possible for the `Measurement` class to build on top of the `Quantity` class's `numpy` support. Currently, `__array_prepare__` has to return an `ndarray` (or a subclass), so you can't just use the `Quantity` class's implementation. With the new `__numpy_ufunc__` method, that should be possible.
If I understand this thread correctly, the idea is to make `Measurements`, or at least the error propagation part of it, something that can be specified by the user. If so, I agree; this would be a valuable approach. Somewhere the user could specify the error propagation backend that they wanted to use. Different error-propagation tools could provide their own backends. The backend API could specify the standard calls that `__numpy_ufunc__` or some other method would call to propagate the error. Something like this:
```python
Measurement.backends.error_propagation.base.propagate_error(X, F, COV, **KW) -> DF
    """
    Base class for the Measurement class error propagation backend.
    Implements the default method.

    :param X: a sequence of dependent variables
    :param F: function that takes X
    :param COV: a covariance matrix for X
    :param KW: backend-specific keyword arguments
    :return: DF, a covariance matrix for the output of F(X)
    """
```
As it is, I also fall into the camp of writing our own package, UncertaintyWrapper, because we needed a much faster implementation than the auto-differentiation or Monte Carlo approaches; we also needed something that did not overload or force us to rewrite our code using specific implementations of ufuncs; and, most importantly, we needed something that could wrap extensions in C/C++. Currently our method works with the Pint wrapper, but would be improved by PR #364.
Also, I think that use of the new explicit overloading method in Numpy 1.11, `__numpy_ufunc__`, is the correct direction to go for overloading ufuncs. There may be issues with the approach that @alexfeld describes:

> write methods for the class such as `sin` so that `numpy.sin` calls them

... which is, according to these threads (numpy-7591, uncertainties-47), ...

> an ad hoc hack inside several survivor ufuncs.
>
> Also, this behavior is kinda broken and causes problems, so I hope we are able to deprecate it soon. I would recommend not trying to build new things on top of it.
>
> A far better solution is the `__numpy_ufunc__` approach
@mikofski I did not know about the better numpy overloading method. Thank you for pointing that out!
My apologies for disappearing for a while; my time has been diverted to other projects. I have stopped work on this at the moment, but I will commit what I had before so you can see what I was doing. I do not know if it will be useful, because it is basically just a different version of `uncertainties`, with my version of the same approximation that they are using.
I'll add to this that `Measurement` objects do not seem to work with wrapped functions. If I have

```python
import numpy as np
from pint import UnitRegistry

ureg = UnitRegistry()
Q = ureg.Quantity('1 meter')

def f(x):
    return 1/x

wrapped_f = ureg.wraps("m", "m")(f)
```

then `wrapped_f(Q)` returns a value in meters, as expected. But if I add uncertainty:

```python
Q = Q.plus_minus(0.1)
```

`wrapped_f(Q)` doesn't seem to recognize the units on `Q` anymore. It raises a `ValueError`:

```
ValueError: A wrapped function using strict=True requires quantity or a string for all arguments with not None units. (error found for m, (1.00 +/- 0.10) meter)
```
This issue is to summarize the overhaul of Measurements targeted for Pint 0.8. The features/open questions are: