Closed: yumengch closed this issue 3 years ago
Bug, I think 🤔. But where to fix it? As you say, both `NaN` and 300 kind of make sense.
The mean value and the confidence interval are already rounded. The problematic part tries to ensure that the mean value has the same number of decimals as the confidence interval. `NaN` does not have a well-defined number of decimals, so we can avoid this step entirely. For a confidence interval of 0, it is not necessary to show small trailing values. I think we can circumvent the problem by keeping these extreme cases out of the formatting.
In the method `def round(self, mult=1.0):`, we use the following code:

```python
# Extreme cases
if np.isnan(self.conf) or self.conf == 0:
    c = self.conf
    v = round2sigfig(self.val, rc.sigfig)
```
In the method `def __str__(self):`, we add one line of code:

```python
def __str__(self):
    # Round nicely
    v, c = self.round()
    # Ensure we get 1.30 ±0.01, NOT 1.3 ±0.01.
    n = log10int(c)
    if abs(n) == 300:
        # Prevent long formats for extreme cases
        return "{} ±{}".format(v, c)
    frmt = "%.f" if n >= 0 else "%%0.%df" % -n
    v = frmt % v
    return "{} ±{}".format(v, c)
```
This should lead to reasonable representations; some examples:
```python
>>> import numpy as np
>>> from dapper.tools.series import mean_with_conf, UncertainQtty
>>> UncertainQtty(val=1.3, conf=0.01)
UncertainQtty(val=1.30, conf=0.01)
>>> xx = np.random.random(4)
>>> mean_with_conf(xx)
UncertainQtty(val=0.2421, conf=nan)
>>> xx = np.random.random(100)*1e-6 + 2.
>>> mean_with_conf(xx)
UncertainQtty(val=2.0, conf=0)
>>> xx = np.random.random(100) + 3.
>>> mean_with_conf(xx)
UncertainQtty(val=3.51, conf=0.03)
```
Sorry I'm taking so long to reply. And thank you for the suggestions. I haven't fully been able to study your code suggestions yet. After briefly reviewing them, some questions:
- Does `self.conf == 0` need to be a special case?
- Why check for `abs(n) == 300`?
- Why use `rc.sigfig`?
Also, do you know if it's possible to use diffs in filing issues like these? You know, the kind that appears when reviewing pull requests. Just because I find it a little hard to read the suggested changes without diffs.
BTW, I'm in the process of implementing docstring tests (example). There are already a few tests in place around the code, but I have yet to include them explicitly in the test suite. Anyways, henceforth we should try to include them if possible.
`self.conf == 0` should be a special case. I didn't check that when I raised the issue, but when `self.conf == 0`, `log10int(c)` returns -300, and `dapper.tools.math.round2` also depends on `log10int`. `UncertainQtty` is mainly for pretty printing. `log10int` uses the numpy convention and treats `np.nan`/`np.inf`/`0` as if they were 1e+300 or 1e-300. However, `np.nan`/`np.inf`/`0` can be printed to the screen/files directly, and I believe it is not a good idea to show 300 decimals; the default, `rc.sigfig`, can be used instead. That is why I use `self.conf` as the condition for these extreme cases.
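For illustration, here is a minimal sketch of a `log10int`-style helper following the ±300 convention described above (the name `log10int_sketch` and the exact handling are assumptions for this example, not DAPPER's actual implementation):

```python
import numpy as np

def log10int_sketch(x):
    """Integer order of magnitude of x, clipped to +/-300 for nan/inf/0.

    Mimics the convention described above: log10 of 0 or inf is infinite
    (and of nan is nan), so a finite sentinel of +/-300 is returned instead.
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        n = np.log10(np.abs(x))
    if np.isnan(n):  # x was nan
        return -300
    return int(np.clip(np.floor(n), -300, 300))

print(log10int_sketch(0.01))    # -2
print(log10int_sketch(0.0))     # -300
print(log10int_sketch(np.nan))  # -300
print(log10int_sketch(np.inf))  # 300
```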
> Also, do you know if it's possible to use diffs in filing issues like these? You know, the kind that appears when reviewing pull requests. Just because I find it a little hard to read the suggested changes without diffs.
Sorry, I didn't realise this was an issue. I am not aware of any way of doing exactly what you want at the moment (I should probably do more googling). One alternative might be pulling from my changed code, but I can provide my diff here:
```diff
diff --git a/dapper/tools/series.py b/dapper/tools/series.py
index 681bdba..6c52c8b 100644
--- a/dapper/tools/series.py
+++ b/dapper/tools/series.py
@@ -111,10 +111,10 @@ class UncertainQtty():
           - fallback: rc.sigfig
         """
         # Extreme cases
-        if np.isnan(self.conf):
+        if np.isnan(self.conf) or self.conf == 0:
             c = self.conf
             v = round2sigfig(self.val, rc.sigfig)
         else:
             c = round2sigfig(self.conf, 1)
             v = round2(self.val, mult*self.conf)
@@ -125,6 +125,7 @@ class UncertainQtty():
         v, c = self.round()
         # Ensure we get 1.30 ±0.01, NOT 1.3 ±0.01.
         n = log10int(c)
+        if abs(n) == 300: return "{} ±{}".format(v, c)
         frmt = "%.f" if n >= 0 else "%%0.%df" % -n
         v = frmt % v
         return "{} ±{}".format(v, c)
```
> BTW, I'm in the process of implementing docstring tests (example). There are already a few tests in place around the code, but I have yet to include them explicitly in the test suite. Anyways, henceforth we should try to include them if possible.
This is good. I believe it might be a great way for unit testing in the future.
Sorry again for the delay! Thanks for the clarifications. The diff looks great btw :-)
I agree `conf==0` is a special case. However, to my mind it is already well handled in `.round`:

```python
>>> UncertainQtty(1.2, 0).round()
(1.2, 0)
```

Now, `.round` might have other purposes outside of `__str__`, so I think changes should be made only to `__str__`.
Now, as you say, `np.nan`/`np.inf`/`0` should be printed directly. This is also another reason not to rely on 300 as a magic number. So, indeed, we should instead check for those special cases themselves.
The case `conf==0` should actually not fall back to `rc.sigfig`, because that is intended for NaNs, whereas `conf==0` is its own special case, which merits more (but not 300!) decimals.
I've made a set of examples/doctests that `__str__` should satisfy. You can run them in the terminal using `pytest --doctest-modules dapper/tools/series.py`. Here they are:
```python
@dataclass
class UncertainQtty():
    """Data container associating uncertainty (confidence) to a quantity.

    Includes intelligent rounding and printing functionality.

    Examples:
    >>> for c in [.01, .1, .2, .9, 1]:
    ...     print(UncertainQtty(1.2345, c))
    1.23 ±0.01
    1.2 ±0.1
    1.2 ±0.2
    1.2 ±0.9
    1 ±1

    >>> for c in [.01, 1e-10, 1e-17, 0]:
    ...     print(UncertainQtty(1.2, c))
    1.20 ±0.01
    1.2000000000 ±1e-10
    1.19999999999999996 ±1e-17
    1.2000000000 ±0

    Note that in the case of a confidence of exactly 0,
    it defaults to 10 decimal places.
    Meanwhile, a NaN confidence yields printing using `rc.sigfig`:

    >>> print(UncertainQtty(1.234567, np.nan))
    1.235 ±nan

    Also note the effect of large uncertainty:

    >>> for c in [1, 9, 10, 11, 20, 100, np.inf]:
    ...     print(UncertainQtty(12, c))
    12 ±1
    12 ±9
    10 ±10
    10 ±10
    10 ±20
    0 ±100
    0 ±inf
    """

    val: float
    conf: float

    def round(self, mult=1.0):  # noqa
        """Round intelligently.

        - `conf` to 1 sig. fig.
        - `val`:
          - to precision: `mult*conf`
          - fallback: `rc.sigfig`
        """
        if np.isnan(self.conf):
            # Fallback to rc.sigfig
            c = self.conf
            v = round2sigfig(self.val, rc.sigfig)
        else:
            # Normal/general case
            c = round2sigfig(self.conf, 1)
            v = round2(self.val, mult*self.conf)
        return v, c

    def __str__(self):
        """Returns 'val ±conf'.

        The value string contains the number of decimals required by the confidence.

        Note: the special cases require processing on top of `round`.
        """
        v, c = self.round()

        if np.isnan(c):
            # Rounding to fallback (rc.sigfig) already took place
            return f"{v} ±{c}"

        if c == 0:
            # 0 (i.e. not 1e-300) never arises "naturally" => Treat it "magically"
            # by truncating to a default. Also see https://stackoverflow.com/a/25899600
            n = -10
        else:
            # Normal/general case.
            n = log10int(c)

        # Ensure we get 1.30 ±0.01, NOT 1.3 ±0.01:
        frmt = "%.f"
        if n < 0:
            frmt = "%%0.%df" % -n
        v = frmt % v

        return f"{v} ±{c}"

    def __repr__(self):
        """Essentially the same as `__str__`."""
        v, c = str(self).split(" ±")
        return self.__class__.__name__ + f"(val={v}, conf={c})"
```
I'm not 100% satisfied with the example producing `1.19999999999999996 ±1e-17`, but I think that's just something one has to live with in the case of decimal representation of floats (fixing it would probably require much hacking).
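To illustrate why (in plain Python, independent of DAPPER): the float `1.2` is not exactly representable in binary, so asking for 17 decimals simply exposes the value that is actually stored:

```python
>>> "%.17f" % 1.2
'1.19999999999999996'
>>> from decimal import Decimal
>>> Decimal(1.2)  # the exact value stored for the float 1.2
Decimal('1.1999999999999999555910790149937383830547332763671875')
```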
Finally, I think it might be reasonable to move this class to `dapper/tools/rounding.py`, because it is now mostly about rounding. What do you think?
Probably, also, `conf` should be renamed `prec`, because one might think that a bigger `conf` means bigger "confidence", whereas the code treats it as a "confidence interval" (i.e. smaller is better).
Thanks Patrick.
I made a few simple tests and I think it is reasonable. The compromise of 10 digits is also acceptable in my opinion.
> Finally, I think it might be reasonable to move this class to `dapper/tools/rounding.py` because it is now mostly about rounding. What do you think?
I think we can move all rounding-related functions/classes into a single file.
> Probably, also, `conf` should be renamed `prec` because one might think that bigger `conf` means bigger "confidence", whereas the code treats it as "confidence interval" (i.e. smaller is better).
Great idea to reduce confusion.
Sorry for my recent silence.
The function `mean_with_conf` seems to rank quite high on the TODO list. One part of its code means that, for short time series, the confidence interval is `NaN`, which is understandable. However, a quick test shows the mean value being printed with 300 decimals. The reason is that, in `__str__` of the class `UncertainQtty`, `n = log10int(c)` is -300 since `c` is `NaN`. This leads to a format of `%0.300f` for the mean value.
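For reference, the problematic formatting path can be reproduced in isolation (assuming only the -300 sentinel that `log10int` returns for `NaN`, and an arbitrary mean value of 0.2421):

```python
>>> n = -300               # log10int(c) when c is NaN
>>> frmt = "%%0.%df" % -n  # expands to the format below
>>> frmt
'%0.300f'
>>> len(frmt % 0.2421)     # the mean value gets printed with 300 decimals
302
```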
I wonder whether this is by design or a bug? Thank you.