Closed timotheecour closed 3 years ago
I like almostEqual
better but apart from that, good RFC.
I support this inclusion. In ggplotnim
I have my own almostEqual
here:
https://github.com/Vindaar/ggplotnim/blob/master/src/ggplotnim/dataframe/value.nim#L324-L339
Would be very happy if I wouldn't need to be the person to decide how smart that implementation is, haha.
Previously isClose
was also suggested. https://github.com/nim-lang/Nim/pull/9555
For your consideration: "near"? Maybe looks nicer in if a.near(b, tol): ...
style. "nearby"? "within" is another possibility though the parameter order might more naturally be while x.within(tol, y): ...
. Personally, I like "near" the most. I'm also not entirely sure it would suck to a default value like "1 ulps". if a.near(b)
looks nice, and I think "near" immediately suggests "how near" and the existence of the present parameter. Maybe it would suck and we want to force the choice. Worth mentioning, though.
I would suggest the closeness parameter be named ulp: int
like the C++ to be specific { also, to be like the C++.. :-) } and to leave open an eps: float
or tol: float
parameter later. Even if for "FP reasons" the impl is not abs(x-y) <= max|sum(abs(x),abs(y)+eps)*tol
style test, users of FP often think more in terms of 1e-4 or 1e-6 or 1e-8 style matching than ULPs. It's not a fluke/accident of his impl that @Vindaar has that interface. People always say/think in terms of how many high order decimals match, not how many lower order decimals (or maybe bits!) mismatch. So, I would expect this float param overload to be more popular over time. (But it's A.Ok to do the ulp way first..).
Tagging @mratsim as he usually seems to have opinions in this realm.
We should congratulate C++ on getting a name right for the first time and use almostEqual
too.
Personally I need both Mean Absolute Error and Mean Relative Error https://github.com/mratsim/Arraymancer/blob/1a2422a/src/arraymancer/ml/metrics/common_error_functions.nim#L35-L66
proc relative_error*[T: SomeFloat](y, y_true: T): T {.inline.} =
## Relative error, |y_true - y|/max(|y_true|, |y|)
## Normally the relative error is defined as |y_true - y| / |y_true|,
## but here max is used to make it symmetric and to prevent dividing by zero,
## guaranteed to return zero in the case when both values are zero.
# We require float (and not complex as complex.abs will cause issue)
let denom = max(abs(y_true), abs(y))
if denom == 0.T:
return 0.T
result = abs(y_true - y) / denom
proc absolute_error*[T: SomeFloat](y, y_true: T): T {.inline.} =
## Absolute error for a single value, |y_true - y|
# We require float (and not complex as complex.abs will cause issue)
result = abs(y_true - y)
Relevant:
@Araq...Lol - fair enough. I mostly thought "near" or "within" closer to ordinary terse single word language which is how Nim seems to steer. near
would just make people less likely to want =~
which as an operator has no easy way to grow a defaulted parameter. Before Google search ate its business, AltaVista Search used "near" as a query operator as in 'find me web pages with words "Andreas NEAR Rumpf"', for example. :-) C++ similarity is a fine argument, too, though, esp. for the ULP "from the least significant side" parameterization.
@mratsim's separation of the notion of "error" or "distance" from the actual <=
comparison test also partly explains on aspect of "why" of people think in terms of matching digits not mismatching digits. Numerical analysts/people matching worrying about getting nearly round-off error-free results to all 16 IEEE double places worry about ULPs. Many more using floats consider them to have "over-provisioned accuracy" and are instead concerned with "convergence" or "practical equality" of something, graphics, etc. The "direct distance" parameterization is then more natural than this ULP "complementary distance". (But "both parameterizations" is better than "one or the other" here. Nim has nice type-based overloading. almostEqual
also works for both ways.)
Yeah. Not entirely sure this is @planetis-m's point, but Py isClose(a, b, rel_tol=1e-09, abs_tol=0.0)
feels more natural (both naming & semantics) to me than almostEquals(a, b, ulps)
from C++, but as mentioned both have roles.
So since we settled on the name, which algorithm would be better and why?
There now is a math.almostEqual
.
approximate float equality is tricky to implement properly, I propose we add
approxEqual
std std/math, using following algorithm:https://en.cppreference.com/w/cpp/types/numeric_limits/epsilon
It's the most sensible choice, taking into account FP semantics, and in particular is better than something like
as often found in the wild (eg https://github.com/nim-lang/Nim/pull/15670) since it works irrespective of scaling. It also takes into account subnormal numbers unlike simpler forms like:
note
~=
or=~
with a (subjective) value fortol
/ulp
.