nim-lang / RFCs

A repository for your Nim proposals.
136 stars 26 forks source link

add `approxEqual` to std/math #275

Closed timotheecour closed 3 years ago

timotheecour commented 3 years ago

approximate float equality is tricky to implement properly, I propose we add approxEqual std std/math, using following algorithm:

https://en.cppreference.com/w/cpp/types/numeric_limits/epsilon

template<class T>
typename std::enable_if<!std::numeric_limits<T>::is_integer, bool>::type
    almost_equal(T x, T y, int ulp)
{
    // the machine epsilon has to be scaled to the magnitude of the values used
    // and multiplied by the desired precision in ULPs (units in the last place)
    return std::fabs(x-y) <= std::numeric_limits<T>::epsilon() * std::fabs(x+y) * ulp
        // unless the result is subnormal
        || std::fabs(x-y) < std::numeric_limits<T>::min();
}

It's the most sensible choice, taking into account FP semantics, and in particular is better than something like

proc `=~`(x, y: float): bool = abs(x-y) < tol

as often found in the wild (eg https://github.com/nim-lang/Nim/pull/15670) since it works irrespective of scaling. It also takes into account subnormal numbers unlike simpler forms like:

proc `=~`(x, y, tol: float): bool =
  abs(x-y) / (fabs(x) + fabs(y) + epsilon) < tol

note

Araq commented 3 years ago

I like almostEqual better but apart from that, good RFC.

Vindaar commented 3 years ago

I support this inclusion. In ggplotnim I have my own almostEqual here:

https://github.com/Vindaar/ggplotnim/blob/master/src/ggplotnim/dataframe/value.nim#L324-L339

Would be very happy if I wouldn't need to be the person to decide how smart that implementation is, haha.

planetis-m commented 3 years ago

Previously isClose was also suggested. https://github.com/nim-lang/Nim/pull/9555

c-blake commented 3 years ago

For your consideration: "near"? Maybe looks nicer in if a.near(b, tol): ... style. "nearby"? "within" is another possibility though the parameter order might more naturally be while x.within(tol, y): .... Personally, I like "near" the most. I'm also not entirely sure it would suck to a default value like "1 ulps". if a.near(b) looks nice, and I think "near" immediately suggests "how near" and the existence of the present parameter. Maybe it would suck and we want to force the choice. Worth mentioning, though.

I would suggest the closeness parameter be named ulp: int like the C++ to be specific { also, to be like the C++.. :-) } and to leave open an eps: float or tol: float parameter later. Even if for "FP reasons" the impl is not abs(x-y) <= max|sum(abs(x),abs(y)+eps)*tol style test, users of FP often think more in terms of 1e-4 or 1e-6 or 1e-8 style matching than ULPs. It's not a fluke/accident of his impl that @Vindaar has that interface. People always say/think in terms of how many high order decimals match, not how many lower order decimals (or maybe bits!) mismatch. So, I would expect this float param overload to be more popular over time. (But it's A.Ok to do the ulp way first..).

c-blake commented 3 years ago

Tagging @mratsim as he usually seems to have opinions in this realm.

Araq commented 3 years ago

We should congratulate C++ on getting a name right for the first time and use almostEqual too.

mratsim commented 3 years ago

Personally I need both Mean Absolute Error and Mean Relative Error https://github.com/mratsim/Arraymancer/blob/1a2422a/src/arraymancer/ml/metrics/common_error_functions.nim#L35-L66

proc relative_error*[T: SomeFloat](y, y_true: T): T {.inline.} =
  ## Relative error, |y_true - y|/max(|y_true|, |y|)
  ## Normally the relative error is defined as |y_true - y| / |y_true|,
  ## but here max is used to make it symmetric and to prevent dividing by zero,
  ## guaranteed to return zero in the case when both values are zero.
  # We require float (and not complex as complex.abs will cause issue)
  let denom = max(abs(y_true), abs(y))
  if denom == 0.T:
    return 0.T
  result = abs(y_true - y) / denom
proc absolute_error*[T: SomeFloat](y, y_true: T): T {.inline.} =
  ## Absolute error for a single value, |y_true - y|
  # We require float (and not complex as complex.abs will cause issue)
  result = abs(y_true - y)

Relevant:

c-blake commented 3 years ago

@Araq...Lol - fair enough. I mostly thought "near" or "within" closer to ordinary terse single word language which is how Nim seems to steer. near would just make people less likely to want =~ which as an operator has no easy way to grow a defaulted parameter. Before Google search ate its business, AltaVista Search used "near" as a query operator as in 'find me web pages with words "Andreas NEAR Rumpf"', for example. :-) C++ similarity is a fine argument, too, though, esp. for the ULP "from the least significant side" parameterization.

@mratsim's separation of the notion of "error" or "distance" from the actual <= comparison test also partly explains on aspect of "why" of people think in terms of matching digits not mismatching digits. Numerical analysts/people matching worrying about getting nearly round-off error-free results to all 16 IEEE double places worry about ULPs. Many more using floats consider them to have "over-provisioned accuracy" and are instead concerned with "convergence" or "practical equality" of something, graphics, etc. The "direct distance" parameterization is then more natural than this ULP "complementary distance". (But "both parameterizations" is better than "one or the other" here. Nim has nice type-based overloading. almostEqual also works for both ways.)

planetis-m commented 3 years ago

Found a couple links: https://www.python.org/dev/peps/pep-0485 https://bugs.python.org/issue39310 https://github.com/google/googletest/blob/master/googletest/include/gtest/internal/gtest-internal.h#L290

c-blake commented 3 years ago

Yeah. Not entirely sure this is @planetis-m's point, but Py isClose(a, b, rel_tol=1e-09, abs_tol=0.0) feels more natural (both naming & semantics) to me than almostEquals(a, b, ulps) from C++, but as mentioned both have roles.

planetis-m commented 3 years ago

So since we settled on the name, which algorithm would be better and why?

Araq commented 3 years ago

There now is a math.almostEqual.