snitch-org / snitch

Lightweight C++20 testing framework.
Boost Software License 1.0

Add float matchers #101

Open cschreib opened 1 year ago

cschreib commented 1 year ago

Catch2 offers some builtin float matchers to compare two floating point numbers for approximate equality. This seems to be a popular feature.

It could be easy to include without bloating snitch.

As stated in the discussions linked above, we first need to understand what people are generally trying to achieve. There are a lot of misconceptions around floating point numbers, and some traps that are easy to fall into. Something too simple might not fit anybody's needs; conversely, we could quickly over-engineer this trying to make it generally useful to everyone, and we should try to avoid that too.

Catch2 used to offer the Approx matcher, but this was eventually deprecated in favor of the WithinAbs/WithinRel/WithinULP matchers. If we are not going to just port over the new Catch2 matchers, we should learn from this. The weaknesses of Approx were:

The new matchers presumably fix all this; however, they rely on combining matchers to work nicely, so we would have to implement that as well:

REQUIRE_THAT(value, WithinRel(expected, 1e-3) || WithinAbs(expected, 1e-6));

I think this is a nice idea: it allows specifying pretty much all the combinations of tolerances one may need. However, it is a bit verbose; we can probably improve on that.

doctest uses Approx; I believe it is the same as the one in Catch2.

GoogleTest uses EXPECT_NEAR, which is just an absolute tolerance check. They also have various matchers, although these appear to only support absolute tolerances.

cschreib commented 1 year ago

Perhaps something like:

#include <algorithm> // for std::max
#include <cmath>     // for std::abs

template<typename T>
struct within {
    T expected = {};
    struct tolerance_thresholds {
        T abs = {};
        T rel = {};
    } tolerance;

    explicit within(T e, T abs_tol) : expected(e), tolerance{.abs=abs_tol} {}
    explicit within(T e, const tolerance_thresholds& tol) : expected{e}, tolerance{tol} {}

    constexpr bool match(T value) const noexcept {
        const T tol = std::max(tolerance.abs, tolerance.rel * std::max(std::abs(value), std::abs(expected)));
        return std::abs(value - expected) <= tol;
    }
};
REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_NEAR(value, expected, 1e-6); // GoogleTest
REQUIRE_THAT(value, within{expected, 1e-6}); // snitch
REQUIRE_THAT(value, within{expected, {.abs=1e-6}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3)); // Catch2
REQUIRE_THAT(value, within{expected, {.rel=1e-3}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) || WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, {.abs=1e-6, .rel=1e-3}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) && WithinAbs(expected, 1e-6)); // Catch2
// Until we implement combining matchers:
REQUIRE_THAT(value, within{expected, {.rel=1e-3}}); // snitch
REQUIRE_THAT(value, within{expected, {.abs=1e-6}}); // snitch
// When we implement combining matchers:
REQUIRE_THAT(value, within{expected, {.rel=1e-3}} && within{expected, {.abs=1e-6}}); // snitch

tocic commented 1 year ago

Here are two concerns about the proposed design:

  1. It's important to understand how often the user needs to specify both absolute and relative tolerances. If it's not too often, then Catch2's approach will be less verbose. (And it doesn't require a C++20 feature to be readable.)
  2. I think the first impression when you see REQUIRE_THAT(value, within{expected, {.abs=1e-6, .rel=1e-3}}); is that it's equivalent to REQUIRE_THAT(value, within{expected, {.abs=1e-6}} and within{expected, {.rel=1e-3}});, i.e. to AND, not OR. This might confuse the reader.

cschreib commented 1 year ago

Thanks for the feedback!

  1. It's important to understand how often the user needs to specify both absolute and relative tolerances. If it's not too often, then Catch2's approach will be less verbose.

My limited study so far seems to indicate that the most common need is to just have an absolute tolerance. And at work, we either use absolute tolerances, or both absolute and relative (with an OR combination). A relative tolerance on its own is usually impractical, because it can reduce to a ridiculously small tolerance for small numbers, and it usually ends up far lower than numerical precision.

In this case, we can be just a few characters longer than Catch2:

REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, {.abs=1e-6}}); // snitch

or a few characters shorter if we add a special case for this:

REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, 1e-6}); // snitch

I think this is the approach I am aiming for:

  1. Design a possibly verbose but generic interface that can do everything we want.
  2. Introduce as many specialized interfaces as needed to make the most common use cases easier / less verbose.

(And it doesn't require a C++20 feature to be readable)

That's true, but snitch being a C++20 library, I don't think it's a stretch to rely on this. Designated initializers are an awesome readability feature; we should use and encourage them when they help. Although a user could choose not to use them and end up with within{expected, {1e-6, 1e-3}} (which would be unreadable), a similar argument can be made in Python, where function keyword parameters can also be passed positionally. People just don't do it, because it hurts readability. I believe that, as long as we have good examples and documentation, users will adopt the designated-initializer syntax as a natural default.

  2. I think the first impression when you see REQUIRE_THAT(value, within{expected, {.abs=1e-6, .rel=1e-3}}); is that it's equivalent to REQUIRE_THAT(value, within{expected, {.abs=1e-6}} and within{expected, {.rel=1e-3}});, i.e. to AND, not OR. This might confuse the reader.

Yes, I agree. I wanted to rename it to within_either to convey this, but then thought it looked odd if you don't specify one of the two: within_either{expected, {.abs=1e-6}}? Then we're also losing on conciseness. I'll have to think about that one some more...

Some ideas came to mind while writing this; I'll have to get back to them later when I have time:

Generic within

The most generic concept is that we need a matcher that, for a given computed absolute tolerance, checks whether two numbers are within that tolerance of each other. The tolerance is computed by a function supplied to the matcher:

template<std::floating_point T, typename Tolerance>
struct within {
    T expected = {};
    Tolerance tolerance = {};

    explicit within(T e, const Tolerance& tol) : expected{e}, tolerance{tol} {}

    constexpr bool match(T value) const noexcept {
        const T magnitude = std::max(std::abs(value), std::abs(expected));
        const T tol = get_abs_tolerance(magnitude, tolerance);
        return std::abs(value - expected) <= tol;
    }
};

With this generic matcher, anyone can supply their own Tolerance type for which there is an overload of T get_abs_tolerance(T magnitude, const Tolerance& tol);. This can be a fixed absolute tolerance, or a relative tolerance, or a combination (AND or OR) of the former two, or a weird domain-specific function.

We can define a special case for the most common use (absolute tolerance):

template<std::floating_point T>
T get_abs_tolerance(T /*value*/, T tol) { return tol; }

so that

REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, 1e-6}); // snitch

Using abs and rel and operator overloading

For combining absolute and relative tolerances, we could build the Tolerance type like so:

template<std::floating_point T>
struct abs_t { T tol{}; };

template<std::floating_point T>
struct rel_t { T tol{}; };

constexpr abs_t<double> operator ""_abs(long double value) { return {static_cast<double>(value)}; }
constexpr abs_t<float> operator ""_fabs(long double value) { return {static_cast<float>(value)}; }
constexpr rel_t<double> operator ""_rel(long double value) { return {static_cast<double>(value)}; }
constexpr rel_t<float> operator ""_frel(long double value) { return {static_cast<float>(value)}; }

template<std::floating_point T>
struct abs_and_rel_t { abs_t<T> abs{}; rel_t<T> rel{}; };

template<std::floating_point T>
struct abs_or_rel_t { abs_t<T> abs{}; rel_t<T> rel{}; };

template<std::floating_point T>
abs_and_rel_t<T> operator && (abs_t<T> abs, rel_t<T> rel) { return {abs, rel}; }
template<std::floating_point T>
abs_and_rel_t<T> operator && (rel_t<T> rel, abs_t<T> abs) { return {abs, rel}; }

template<std::floating_point T>
abs_or_rel_t<T> operator || (abs_t<T> abs, rel_t<T> rel) { return {abs, rel}; }
template<std::floating_point T>
abs_or_rel_t<T> operator || (rel_t<T> rel, abs_t<T> abs) { return {abs, rel}; }

template<std::floating_point T>
T get_abs_tolerance(T value [[maybe_unused]], abs_t<T> abs) { return abs.tol; }
template<std::floating_point T>
T get_abs_tolerance(T value, rel_t<T> rel) { return rel.tol * value; }
template<std::floating_point T>
T get_abs_tolerance(T value, abs_or_rel_t<T> tol) { 
    return std::max(get_abs_tolerance(value, tol.abs), get_abs_tolerance(value, tol.rel)); 
}
template<std::floating_point T>
T get_abs_tolerance(T value, abs_and_rel_t<T> tol) { 
    return std::min(get_abs_tolerance(value, tol.abs), get_abs_tolerance(value, tol.rel)); 
}

// Usage:
REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, 1e-6_abs}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3)); // Catch2
REQUIRE_THAT(value, within{expected, 1e-3_rel}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) || WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, 1e-3_rel || 1e-6_abs}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) && WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, 1e-3_rel && 1e-6_abs}); // snitch

If the user-defined literals are too weird (or simply not practical if not dealing with literal tolerances; e.g., if the tolerance is computed at runtime), we can rename things a little and fall back to more regular syntax at the cost of some more verbosity (there might be issues with abs being a standard function though):

REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, abs{1e-6}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3)); // Catch2
REQUIRE_THAT(value, within{expected, rel{1e-3}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) || WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, rel{1e-3} || abs{1e-6}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) && WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, rel{1e-3} && abs{1e-6}}); // snitch
cschreib commented 1 year ago

Hmm indeed, abs{1e-6} won't compile because of the C abs() function in the global namespace. Why can't we have nice things? So then we'll need to drop the shorthand names:

REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, absolute{1e-6}}); // snitch
REQUIRE_THAT(value, within{expected, 1e-6}); // snitch (shortcut, for most commonly used form)

REQUIRE_THAT(value, WithinRel(expected, 1e-3)); // Catch2
REQUIRE_THAT(value, within{expected, relative{1e-3}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) || WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, relative{1e-3} || absolute{1e-6}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) && WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, relative{1e-3} && absolute{1e-6}}); // snitch

In terms of verbosity, we've lost most of the advantages... I really like the concept of the generic within matcher in the previous post, with the injected logic for computing the tolerance. It "smells" like a good design. It just needs to be made to look good :)

Here's another example of what we could do with this, for the physicists among us. The following cannot be done with Catch2's builtin matchers; you would have to write your own:

// Adding overload for generic tolerance functions (this would be added to snitch)
template<typename T>
concept tolerance_function = requires(const T& func, double v) {
    { func(v) } -> std::convertible_to<double>;
};

template<std::floating_point T, tolerance_function F>
T get_abs_tolerance(T value, const F& func) {
    return func(value);
}

// Usage:
CHECK(1.001f == within{1.0f, [](auto v) { return std::sqrt(1e-12 + v * v * 1e-6); }});

// Or in a more real-world setting:
const auto computed_mass = ...;
const auto expected_mass = ...;
const auto mass_error = [](auto v) { return std::sqrt(1e-12 + v * v * 1e-6); };
CHECK(computed_mass == within{expected_mass, mass_error});

The one thing I'm not sure of is the computation of the "magnitude" as std::max(std::abs(value), std::abs(expected)). This is done so that relative errors are symmetric, which was one of the quoted "issues" with the deprecated Catch2 Approx:

REQUIRE_THAT(1.0f, within(1.1f, 0.1f));
REQUIRE_THAT(1.1f, within(1.0f, 0.1f)); // identical to the line above

I'm not sure this is actually a desirable property. The expected value is the one that matters for setting the magnitude of the tolerance, regardless of what value was actually measured. I might change this to std::abs(expected) instead, or even just expected (and let the tolerance function apply any abs(...) if necessary).

cschreib commented 1 year ago

A brief review of floating point testing practices in popular open source libraries:

cschreib commented 1 year ago

Parking this for a bit; the latest prototype can be found in https://github.com/cschreib/snitch/tree/float_matcher.

horenmar commented 7 months ago

The one thing I'm not sure of is the computation of the "magnitude" as std::max(std::abs(value), std::abs(expected)). This is done so that relative errors are symmetric, which was one of the quoted "issues" with the deprecated Catch2 Approx: ... I'm not sure this is actually a desirable property.

It is a desirable property because it is an expected property of the comparison when dealing with numerical comparisons in the literature. This is not the same as being intuitive, and I agree that the Approx approach is the more intuitive one... it is, however, likely to introduce incorrectness into the tests.

You might also be interested in this note about the implementation of floating point comparisons in Catch2


doctest uses Approx, I believe it is the same as the one in Catch2.

It is a copy of an old version of Approx, which had some weird ideas (no margin, non-zero scale default, etc.).

cschreib commented 7 months ago

It is a desirable property because it is an expected property of the comparison when dealing with numerical comparisons in the literature. This is not the same as being intuitive, and I agree that the Approx approach is the more intuitive one... it is, however, likely to introduce incorrectness into the tests.

You might also be interested in this note about the implementation of floating point comparisons in Catch2

Thanks for the link, there's some very useful information in there! As for the symmetric property being desirable or not, I think it depends on the question being asked:

  1. "Are these numbers close to each other?" -> symmetric
  2. "Is this value close to the expected value?" -> not symmetric

I genuinely think these are different questions to ask. For example, I would ask question 1 if my test is computing values using two different methods and checking that the values match; there is not one value that is more important or more correct than the other, I just need to compare them. Question 2, on the other hand, I would ask when I have a "ground truth" to compare against. Maybe it's from a specification, a pen-and-paper calculation, some theoretical optimum, or a naive but accurate algorithm. IMO question 2 is more common, but that might just reflect my particular field.

Here are a few graphs to highlight what this does in practice when comparing a "measured" value against an "expected" value with various expressions for the relative difference:

This is when expected is fixed to 4 and we vary measured: [image]

This is when measured is fixed to 4 and we vary expected: [image]

By design, the white and orange lines are the same on both figures, because measured and expected are interchangeable (the expression is symmetric). However, the first graph also shows that, for these two methods, the acceptable error is not symmetric; if we set an acceptable threshold of 0.4 for the error, the range of acceptable values around expected=4 is about 2.4 to 6.7, which means an acceptable error of [-1.6, +2.7]. For "sym(min)" we get a similar behaviour, albeit with a generally lower acceptable error ([-1.1, +1.6]). In contrast, with the "asym" method, the acceptable error is symmetric: [-1.6, +1.6]. I think that's what makes it the more intuitive solution, at least when asking "question 2" above.

It's worth noting, however, that for a low value of the acceptable threshold, say 0.01, the range of acceptable error quickly converges to be the same for all three methods.

cschreib commented 7 months ago

So I think it all boils down to terminology; perhaps Approx was surprising because of its name, which evokes the mathematical statement "x is approximately equal to y" (which one expects to be symmetric; x ~= y <=> y ~= x). But I think Within, as a word, doesn't carry this baggage: it expresses "x must be within some range around y" (which, at least to me, doesn't imply that it has to be symmetric).

Edit: As an example of this argument: "x must be within +/- 5% of y" --> "+/- 5% of y" should unambiguously evaluate to an acceptable error range of [-0.05*abs(y), +0.05*abs(y)] (which is what the "asym" method does); it should not depend on x.