cschreib opened 1 year ago
Perhaps something like:

```cpp
template<typename T>
struct within {
    T expected = {};

    struct tolerance_thresholds {
        T abs = {};
        T rel = {};
    } tolerance;

    explicit within(T e, T abs_tol) : expected{e}, tolerance{.abs = abs_tol} {}
    explicit within(T e, const tolerance_thresholds& tol) : expected{e}, tolerance{tol} {}

    constexpr bool match(T value) const noexcept {
        const T tol = std::max(tolerance.abs,
            tolerance.rel * std::max(std::abs(value), std::abs(expected)));
        return std::abs(value - expected) <= tol;
    }
};
```
```cpp
REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
EXPECT_NEAR(value, expected, 1e-6); // GoogleTest
REQUIRE_THAT(value, within{expected, 1e-6}); // snitch
REQUIRE_THAT(value, within{expected, {.abs=1e-6}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3)); // Catch2
REQUIRE_THAT(value, within{expected, {.rel=1e-3}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) || WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, {.abs=1e-6, .rel=1e-3}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) && WithinAbs(expected, 1e-6)); // Catch2
// Until we implement combining matchers:
REQUIRE_THAT(value, within{expected, {.rel=1e-3}}); // snitch
REQUIRE_THAT(value, within{expected, {.abs=1e-6}}); // snitch
// Once we implement combining matchers:
REQUIRE_THAT(value, within{expected, {.rel=1e-3}} && within{expected, {.abs=1e-6}}); // snitch
```
Here are two concerns about the proposed design:

1) It's important to understand how often the user needs to specify both absolute and relative tolerances. If it's not too often, then Catch2's approach will be less verbose. (And it doesn't require a C++20 feature to be readable.)
2) I think the first impression when you see `REQUIRE_THAT(value, within{expected, {.abs=1e-6, .rel=1e-3}});` is that it's equivalent to `REQUIRE_THAT(value, within{expected, {.abs=1e-6}} and within{expected, {.rel=1e-3}});`, i.e. to AND, not OR. This might confuse the reader.
Thanks for the feedback!
> It's important to understand how often the user needs to specify both absolute and relative tolerances. If it's not too often, then Catch2's approach will be less verbose.
My limited study so far seems to indicate that the most common need is to just have an absolute tolerance. And at work, we either use absolute tolerances, or both absolute and relative (with an OR combination). A relative tolerance on its own is usually impractical, because it can reduce to a ridiculously small tolerance for small numbers, and it usually ends up far lower than numerical precision.
In this case, we can be just a few characters longer than Catch2:

```cpp
REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, {.abs=1e-6}}); // snitch
```

or a few characters shorter if we add a special case for this:

```cpp
REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, 1e-6}); // snitch
```
I think this is the approach I am aiming for.

> (And it doesn't require a C++20 feature to be readable)
That's true, but snitch being a C++20 library, I don't think it's a stretch to rely on this. Designated initializers are an awesome readability feature; we should use them and encourage them when they help. Although a user could choose not to use them and end up with `within{expected, {1e-6, 1e-3}}` (which would be unreadable), a similar argument can be made in Python, where function keyword parameters can also be passed positionally. People just don't do it, because it hurts readability. I believe that, as long as we have good examples and documentation, users will adopt the designated-initializer syntax as a natural default.
> I think the first impression when you see `REQUIRE_THAT(value, within{expected, {.abs=1e-6, .rel=1e-3}});` is that it's equivalent to `REQUIRE_THAT(value, within{expected, {.abs=1e-6}} and within{expected, {.rel=1e-3}});`, i.e. to AND, not OR. This might confuse the reader.
Yes, I agree. I wanted to rename it to `within_either` to convey this, but then thought it looked odd if you don't specify one of the two: `within_either{expected, {.abs=1e-6}}`? We would also lose some conciseness. I'll have to think about that one some more...
Ideas that came to mind while writing this; I will have to get back to them later when I have time.

### `within`

The most generic concept is a matcher that, given a computed absolute tolerance, checks whether two numbers are within that tolerance of each other. The tolerance is computed by a function supplied to the matcher:
```cpp
template<std::floating_point T, typename Tolerance>
struct within {
    T expected = {};
    Tolerance tolerance = {};

    explicit within(T e, const Tolerance& tol) : expected{e}, tolerance{tol} {}

    constexpr bool match(T value) const noexcept {
        const T magnitude = std::max(std::abs(value), std::abs(expected));
        const T tol = get_abs_tolerance(magnitude, tolerance);
        return std::abs(value - expected) <= tol;
    }
};
```
With this generic matcher, anyone can supply their own `Tolerance` type for which there is an overload of `T get_abs_tolerance(T magnitude, const Tolerance& tol);`. This can be a fixed absolute tolerance, a relative tolerance, a combination (AND or OR) of the two, or a weird domain-specific function.
We can define a special case for the most common use (an absolute tolerance):

```cpp
template<std::floating_point T>
T get_abs_tolerance(T /*magnitude*/, T tol) { return tol; }
```

so that:

```cpp
REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, 1e-6}); // snitch
```
### `abs` and `rel` and operator overloading

For combining absolute and relative tolerances, we could create the `Tolerance` types and their `get_abs_tolerance` overloads like so:
```cpp
template<std::floating_point T>
struct abs_t { T tol{}; };
template<std::floating_point T>
struct rel_t { T tol{}; };

constexpr abs_t<double> operator""_abs(long double value)  { return {static_cast<double>(value)}; }
constexpr abs_t<float>  operator""_fabs(long double value) { return {static_cast<float>(value)}; }
constexpr rel_t<double> operator""_rel(long double value)  { return {static_cast<double>(value)}; }
constexpr rel_t<float>  operator""_frel(long double value) { return {static_cast<float>(value)}; }

template<std::floating_point T>
struct abs_and_rel_t { abs_t<T> abs{}; rel_t<T> rel{}; };
template<std::floating_point T>
struct abs_or_rel_t { abs_t<T> abs{}; rel_t<T> rel{}; };

template<std::floating_point T>
abs_and_rel_t<T> operator&&(abs_t<T> abs, rel_t<T> rel) { return {abs, rel}; }
template<std::floating_point T>
abs_and_rel_t<T> operator&&(rel_t<T> rel, abs_t<T> abs) { return {abs, rel}; }
template<std::floating_point T>
abs_or_rel_t<T> operator||(abs_t<T> abs, rel_t<T> rel) { return {abs, rel}; }
template<std::floating_point T>
abs_or_rel_t<T> operator||(rel_t<T> rel, abs_t<T> abs) { return {abs, rel}; }

template<std::floating_point T>
T get_abs_tolerance([[maybe_unused]] T value, abs_t<T> abs) { return abs.tol; }
template<std::floating_point T>
T get_abs_tolerance(T value, rel_t<T> rel) { return rel.tol * value; }
template<std::floating_point T>
T get_abs_tolerance(T value, abs_or_rel_t<T> tol) {
    return std::max(get_abs_tolerance(value, tol.abs), get_abs_tolerance(value, tol.rel));
}
template<std::floating_point T>
T get_abs_tolerance(T value, abs_and_rel_t<T> tol) {
    return std::min(get_abs_tolerance(value, tol.abs), get_abs_tolerance(value, tol.rel));
}
```
```cpp
// Usage:
REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, 1e-6_abs}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3)); // Catch2
REQUIRE_THAT(value, within{expected, 1e-3_rel}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) || WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, 1e-3_rel || 1e-6_abs}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) && WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, 1e-3_rel && 1e-6_abs}); // snitch
```
If the user-defined literals are too weird (or simply not practical when not dealing with literal tolerances, e.g. when the tolerance is computed at runtime), we can rename things a little and fall back to more regular syntax, at the cost of some verbosity (there might be issues with `abs` being a standard function, though):
```cpp
REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, abs{1e-6}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3)); // Catch2
REQUIRE_THAT(value, within{expected, rel{1e-3}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) || WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, rel{1e-3} || abs{1e-6}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) && WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, rel{1e-3} && abs{1e-6}}); // snitch
```
Hmm, indeed, `abs{1e-6}` won't compile because of the C `abs()` function in the global namespace. Why can't we have nice things? So then we'll need to drop the shorthand names:
```cpp
REQUIRE_THAT(value, WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, absolute{1e-6}}); // snitch
REQUIRE_THAT(value, within{expected, 1e-6}); // snitch (shortcut, for the most commonly used form)

REQUIRE_THAT(value, WithinRel(expected, 1e-3)); // Catch2
REQUIRE_THAT(value, within{expected, relative{1e-3}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) || WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, relative{1e-3} || absolute{1e-6}}); // snitch

REQUIRE_THAT(value, WithinRel(expected, 1e-3) && WithinAbs(expected, 1e-6)); // Catch2
REQUIRE_THAT(value, within{expected, relative{1e-3} && absolute{1e-6}}); // snitch
```
In terms of verbosity, we've lost most of the advantage... I really like the concept of the generic `within` matcher from the previous post, with the injected logic for computing the tolerance. It "smells" like a good design. It just needs to be made to look good :)
Here's another example of what we could do with this, for the physicists among us. The following cannot be done with Catch2's built-in matchers; you would have to write your own:

```cpp
// Adding an overload for generic tolerance functions (this would be added to snitch):
template<typename T>
concept tolerance_function = requires(const T& func, double v) {
    { func(v) } -> std::convertible_to<double>;
};

template<std::floating_point T, tolerance_function F>
T get_abs_tolerance(T value, const F& func) {
    return func(value);
}

// Usage:
CHECK(1.001f == within{1.0f, [](auto v) { return std::sqrt(1e-12 + v * v * 1e-6); }});

// Or in a more real-world setting:
const auto computed_mass = ...;
const auto expected_mass = ...;
const auto mass_error = [](auto v) { return std::sqrt(1e-12 + v * v * 1e-6); };
CHECK(computed_mass == within{expected_mass, mass_error});
```
The one thing I'm not sure of is the computation of the "magnitude" as `std::max(std::abs(value), std::abs(expected))`. This is done so that relative errors are symmetric, which was one of the quoted "issues" with the deprecated Catch2 `Approx`:

```cpp
REQUIRE_THAT(1.0f, within(1.1f, 0.1f));
REQUIRE_THAT(1.1f, within(1.0f, 0.1f)); // identical to the line above
```

I'm not sure this is actually a desirable property. The expected value is the one that matters for setting the magnitude of the tolerance, regardless of what value was actually measured. I might change this to `std::abs(expected)` instead, or even just `expected` (and let the tolerance function apply any `abs(...)` if necessary).
A brief review of floating-point testing practices in popular open-source libraries:

- `CHECK(allclose(a, b))`: this reduces to `abs(a - b) <= abs_tol + v * rel_tol`, a form similar to the above, not available with the built-in Catch2 or doctest matchers.
- `Approx(x)` or `Approx(x).margin(y)`, hence a relative tolerance.
- `a.isApprox(b)`: for scalars, this reduces to `abs(a - b) <= rel_tol * min(abs(a), abs(b))`, so similar to `WithinRel` but using `min` instead of `max` for the relative tolerance magnitude.
- `EXPECT_NEAR(a, b, tol)`, `EXPECT_TRUE(abs(a - b) < tol)`, and more occasionally `EXPECT_LT(norm, tol)` with `norm` computed as an L1 absolute norm or L2 relative norm. There's a bit of everything, although mostly absolute tolerances.

Parking this for a bit; the latest prototype can be found in https://github.com/cschreib/snitch/tree/float_matcher.
> The one thing I'm not sure of is the computation of the "magnitude" as `std::max(std::abs(value), std::abs(expected))`. This is done so that relative errors are symmetric, which was one of the quoted "issues" with the deprecated Catch2 `Approx`: ... I'm not sure this is actually a desirable property.

It is a desirable property, because it is an expected property of the comparison when dealing with numerical comparisons in the literature. This is not the same as being intuitive, and I agree that the `Approx` approach is the more intuitive one... it is, however, likely to introduce incorrectness into the tests.
> doctest uses `Approx`, I believe it is the same as the one in Catch2.

It is a copy of an old version of `Approx`, which had some weird ideas (no margin, a non-zero scale default, etc.).
> It is a desirable property because it is an expected property of the comparison when dealing with numerical comparisons in the literature. This is not the same as being intuitive and I agree that the `Approx` approach is the more intuitive one... it is however likely to introduce incorrectness into the tests.
Thanks for the link, there's some very useful information in there! As for the symmetric property being desirable or not, I think it depends on the question being asked:

1) Are these two values close to each other?
2) Is this value close to the expected (reference) value?

I genuinely think these are different questions to ask. For example, I would ask question 1 if my test computes values using two different methods and checks that the values match; neither value is more important or more correct than the other, I just need to compare them. Question 2, on the other hand, I would ask when I have a "ground truth" to compare against. Maybe it's from a specification, a pen-and-paper calculation, some theoretical optimum, or a naive but accurate algorithm. IMO question 2 is more common, but that might just reflect my particular field.
Here are a few graphs to highlight what this does in practice when comparing a "measured" value against an "expected" value, with various expressions for the relative difference:

- "sym(max)": `abs(measured - expected)/(max(measured, expected) + 1e-6)` (white solid line),
- "sym(min)": `abs(measured - expected)/(min(measured, expected) + 1e-6)` (orange solid line),
- "asym": `abs(measured - expected)/(abs(expected) + 1e-6)` (blue dashed line).

This is when `expected` is fixed to `4` and we vary `measured`:

[figure: relative difference vs. `measured`, with `expected = 4`]

This is when `measured` is fixed to `4` and we vary `expected`:

[figure: relative difference vs. `expected`, with `measured = 4`]
By design, the white and orange lines are the same in both figures, because `measured` and `expected` are interchangeable (the expression is symmetric). However, the first graph also shows that, for these two methods, the acceptable error is not symmetric: if we set an acceptable threshold of `0.4` for the error, the range of acceptable values around `expected=4` is about `2.4` to `6.7`, which means an acceptable error of `[-1.6, +2.7]`. For "sym(min)" we get a similar behaviour, albeit with a generally lower acceptable error (`[-1.1, +1.6]`). In contrast, with the "asym" method the acceptable error is symmetric: `[-1.6, +1.6]`. I think that's what makes it the more intuitive solution, at least when asking "question 2" above.

It's worth noting, however, that for a low value of the acceptable threshold, say `0.01`, the range of acceptable error quickly converges to be the same for all three methods.
So I think it all boils down to terminology; perhaps `Approx` was surprising because of its name, which evokes the mathematical statement "x is approximately equal to y" (which one expects to be symmetric: `x ~= y` <=> `y ~= x`). But I think `Within`, as a word, doesn't carry this baggage: it expresses "x must be within some range around y" (which doesn't imply, at least to me, that the check has to be symmetric).

Edit: As an example of this argument, take "x must be within +/- 5% of y". Here "+/- 5% of y" should unambiguously evaluate to an acceptable error range of `[-0.05*abs(y), +0.05*abs(y)]` (which is what the "asym" method does); it should not depend on `x`.
Catch2 offers some built-in float matchers to compare two floating-point numbers for approximate equality. This seems to be a popular feature, and it could be easy to include without bloating snitch.

As stated in the discussions linked above, we first need to understand what people are generally trying to achieve. There are a lot of misconceptions around floating-point numbers, and some traps that are easy to fall into. Something too simple might not fit anybody's needs; conversely, we could quickly over-engineer this in order to make it generally useful to everyone, and we should try to avoid that.
Catch2 used to offer the `Approx` matcher, but this was eventually deprecated in favor of the `WithinAbs`/`WithinRel`/`WithinULP` matchers. If we are not going to just port over the new Catch2 matchers, we should learn from this. The weaknesses of `Approx` were:

- it used `double` internally no matter the type of the input;
- the comparison was not symmetric (`a == Approx(b)` isn't the same as `b == Approx(a)`).

The new matchers presumably fix all this; however, they rely on combining matchers to work nicely, so we would have to implement this as well.
I think this is a nice idea; it allows specifying pretty much all the combinations of tolerances one may need. However, it is a bit verbose; we can probably improve on that.
doctest uses `Approx`; I believe it is the same as the one in Catch2.

GoogleTest uses `EXPECT_NEAR`, which is just an absolute tolerance check. They have various matchers, although they appear to only support absolute tolerances.