StephanTLavavej opened 9 months ago
- Enumeration types

```cpp
enum fruit { apple, banana, lemon, pear, plum };

// User-provided comparator; note that it is not a strict weak ordering:
// any comparison involving apple returns true, so apple < apple is true.
bool operator<(fruit left, fruit right)
{
    if (right == apple || left == apple)
        return true;
    return (int) left < (int) right;
}
```
- Floating-point types

```
less(1.0, nan) = false
less(nan, 5.0) = false
less(1.0, 5.0) = true
```
Seems related to #1006. For "known" comparators and element types, we can conformingly perform additional checks in `clamp` etc., because such checks are side-effect-free.
...though the quoted test doesn't check that the given less-than is transitive; it only checks that it's irreflexive and asymmetric, and floating-point comparisons do satisfy both. As long as that verification code ignores transitivity, floats can be considered well behaved.
There's not even a way to check for transitivity of floats without false positives: in an array containing only NaNs, all comparisons are trivially transitive.
The check is basically looking for bogus less-equal behavior (common among novices), although it does sometimes catch bogus lexicographic comparisons by people who don't know the pattern.
I put floating-point back with an extended explanation, thanks!
We talked about this at the weekly maintainer meeting and we're in favor of the approach that I proposed, including limiting this to just the suggested comparators and types (yes to `basic_string` and `basic_string_view` as mentioned, no to `pair`/`tuple` and anything more complicated, at least at first). Implementing this with some kind of internal type trait seems like the way to go, but we definitely don't want to over-engineer some user-documented mechanism (again, at least at first).
I'm skeptical that this would be useful enough to justify doing it.
The optimization would apply with iterator debugging enabled, a mode that is usually active together with disabled optimization. In that configuration, the extra comparison would not be noticeable among the other suboptimalities, except maybe for strings.
The mode with optimizations enabled but debugging checks still on is rare, and even that isn't a primary optimization target (recall what `vector` looks like in that mode). These days it is largely superseded by an ordinary optimized release build with debug info and ASan enabled.
On the other side of the scales we have predicate complexity. Sure, we do things at least this complex for the vectorized algorithms, and even more complex things for the `find` family and the `lexicographical_compare` family, but there we sometimes gain more than 10x, which covers the usual release modes. There's some overlap between this and the `minmax` optimization, but it doesn't look close enough to share any machinery.
Another approach, without a complex predicate, could be a way to query the compiler whether an `operator<` is built-in or is synthesized from built-in ones. This would exclude strings, but on the other hand we could add some tuples, if we synthesized the operators instead of having them spelled out.
Weighing all this, I'd lean towards not doing anything here.
`_Debug_lt_pred()` detects bogus user-provided comparators, e.g. less-than-or-equal behavior, or comparators by people who tried to implement tuple-like lexicographic comparison but got it wrong (which is tricky for people who haven't learned to recognize the pattern): https://github.com/microsoft/STL/blob/192a84008a59ac4d2e55681e1ffac73535788674/stl/inc/xutility#L1391-L1402
This debug check is very valuable, but we always perform it ("when the arguments are the cv-same-type"), even when we could statically detect that the predicate and arguments are good. This is likely because metaprogramming was very difficult for us historically (this check predates `<type_traits>` and definitely `if constexpr`), and tag dispatch was expensive in debug mode, so doing anything would have been counterproductive. Now it should be easier.

I believe that an exhaustive list of the "known predicates" is:

- `less<T>` where `T` is a "known type"
- `less<>`
- `ranges::less`
- `greater<T>` where `T` is a "known type"
- `greater<>`
- `ranges::greater`
We can't trust `less<InvolvesUserProvidedType>` because it could call a user-provided `operator<` or even be directly specialized by the user (`less` is one of the few Standard Library templates where specialization is actually done in practice).

As for "known types", we know what will happen if we compare many types with `operator<`. This is not an exhaustive list:

- Floating-point types: `_Debug_lt_pred()` is only checking whether `pred(x, y)` and `pred(y, x)` are simultaneously `true` (i.e. looking for less-equal behavior). NaNs can't cause that, so we may as well skip this check for floating-point types.
- `basic_string<Elem, char_traits<Elem>, AnyAlloc>` where `Elem` is any of `char`, `wchar_t`, `char8_t`, `char16_t`, `char32_t`
- `basic_string_view<Elem, char_traits<Elem>>`
Because this would only be a debug perf improvement, I think we should limit how many Standard Library types we detect. We know how `vector<int>` comparisons will behave, but while `vector` is popular, how often is it being given to `_Debug_lt_pred()`? I could see a case for `pair<Known1, Known2>`, maybe `tuple<Known...>`, especially because saving debug checks there is potentially a bigger win. We should think some more about what types we want to handle before writing code.

We should perhaps think of an extensible system, e.g. an internal type trait that `basic_string` and `basic_string_view` specialize, so that if we do wish to extend the known types later, we can do so more easily. I'm much less interested in over-engineering something that users can extend (whether for their predicates or their types), as that starts to sound like complexity we would need to support.
Note that `less<T>` doesn't need to exactly match its argument types to have known behavior. `less<int>` comparing two `short`s is going to have known results. I think it would be sufficient to require that all of the types involved be known, but not require any relationships between the types, before we skip the debug check.