nemequ / hedley

A C/C++ header to help move #ifdefs out of your code
https://nemequ.github.io/hedley/
Creative Commons Zero v1.0 Universal
774 stars 51 forks source link

Macro for GCC 9's new __builtin_expect_with_probability #15

Closed nemequ closed 5 years ago

nemequ commented 5 years ago

It looks like GCC 9 will have a new builtin which I'm rather excited about, __builtin_expect_with_probability(long exp, long c, double probability):

This function has the same semantics as __builtin_expect, but the caller provides the expected probability that exp == c. The last argument, probability, is a floating-point value in the range 0.0 to 1.0, inclusive. The probability argument must be constant floating-point expression.

I'm trying to figure out the best way to expose this in Hedley. It's going to require more thought, and lots of testing, but right now I'm thinking about something like:

#if HEDLEY_GNUC_HAS_BUILTIN(__builtin_expect_with_probability,9,0,0)
#  define HEDLEY_PREDICT(expr, value, probability) \
      __builtin_expect_with_probability(expr, value, probability)
#elif \
  HEDLEY_GNUC_HAS_BUILTIN(__builtin_expect,3,0,0) || \
  HEDLEY_INTEL_VERSION_CHECK(16,0,0) || \
  HEDLEY_SUNPRO_VERSION_CHECK(5,12,0) || \
  HEDLEY_ARM_VERSION_CHECK(4,1,0) || \
  HEDLEY_IBM_VERSION_CHECK(10,1,0) || \
  HEDLEY_TI_VERSION_CHECK(7,3,0) || \
  HEDLEY_TINYC_VERSION_CHECK(0,9,27)
#  define HEDLEY_PREDICT(expr, value, probability) \
      (((probability) >= 0.9) ? __builtin_expect((expr), (value)) : (expr))
#else
#  define HEDLEY_PREDICT(expr, value, probability) (expr)
#endif

probability has to be constant, so the compiler should eliminate the test., but I definitely want to verify that to ensure we don't bloat the compiled code. It might also be nice to go through a temporary constexpr double if compiling in C++17 mode with compilers other than GCC 9+ which support statement exprs.

I'm not completely satisfied with the API, either. When I want to use __builtin_expect it's almost always to check boolean values. It might be nice to have some macros which assume bools, like the current HEDLEY_LIKELY/HEDLEY_UNLIKELY macros (which don't have the probability parameter). I need to think of new names, but something like

#define HEDLEY_LIKELY(expr, probability) HEDLEY_PREDICT(!!(expr), 1, probability)
#define HEDLEY_UNLIKELY(expr, probability) HEDLEY_PREDICT(!!(expr), 0, probability)

It should also be possible to use this to use this to reverse a test for low probability items when using __builtin_expect, so HEDLEY_LIKELY(expr, 0.01) would become __builtin_expect(!!(expr), 0) instead of !!(expr).

If anyone has any thoughts and/or suggestions, please speak up!

nemequ commented 5 years ago

I just pushed 7646ce7ea2e1ede712f6a8a9ba9abe967ae5da98 to dev for testing with the names HEDLEY_PREDICT, HEDLEY_PREDICT_TRUE, and HEDLEY_PREDICT_FALSE, but I'm still very open to other ideas.

travisdowns commented 5 years ago

I definitely like the idea of a specialized boolean variant. My current use is 100% for indicating likely branch direction, so boolean use. Hedley would be a nice place to put those details, especially if other mechanisms pop up in the future which are branch specific.

Out of curiosity, are you aware of good uses outside of a branch context? There are plenty of use for "assume" type calls which can tell the compiler that a value has a restricted range, and so affects subsequent code generation, but the case is less clear for "expect" which makes no guarantees. Maybe for cases where knowing the value would allow prioritization of the most common values (switch table generation, etc?).

travisdowns commented 5 years ago

About choosing the threshold for emulation of _expect_prob with plain _expect, which is 0.9 in the above code, it might be good to check what the implied threshold is in gcc for the plain __builtin_expect and align it with that (or maybe you already did)?

I don't have a totally clear mental model of how this works, but for binary decisions a threshold of 0.50 might be better: if you have a branch that goes the same way 80% of the time you want to compile for that case. That is, the may the way it works for branches was that it's more or less a 2-way choice: the compiler just decides which direction is more likely (so 0.49 goes one way and 0.51 the other), and then compiles favoring the more likely way. However, it may also be a 3-way choice: strongly taken, strongly not-taken, and "mushy middle", and in the mushy middle case the compiler is more likely to favor, for example, branch-free code. I am not sure...

That actually raises the key use I would expect of expect_probability - that it allows you to generate cmov-based branch-free code even in cases where the heuristics would usually say it's not worth it. I'll play with it once gcc-9 appears on godbolt.

jibsen commented 5 years ago

Out of curiosity, are you aware of good uses outside of a branch context? There are plenty of use for "assume" type calls which can tell the compiler that a value has a restricted range, and so affects subsequent code generation, but the case is less clear for "expect" which makes no guarantees. Maybe for cases where knowing the value would allow prioritization of the most common values (switch table generation, etc?).

Clang has __builtin_assume which lets you tell the compiler invariants on your data, MSVC has the similar __assume. I don't think GCC has something like that yet, there are some attempts to do it by calling __builtin_unreachable, if the condition is not true, but I don't know if it actually works.

travisdowns commented 5 years ago

Clang has __builtin_assume which lets you tell the compiler invariants on your data, MSVC has the similar __assume.

Yes, but that's exactly my point: this "assume" thing has a lot of uses, because you basically inject "assumed correct" information into the various analyses of the compiler (e.g., value propagation), and so the compiler can generate code based on these proven facts just as if it had proved the fact itself. The "expect" is different: it's just soft information, a hint about which way things usually go. The compiler has to generate code that works if the expectation is violated, but it may choose to make that code slower.

My question is, given that, in which cases is non-binary expect useful? The binary branch case is the big obvious use-case but I'm interested in the non-binary use cases. Having a handle on them may also help understand best how to expose them in Hedley.

One big use case for me is if I'm able to write __expect_probability(!!(expr), 1, 0.5) to indicate an unpredictable branch and have the compiler prefer cmov - but this is still a binary case, talking about branch direction.

I don't think GCC has something like that yet, there are some attempts to do it by calling __builtin_unreachable, if the condition is not true, but I don't know if it actually works.

Yes, it works. It's an indirect approach but it achieves the same effect. gcc back-propagates knowledge about unreachable paths (including paths that would invoke UB) back to earlier code, so it has a similar effect. It is probably also why they haven't introduced assume themselves, since this provides a good workaround.

jibsen commented 5 years ago

My question is, given that, in which cases is non-binary expect useful? The binary branch case is the big obvious use-case but I'm interested in the non-binary use cases. Having a handle on them may also help understand best how to expose them in Hedley.

For instance switch statements, where you can __builtin_expect the most likely case.

nemequ commented 5 years ago

Out of curiosity, are you aware of good uses outside of a branch context? There are plenty of use for "assume" type calls which can tell the compiler that a value has a restricted range, and so affects subsequent code generation, but the case is less clear for "expect" which makes no guarantees. Maybe for cases where knowing the value would allow prioritization of the most common values (switch table generation, etc?).

Yeah, I've used __builtin_expect in a switch, but only to say what the most common case was, not fully order them. IIRC recent versions of GCC also allow hot/cold attributes on case labels.

About choosing the threshold for emulation of _expect_prob with plain _expect, which is 0.9 in the above code, it might be good to check what the implied threshold is in gcc for the plain __builtin_expect and align it with that (or maybe you already did)?

Strangely enough, I was initially using 0.9 as a placeholder until I looked into it and found the correct value. Then I noticed that the documentation for __builtin_expect in GCC 9 says 0.9. Of course, I can't promise that older versions of GCC assumed 0.9, that might be worth looking in to…

One big use case for me is if I'm able to write __expect_probability(!!(expr), 1, 0.5) to indicate an unpredictable branch and have the compiler prefer cmov - but this is still a binary case, talking about branch direction.

That's interesting. I'm not sure if GCC will support it right away (I don't have GCC 9 to test with), but you can definitely file a bug about that if it doesn't.

I'm not sure how focused we should be on what GCC does right now instead of what it could do with this information. IMHO if there is somewhere you could imagine using __builtin_expect_with_probability to convey information to the compiler we should try to cater to that use case. Hedley just tries to make the information available to the compilers so they /can/ take advantage; even if a current version doesn't maybe a future version (or another compiler) will.

I'm going to open a bug about a HEDLEY_ASSUME macro, too. I'd been relying on HEDLEY_UNREACHABLE (which is assume(0) on MSVC), but after reading this I think there is probably an appetite to break it out into its own.

travisdowns commented 5 years ago

I'm not sure how focused we should be on what GCC does right now instead of what it could do with this information.

Yes, I that part wasn't really about how to design hedley's integration here, it was more just my personal curiosity about how this new builtin works today, since it's something I'll use, probably with Hedley, but I want to know how the underlying thing works. So you can definitely consider it off topic.

travisdowns commented 5 years ago

(I don't have GCC 9 to test with)

"Trunk" on godbolt uses some version of gcc 9 and it supports __builtin_expect_with_p. Example.

nemequ commented 5 years ago

Excellent, thanks! This does bring up an interesting problem, though. __builtin_expect() generates the same code as your superlikely (probability ≥ 0.98) example in GCC < 9, so it seems like they've effectively changed the probability used for __builtin_expect() down from something like 0.98 to 0.9 in GCC 9, which is going to mess up a lot of code.

I'm wondering if maybe we should keep the old probabilities in Hedley by changing HEDLEY_LIKELY/HEDLEY_UNLIKELY over to __builtin_expect(!!(expr), X, 0.99) in GCC 9+, or just accept that GCC devs know best…

travisdowns commented 5 years ago

I don't think it's that clear: for this example gcc 7 and 8 using likely generates the same code (branch-using) as superlikely in 9, but versions 4, 5 and 6 don't (branch-free cmov). So maybe just 7 and 8 are weird. Also 7 and 8 seem to not be doing the right thing in terms of ordering the basic blocks for likely and unlikely: they both produce the same result (but you'd expect them to be swapped for the unlikely case).

I also wouldn't assume it's as simple as gcc having changed the implied branch probability in 9: the resulting assembly is the result of a lot of moving parts beyond just the probability and without digging into internals its hard to say exactly why the code was generated that way.

All that to say that I don't think Hedley can "solve" the problem of generating the right assembly or consistent assembly in every case for something like HEDLEY_LIKELY. Given that, I think the principle of least surprise would be just to emit likely for gcc 9+ as a user would probably expect: people who really care will have to check the assembly and tune by hand if they want - and I don't think you can avoid this by being clever here.

nemequ commented 5 years ago

Thanks, that saves me a lot of research. Hopefully now that the probability is defined for expect things will stabilize a bit.

I guess all that's left is to think about the names some more, and let it ferment until GCC is at least closer to a final release.

I've already tested the implementation on a bunch of compilers, just need to fire up the VMs for a few others.

nemequ commented 5 years ago

@travisdowns: I just added a HEDLEY_UNPREDICTABLE macro to the dev branch. On GCC it's a shortcut for HEDLEY_PREDICT(expr, 1, 0.5), but for clang it's __builtin_unpredictable(!!(expr)). Does that suit you?

travisdowns commented 5 years ago

@nemequ - yeah I think it makes sense. The big wildcard is how __builtin_predict_with_probability actually works (because that's the only time HEDLEY_PREDICT(_, _, 0.5) actually hints anything to the compiler, right?) - i.e., what it implies about code generation on the versions of gcc that support it. Still this strikes me as the right think to do given the tools we have, and perhaps at some point later we'll have enough info to really evaluate the impact of __builtin_predict_with_probability.

nemequ commented 5 years ago

v9 released with these macros included.

travisdowns commented 4 years ago

Interesting thread on a possible use for *with_probability. Basically the usual __builtin_expect can cause sometimes undesirable codegen changes after it is "stacked" a few times, because the cumulative probability gets so low that the compiler treats it as cold code. Using with_probability with a moderate probability like 0.51 can work around this.

nemequ commented 4 years ago

Hah, "likleyish" :)

Yeah, that's definitely an interesting use case. Maybe I should add it to the documentation for HEDLEY_PREDICT? Would you mind if I linked to something like https://godbolt.org/z/ddm6_7 to help illustrate the point?

I've been thinking it would be really nice to have a bit more explanation in the docs instead of just pointing people to the compiler documentation and hoping they understand…

travisdowns commented 4 years ago

Would you mind if I linked to something like https://godbolt.org/z/ddm6_7 to help illustrate the point?

Unconditionally and forever, you may link to any godbolt I write (or a modification thereof), without attribution.

I've been thinking it would be really nice to have a bit more explanation in the docs instead of just pointing people to the compiler documentation and hoping they understand…

Definitely! I don't know if it should be inline in the API doc? I personally like stuff like that there, but you may not if you think it crufts up the API part.

One thing I do feel is missing a bit is how these things map to the underlying, especially in cases where the mapping is non-obvious: e.g., how "predict" works on the majority of platforms that don't implement predict. I checked the source in this case, but sometimes that's a bit tricky given the obligatory macro soup.

nemequ commented 4 years ago

Unconditionally and forever, you may link to any godbolt I write (or a modification thereof), without attribution.

Thanks :). I'll make sure to provide attribution if I'm using it in docs, unless of course you'd rather avoid any blame.

Definitely! I don't know if it should be inline in the API doc? I personally like stuff like that there, but you may not if you think it crufts up the API part.

I completely agree, at least for a project like Hedley where details like that are really core to the project. I've only held off on that out of laziness and lack of time.

Now that I'm thinking about it I really like the idea of using Compiler Explorer to help show the effects of various macros. I had previously dismissed the idea since it's not feasible to include Hedley, but just showing what happens in GCC or clang should be enough to help people understand the ideas even if I'm not actually using Hedley.

One thing I do feel is missing a bit is how these things map to the underlying, especially in cases where the mapping is non-obvious: e.g., how "predict" works on the majority of platforms that don't implement predict. I checked the source in this case, but sometimes that's a bit tricky given the obligatory macro soup.

Yeah, that's a good point.

Most of the macros are nowhere near as gnarly as the likely/unlikely/predict group. The only ones I can think of that even come close is the unreachable/assume group, so it shouldn't be too hard put together explanations for those macros.

I also need to go through and verify that the existing implementation data tables are accurate; I have a terrible habit of not updating the web site when I make improvements to Hedley.

I'm in the middle of some major changes in SIMDe right now, but I think once that's done I'll hop back over to Hedley and work on this a bit. Give me a couple weeks and I should have something… Being unemployed is fantastic for productivity on hobby projects :)