Pzixel opened this issue 4 years ago
Pattern matching is used everywhere in the tests where possible, and the average developer would do the same - they won't do micro-optimizations unless necessary. I'm not even sure that clippy won't complain about a bunch of `if let`s instead of `match`, so we may assume that `match` would be used in the majority of use cases where Rust developers have to match more than just a couple of patterns.
And the actual question is "why is the `match` code slower than the `if let` code?" (it could be related to branch misprediction, I guess). If this question is answered and probably addressed (even by changing the Rust internals), the whole community benefits from it. That's the ultimate goal of the benchmarks - to improve the existing systems. However, if you're looking into how to squeeze out the best performance by utilizing various and potentially unsafe features, then, unfortunately, this project is not about that. Please note though that it's possible to submit another implementation in addition to the reference one that could work faster. So, if you're willing, you're welcome to submit faster code, but please note that it would be an additional test.
I understand the goal. Unfortunately, the current implementation of `likely` works with expressions only (see https://github.com/rust-lang/rust/issues/26179), so it can't be applied to `match` arms. So the only way to get the desired branch layout is to write the flow manually via an `if let` chain to force a specific expansion order.
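To make the limitation concrete, here is a minimal nightly-only sketch (the function and values are invented for illustration, and the exact `likely` API has moved around between nightlies - older ones required an `unsafe` block around the call):

```rust
#![allow(internal_features)]
#![feature(core_intrinsics)]

use std::intrinsics::likely;

// `likely` is just a bool -> bool hint, so it can only wrap a condition
// expression like the one below; there is no syntax to attach it to a
// `match` arm.
fn classify(x: u32) -> &'static str {
    if likely(x != 0) { "common case" } else { "rare case" }
}

fn main() {
    println!("{}", classify(7));
}
```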
I understand that this change may go against the spirit of the project, which is why I created an issue instead of a PR.
Please note though that it's possible to submit another implementation in addition to the reference one that could work faster. So, if you're willing, you're welcome to submit faster code, but please note that it would be an additional test.
That's fine. Although I don't think there is any reason to add a "Better Rust" entry, because the results table would soon become a mess of "Cpp 534 vs Rust Alaha version" with thousands of alternatives. Keeping the single most friendly and idiomatic version is best.
The only thing I can suggest is placing `#[inline(always)]` to be on par with the C++ `inline`s in the cpp bench. It saves some ticks on my machine. And it's a completely valid change which I see in lots of crates.
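For reference, this is the kind of change I mean (the function is a made-up stand-in, not the actual benchmark code):

```rust
// Mirrors a C++ `inline` hint: `#[inline(always)]` asks the compiler to
// always inline the call instead of leaving it to its own heuristics.
#[inline(always)]
fn wrapping_inc(cell: u8, delta: u8) -> u8 {
    cell.wrapping_add(delta)
}

fn main() {
    println!("{}", wrapping_inc(255, 1)); // prints 0
}
```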
Thank you for the detailed explanation. Please let me note though:

- `likely` looks like a micro-optimization, and the C++ version doesn't use it. If the C++ code has a better prediction rate, then it seems like Rust has a little problem with the optimization of the `match` expression (assuming that `if let` chains are predicted better).
- As for the `inline` keyword in C++, I guess I would rather delete `inline` from there and let the compiler decide. And again, if Rust doesn't inline those simple functions, it's just a problem with heuristics, and I hope they'll fix it eventually.

`likely` looks like a micro-optimization, and the C++ version doesn't use it. If the C++ code has a better prediction rate, then it seems like Rust has a little problem with the optimization of the `match` expression (assuming that `if let` chains are predicted better).
There is no way C++ could predict which branch will happen more often. But JIT languages can. This is why Kotlin is this good at the brainfuck2 benchmarks. After a function is hit enough times, the JIT recompiles it with statistics available. And in these statistics it sees that two branches are almost never hit, so it generates much better code than any native language (that doesn't have runtime profiling information).
So there is nothing to "fix eventually", because there is no way to determine it without profile-guided optimization, which could be considered an optimization effort (and thus not a naive idiomatic solution) itself. I'm not arguing, I'm just explaining what I see here. Btw I wonder why Cpp even compiles differently. This is another study case for me.
I guess I would rather delete `inline` from there
Sounds good to me :)
Btw I wonder why Cpp even compiles differently. This is another study case for me.
This would interest me as well - come back once you've found something, thanks.
If the JIT were that good, Kotlin would work excellently on the bigger BF script too, but it doesn't, so I guess that's not the whole story. On the other hand, considering that Kotlin removes some runtime checks (OpenJDK gives the warning "OpenJDK 64-Bit Server VM warning: Options -Xverify:none and -noverify were deprecated in JDK 13 and will likely be removed in a future release."), and as those checks could screw up the branch predictions, that explains why it performs better than the other JVM-based tests.
I think it's still worth having the optimized Rust version with `likely` and `inline` - it may help to determine and eliminate the weak points. If a natively compiled optimized version performs worse than a JIT-ed program, then it's definitely an issue that would be nice to address. It could be some compiler issue, like the one noticed in https://parallel-rust-cpp.github.io/v6.html - the Clang and rustc versions should be identical, but they're not. My other suggestion would be tweaking the compiler flags (like using `target-cpu`, as the JIT could do the same, generating code specific to the current CPU). However, it's definitely up to you.
I guess I would rather delete `inline` from there
Sounds good to me :)
Addressed in #209. Please note that removing `inline` actually improved performance (I double-checked); it looks like the compiler can do a better job than a human here. However, that's with GCC and on my server - it could be different for you.
I compared the Rust and Cpp versions and I found out that most of the performance difference is branch misprediction. It also changes drastically from CPU to CPU because it relies on CPU internals.
I found that changing `match *op` to an `if let` chain fixes this problem, although I'm not sure if it follows your "Used data structures are idiomatic" rule. I mean, it's more idiomatic to use `match` in these cases, but what makes Rust itself is that you can rewrite your code a bit to get better performance. What do you think? On my machine I've got 1.45 rust vs 1.388 cpp.