Closed quicknir closed 5 years ago
Hey @quicknir, thanks for bringing this up! As far as I know, Abseil variant
performs this optimization and it seems pretty clear that it does help. I haven't been able to make time to implement it myself, but I would love to see it implemented! We should be able to measure at least the resulting binary size within my variant
benchmark framework: https://mpark.github.io/variant
Are you considering implementing this?
Yes definitely. I'll try to look at abseil code in godbolt too, just to get a sense and make sure I end up with equally good assembly. Maybe it's easier if I start with trying to do a PR here, then looking at standard libraries?
@mpark Hey, I opened a PR a while ago (https://github.com/mpark/variant/pull/52), just pinging to draw attention.
Thanks for the ping! Looking...
Closed by #52 and #56
Hey, I found some pretty not great codegen for std::visit. I mentioned this to Louis Dionne (a libc++ maintainer), and asked him if there was a reason switch case was not used for the common case (e.g. visit one variant for under ten types). You can see the differences in assembly (std, boost, switch-case) for 2 examples here: https://gcc.godbolt.org/z/Kt8ZNf.
He suggested opening an issue here, because he said that you were the expert on implementing variant :-). Someone also told me that there was an open bug report on gcc, so I wrote things up fairly neatly there in more detail: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78113. The gist of it is that the constexpr table of function pointer approach just doesn't get inlined even with brand new compilers, and you can see that even relatively easy optimization like simply optimizing out an empty visitor can't be done. Switch case can't really be doen generically, but it seems easy to generate a bit of code such that you can handle visit for a single variant for up to N types (have N equal e.g. 10). This code is actually fairly simple, and gives massive improvements in assembly for the very common case, at no runtime cost to the other cases, and just very minimal compile cost (most likely).
Let me know your thoughts!