Closed dumblob closed 7 years ago
It's not at all a dumb question!!
To be sure, gcc and clang both do perform this optimization for compile-time constants. What they don't do is perform the optimization for runtime constants.
To do it for runtime constants would require identifying loops that include division or modulus, with sufficiently high trip counts to make the optimization worthwhile. The next part is emitting code to compute the magic number; in some cases this can require e.g. 128 bit division, which may mean a library function. So it's doable, but might be ugly.
The big reason why we should do it anyways is vectorization. Division defeats vectorization, because vector hardware can't divide. libdivide eliminates division, so it can enable vectorizing loops that are otherwise resistant to vectorization.
I may take a crack at it myself one of these days.
Closing, thanks for the interesting question! Feel free to continue the conversation here.
Thank you @ridiculousfish for explanation. The vectorization of loops is of a particular interest.
I'm curious what will you find out when it comes to "inclusion" into compilers like GCC/Clang.
@ValZapod that is an interesting observation - about a year ago I made some research on the state-of-the-art floating point computations with focus on decimals (i.e. not exactly what GMP is for but still highly relevant IMHO). And GMP came out not that well as 15 years ago - especially regarding "precision vs. computation time".
The paper seems to gather some dust already - it is not on par with the methods used in projects linked in the tree of links rooted in the quick research I linked above.
Also on M1/M2 and Intel after Ice Lake may be using libdivide in HW already. :)
This sounds very interesting - any pointers how to implement this in HW (ASIC or FPGA)?
This is a ridiculously dumb question, but why are the optimizations employed in libdivide not used by GCC/Clang/... by default (e.g. for
-O2
)?An additional question: Did you try to ask GCC/Clang/... upstreams to incorporate the optimizations you're using? If yes, what were the reactions and what are the outcomes?