ridiculousfish / libdivide

Official git repository for libdivide: optimized integer division
http://libdivide.com

Question: Why is this not part of GCC/Clang/... #32

Closed · dumblob closed this issue 7 years ago

dumblob commented 7 years ago

This is a ridiculously dumb question, but why are the optimizations employed in libdivide not used by GCC/Clang/... by default (e.g. for -O2)?

An additional question: Did you try to ask GCC/Clang/... upstreams to incorporate the optimizations you're using? If yes, what were the reactions and what are the outcomes?

ridiculousfish commented 7 years ago

It's not at all a dumb question!!

To be sure, gcc and clang both do perform this optimization for compile-time constants. What they don't do is perform the optimization for runtime constants.

To do it for runtime constants would require identifying loops that include division or modulus, with sufficiently high trip counts to make the optimization worthwhile. The next part is emitting code to compute the magic number; in some cases this can require e.g. 128-bit division, which may mean a library function. So it's doable, but might be ugly.
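As a rough sketch (not actual compiler output), this is the transformation the compiler would have to emit for a runtime divisor; libdivide's C API stands in here for the generated magic-number code, and the function name `scale_down` is just illustrative:

```cpp
#include <stdint.h>
#include <stddef.h>
#include "libdivide.h"

void scale_down(uint32_t *data, size_t n, uint32_t d) {
    // Original loop body would be: data[i] = data[i] / d;
    // The compiler would hoist the (relatively expensive) magic-number
    // computation out of the loop...
    struct libdivide_u32_t fast_d = libdivide_u32_gen(d);
    for (size_t i = 0; i < n; i++) {
        // ...and replace the hardware divide with a multiply plus shifts.
        data[i] = libdivide_u32_do(data[i], &fast_d);
    }
}
```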

The big reason why we should do it anyway is vectorization. Division defeats vectorization, because vector hardware generally has no integer divide instruction. libdivide eliminates division, so it can enable vectorizing loops that are otherwise resistant to vectorization.
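A minimal sketch of the vectorization angle, assuming libdivide's C++ `divider` type: once the hardware divide is gone, the loop body is just multiplies and shifts, which an auto-vectorizer can handle (libdivide also provides explicit SIMD code paths for its dividers):

```cpp
#include <cstdint>
#include <cstddef>
#include "libdivide.h"

void scale_down_vec(uint32_t *data, size_t n, uint32_t d) {
    libdivide::divider<uint32_t> fast_d(d);  // magic number computed once, before the loop
    for (size_t i = 0; i < n; i++) {
        data[i] = data[i] / fast_d;          // overloaded operator/: multiply + shift, no div instruction
    }
}
```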

I may take a crack at it myself one of these days.

ridiculousfish commented 7 years ago

Closing, thanks for the interesting question! Feel free to continue the conversation here.

dumblob commented 7 years ago

Thank you @ridiculousfish for the explanation. The vectorization of loops is of particular interest.

I'm curious what you will find out when it comes to "inclusion" into compilers like GCC/Clang.

dumblob commented 1 year ago

@ValZapod that is an interesting observation. About a year ago I did some research on the state of the art in floating-point computation with a focus on decimals (i.e. not exactly what GMP is for, but still highly relevant IMHO). And GMP did not come out as well as it did 15 years ago, especially regarding "precision vs. computation time".

The paper seems to have gathered some dust already; it is not on par with the methods used in the projects linked from the tree of links rooted in the quick research I linked above.

Also, M1/M2 and Intel CPUs after Ice Lake may be using libdivide-like techniques in HW already. :)

This sounds very interesting. Any pointers on how to implement this in HW (ASIC or FPGA)?