ridiculousfish / libdivide

Official git repository for libdivide: optimized integer division
http://libdivide.com
Other
1.09k stars 77 forks source link

Add constexpr structure generation (C++14 only) #118

Closed adbancroft closed 3 weeks ago

adbancroft commented 3 weeks ago

Support compile time calculation of the libdivide_[u16|s16|u32|s2|u64|s64](_branchfree)?_t structs. E.g.

constexpr auto constexprU64BF = libdivide::libdivide_u64_branchfree_gen_c(UINT64_MAX/123);

This is good for performance and memory footprint.

C++14 & up only

Implementation Notes

  1. I could not make the existing generation functions constexpr without compromising their performance. As a result, a new family of functions has been introduced: libdivide_[u16|s16|u32|s2|u64|s64](_branchfree)?_gen_c
  2. Since asm & some compiler intrinsics are not constexpr compatible, the long division and leading zero count (E.g. libdivide_64_div_32_to_32, libdivide_count_leading_zeros32) functions had to be split up to provide software only implementations callable at compile time (E.g. libdivide_64_div_32_to_32_software, libdivide_count_leading_zeros32_software) .
  3. Limiting copy/paste combined with constraints 1 & 2 required refactoring the structure generation code (E.g. libdivide_internal_u32_gen) to break out the constexprcompatible core algorithm (E.g. libdivide_internal_u32_gen_gen).
  4. C++ only: in order to make the constexpr structs as useful as possible, I added '_do' overloads taking a const reference. The existing 'do' code is shared between overloads via a 'do_raw' function (x2 for branch free versions)
  5. Manually maintaining all these function declarations is painful, so they are generated using macros (see DECLARE_TYPE). This also enforces consistency
adbancroft commented 3 weeks ago

Sorry but this is way too messy. libdivide.h is already difficult to maintain due to its length, and hiding declarations behind macros makes the problem significantly worse.

Yeah, this one snowballed a bit...

A constexpr initializer is a little silly, since if you know the divisor at compile time, you might as well just use ordinary division. When would this be useful?

As the readme notes, it's useful when the compiler doesn't optimize a compile time divisor. Latest AVR-GCC doesn't optimize int16_t, int32_t and uint32_t. See here - note the runtime calls to __divmod, __udivmod.

I'd be OK adding constexpr support if it can be done in a way that's not so invasive. I was able to make it work at least on clang with this experiment

I had something like this as a 1st pass. I'll close this PR & submit a replacement