ridiculousfish / libdivide

Official git repository for libdivide: optimized integer division
http://libdivide.com
Other
1.09k stars 77 forks source link

Status update on +-1 branchfree dividers #55

Closed kimwalisch closed 4 years ago

kimwalisch commented 5 years ago

Currently the branchfree divider cannot be +1 and -1 (our current implementation calls exit(-1) if the user tries to generate a branchfree divider for +-1). When the branchfree divider was first implemented 3 years ago we did not find a way to support +-1 dividers and our branchfree algorithm currently returns 0 if the divider is 1.

But the recent pull request https://github.com/ridiculousfish/libdivide/pull/49 from @yb303 contains proof of concept code that shows that it is possible to support +-1 branchfree dividers. The basic idea is to add the numerator to the result if the divider is 1 (in this case the result is 0).

In order to efficiently and cleanly implement +-1 branchfree support the best solution seems to be to add a new int8_t one bitmask to the branchfree divider structs that is set to -1 if the divider is 1 and to 0 otherwise. Using this new one bitmask it is possible to conditionally add the numerator to the result using just 3 additional assembly instructions on x86.

// Does not support 1 divider
uint64_t libdivide_u64_branchfree_do(uint64_t numer, 
                                     const struct libdivide_u64_branchfree_t *denom) {
    uint64_t q = libdivide_mullhi_u64(denom->magic, numer);
    uint64_t t = ((numer - q) >> 1) + q;
    return t >> denom->more;
}

// Supports 1 divider
uint64_t libdivide_u64_branchfree_do(uint64_t numer, 
                                     const struct libdivide_u64_branchfree_t *denom) {
    uint64_t q = libdivide_mullhi_u64(denom->magic, numer);
    uint64_t t = ((numer - q) >> 1) + q + (numer & denom->one);
    return t >> denom->more;
}

If the algorithm is implemented as shown above there is a performance slowdown of about 20%. If the algorithm is implemented without adding a new variable to the branchfree structs there is a performance slowdown of 35 to 40% (and the implementation must use hacks because there are no bits left in the more variable). Adding a new field to the branchfree structs is not ideal from a clean code perspective since we cast branchfree dividers to ordinary dividers in libdivide.h.

Personally I am using the u64 branchfree divider in two of my projects and I don't need +-1 support. Hence for my personal use case not supporting +-1 is actually a feature since my code will run faster without it. Hence for the time being I will not implement +-1 branchfree divider support.

If you would like to use the branchfree divider in your project and you need +-1 support, please let us know and leave a comment below. You can also leave a comment if you don't care about +-1 branchfree divider support and prefer better performance.

kimwalisch commented 5 years ago

Actually our signed branchfree algorithm already supported +1 and -1 dividers, we were just unaware of it. I have now removed the checks that disallow using +1 and -1 as signed branchfree dividers in libdivide.h and enabled +1 and -1 testing for signed branchfree dividers.