ridiculousfish / libdivide

Official git repository for libdivide: optimized integer division
http://libdivide.com
Other
1.09k stars 77 forks source link

Support overloaded types in C++ templates #57

Closed chris-se closed 5 years ago

chris-se commented 5 years ago

On 32bit systems both int and long are typically both 32bit, while long long is 64bit, but both are distinct types. On 64bit systems the same is true for long and long long (both 64bit). But as they are distinct types, only one of them is aliased to the corresponding sized type (int32_t/int64_t). This means that for example on 64bit Linux systems, where long is typically aliased to int64_t, it is not possible to create a libdivide::divider<long long>. Conversely, on Apple macOS int64_t is typically long long, so it's not possible to create a libdivide::divider<long> there - even though the types are compatible with each other.

This commit adds a bit of template magic that tries to detect the proper sized type for any given type libdivide is used with. This allows libdivide to support divider<int>, divider<long> and divider<long long> on all architectures where these map to either 32bit or 64bit types.

The code should be compatible with C++98.

(edit: ensured that GitHub doesn't interpret the < in the pull request message.)

kimwalisch commented 5 years ago

This means that for example on 64bit Linux systems, where long is typically aliased to int64_t, it is not possible to create a libdivide::divider.

Could you please post a little test program here that causes a compiler error without your patch but works fine with your patch?

chris-se commented 5 years ago

Sure:

#include "libdivide.h"

#include <iostream>
#include <cstdlib>

template<typename T>
void do_divide(T a, T b)
{
    libdivide::divider<T> divider(b);
    std::cout << a << " / " << b << " = " << (a / divider) << std::endl;
}

int main(int argc, char** argv)
{
    if (argc != 3) {
        std::cerr << "Usage: " << argv[0] << " num denom" << std::endl;
        return 1;
    }

    long long num = strtoll(argv[1], 0, 10);
    long long denom = strtoll(argv[2], 0, 10);

    do_divide((long long) num, (long long) denom);
    do_divide((long) num, (long) denom);
    do_divide((int) num, (int) denom);
    do_divide((int32_t) num, (int32_t) denom);
    do_divide((int64_t) num, (int64_t) denom);

    if (num > 0 && denom > 0) {
        do_divide((unsigned long long) num, (unsigned long long) denom);
        do_divide((unsigned long) num, (unsigned long) denom);
        do_divide((unsigned int) num, (unsigned int) denom);
        do_divide((uint32_t) num, (uint32_t) denom);
        do_divide((uint64_t) num, (uint64_t) denom);
    }
    return 0;
}

Current master fails with:

In file included from test.cpp:1:0:
libdivide.h: In instantiation of ‘libdivide::divider<T, ALGO>::divider(T) [with T = long long int; int ALGO = 0]’:
test.cpp:10:36:   required from ‘void do_divide(T, T) [with T = long long int]’
test.cpp:25:49:   required from here
libdivide.h:1999:25: error: no matching function for call to ‘libdivide::dispatcher<long long int, 0>::dispatcher(long long int&)’
     divider(T d) : div(d) { }
                         ^
libdivide.h:1976:39: note: candidate: constexpr libdivide::dispatcher<long long int, 0>::dispatcher()
 template<typename T, int ALGO> struct dispatcher { };
                                       ^~~~~~~~~~
libdivide.h:1976:39: note:   candidate expects 0 arguments, 1 provided
libdivide.h:1976:39: note: candidate: constexpr libdivide::dispatcher<long long int, 0>::dispatcher(const libdivide::dispatcher<long long int, 0>&)
libdivide.h:1976:39: note:   no known conversion for argument 1 from ‘long long int’ to ‘const libdivide::dispatcher<long long int, 0>&’
libdivide.h:1976:39: note: candidate: constexpr libdivide::dispatcher<long long int, 0>::dispatcher(libdivide::dispatcher<long long int, 0>&&)
libdivide.h:1976:39: note:   no known conversion for argument 1 from ‘long long int’ to ‘libdivide::dispatcher<long long int, 0>&&’
libdivide.h: In instantiation of ‘libdivide::divider<T, ALGO>::divider(T) [with T = long long unsigned int; int ALGO = 0]’:
test.cpp:10:36:   required from ‘void do_divide(T, T) [with T = long long unsigned int]’
test.cpp:32:71:   required from here

(On macOS there's a similar error message, but for long instead of long long, because int64_t is long long there.)

My patched version:

g++ -std=c++98 -Wall -Wextra -pedantic -Wno-long-long -o test test.cpp && ./test 20 4
20 / 4 = 5 [10 times]
g++ -std=c++11 -Wall -Wextra -pedantic -o test test.cpp && ./test 20 4
20 / 4 = 5 [10 times]
g++ -std=c++14 -Wall -Wextra -pedantic -o test test.cpp && ./test 20 4
20 / 4 = 5 [10 times]
g++ -std=c++17 -Wall -Wextra -pedantic -o test test.cpp && ./test 20 4
20 / 4 = 5 [10 times]
g++ --version
g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516

Same behavior also with upstream g++ 7.2.0, as well as clang from XCode 10.3.

kimwalisch commented 5 years ago

Thanks for your bug report.

However your code is not 100% correct (it won't work on CPUs where bytes have more than 8 bits), writing 100% correct meta-programming code is tricky especially when you are not using the C++11 <type_traits> header. I'd like to fix this this bug but ideally using simpler code...

chris-se commented 5 years ago

What about the following (untested, just as an illustration)?

template<typename T> struct possible_matched_type                     { enum { value = 0 }; };
template<>           struct possible_matched_type<int>                { enum { value = 1 }; };
template<>           struct possible_matched_type<long>               { enum { value = 1 }; };
template<>           struct possible_matched_type<long long>          { enum { value = 1 }; };
template<>           struct possible_matched_type<unsigned int>       { enum { value = 2 }; };
template<>           struct possible_matched_type<unsigned long>      { enum { value = 2 }; };
template<>           struct possible_matched_type<unsigned long long> { enum { value = 2 }; };
template<typename T, int MATCH = possible_matched_type<T>::value, int BITS = sizeof(T) * CHAR_BIT> struct mapped_type { typedef T type; };
template<typename T> struct mapped_type<T, 1, 32> { typedef  int32_t type; };
template<typename T> struct mapped_type<T, 2, 32> { typedef uint32_t type; };
template<typename T> struct mapped_type<T, 1, 64> { typedef  int64_t type; };
template<typename T> struct mapped_type<T, 2, 64> { typedef uint64_t type; };

// later, in the class private:
typedef typename mapped_type<T>::type ST;

This would have the following behavior:

Without C++11 and <type_traits> I don't think this code could be made simpler though.

If this change would be acceptable I could test this to see if it actually works, and update the pull request.

chris-se commented 5 years ago

I've now implemented the version I described after receiving your feedback (changing the name of the first template) and tested it. Hopefully this is acceptable to merge.

kimwalisch commented 5 years ago

Thanks for your effort!

I will test your code when I am back from holiday in two weeks (I am travelling without a notebook) and it will eventually be merged if it works well. In the mean time another user has already posted a bug report that your pull request will hopefully fix. Also I would like to play around with your code and see if I can further shorten and/or simplify it (One idea I have is to avoid hardcoding constants as template parameters and instead use sizeof whereever I can).

kimwalisch commented 5 years ago

@chris-se: Do you need C++98 support? With C++11 we can have a nice a simple solution which adds just 1 line of code e.g.:

// The dispatcher selects a specific division algorithm for a given
// type and ALGO using partial template specialization.
template<bool IS_INTEGRAL, bool IS_SIGNED, int SIZEOF, int ALGO> struct dispatcher { };

template<> struct dispatcher<true, true, sizeof(int32_t), BRANCHFULL> { DISPATCHER_GEN(int32_t, s32) };
template<> struct dispatcher<true, true, sizeof(int32_t), BRANCHFREE> { DISPATCHER_GEN(int32_t, s32_branchfree) };
template<> struct dispatcher<true, false, sizeof(uint32_t), BRANCHFULL> { DISPATCHER_GEN(uint32_t, u32) };
template<> struct dispatcher<true, false, sizeof(uint32_t), BRANCHFREE> { DISPATCHER_GEN(uint32_t, u32_branchfree) };
template<> struct dispatcher<true, true, sizeof(int64_t), BRANCHFULL> { DISPATCHER_GEN(int64_t, s64) };
template<> struct dispatcher<true, true, sizeof(int64_t), BRANCHFREE> { DISPATCHER_GEN(int64_t, s64_branchfree) };
template<> struct dispatcher<true, false, sizeof(uint64_t), BRANCHFULL> { DISPATCHER_GEN(uint64_t, u64) };
template<> struct dispatcher<true, false, sizeof(uint64_t), BRANCHFREE> { DISPATCHER_GEN(uint64_t, u64_branchfree) };

// ...

// Storage for the actual divisor
dispatcher<std::is_integral<T>::value, 
           std::is_signed<T>::value, sizeof(T), ALGO> div;
kimwalisch commented 5 years ago

Fixed by https://github.com/ridiculousfish/libdivide/commit/623f9be3e406712be192e75833019f790aef8a51

Thanks for your feedback!

chris-se commented 5 years ago

Sorry, was busy the last couple of days. Thanks for taking care of this!

I personally don't need C++98 support (I only need C++17), but from what I understood libdivide supports C++98, that's why I wrote the patch in the way I did. Personally I'm perfectly fine with a C++11 requirement.

I tested your change against my test script and it works great, many thanks for your efforts!