ridiculousfish / libdivide

Official git repository for libdivide: optimized integer division
http://libdivide.com
Other
1.09k stars 77 forks source link

primecount fails its test suite (MSVC, x86) #24

Closed kimwalisch closed 8 years ago

kimwalisch commented 8 years ago

primecount with libdivide (branchfree) fails its test suite when compiled using MSVC x86 (tested with MSVC 2010, MSVC 2015).

The test suite seems to get stuck in an infinite loop:

> primecount.exe --test
Testing phi(x, a) 100%
Testing pi_legendre(x) 100%
Testing pi_meissel(x) 100%
Testing pi_lehmer(x) 100%
Testing pi_lmo1(x) 100%
Testing pi_lmo2(x) 100%
Testing pi_lmo3(x) 100%
Testing pi_lmo4(x) 100%
Testing pi_lmo5(x) 100%
Testing pi_lmo_parallel1(x) 100%
Testing pi_lmo_parallel2(x) 100%
Testing pi_lmo_parallel3(x) 100%
Testing pi_deleglise_rivat1(x)^C

Observations:

Conclusion:

There is a bug in the u64 branchfree algorithm when compiling using MSVC 32-bit (x86).

kimwalisch commented 8 years ago

primecount gets stuck in S2_easy(2) where libdivide is used:

=== S2_easy(x, y) ===
Computation of the easy special leaves
x = 2
y = 1
z = 2
c = 0
alpha = 1.000
threads = 1

debug: libdivide generate branchfree divisors
debug: for the following values: 0,
debug: inside libdivide_u64_branchfree_gen()
debug: inside libdivide_internal_u64_gen()
debug: execute libdivide__count_leading_zeros64(d
debug: inside libdivide__count_leading_zeros64(v)
debug: Dorky way to count leading zeros

# stuck in an infinite loop

The erroneous code is:

// Dorky way to count leading zeros.
// Note that this hangs for val = 0!
int32_t result = 0;
while (! (val & (1ULL << 63))) {
        val <<= 1;
        result++;
}

The comment even mentions that the code will hang for 0! Up until now I relied on libdivide to generate an undefined divisor for the value 0, but not to hang! This works fine for all compilers and CPU architectures I have tested so far except for MSVC x86!

Personally I would prefer if this issue was fixed in libdivide and not in primecount. I'll see if I can find a good fix.

kimwalisch commented 8 years ago

I found another piece of code which has the same issue:

// Dorky way to count trailing zeros. Note that this hangs for val = 0!
int32_t result = 0;
// Set v's trailing 0s to 1s and zero rest
val = (val ^ (val - 1)) >> 1;
while (val) {
    val >>= 1;
    result++;
}

Both algorithms could be replaced by algorithms using the De Bruijn bit scan trick: https://chessprogramming.wikispaces.com/BitScan. This would however require that we introduce a small static lookup table of 64 items for each algorithm.

Any comments? Fish, did you try to avoid any static lookup tables on purpose in libdivide?

ridiculousfish commented 8 years ago

No big static lookup tables. Why is the LIBDIVIDE_VC path not being hit?

Hanging on /0 seems almost like a feature to me. I don't think we want to guarantee any particular behavior here.

kimwalisch commented 8 years ago

Why is the LIBDIVIDE_VC path not being hit?

Because it is only being used for x64:

#elif LIBDIVIDE_VC && _WIN64
    unsigned long result;
    if (_BitScanForward64(&result, val)) {
            return result;
    }
    return 0;

We could fix the issue by using a similar algorithm than the one used in libdivide__count_trailing_zeros64():

// Count leading zeros on 32-bit systems
uint32_t hi = val >> 32;
uint32_t lo = val & 0xFFFFFFFF;
if (hi != 0) return libdivide__count_leading_zeros32(hi);
return 32 + libdivide__count_leading_zeros32(lo);

This fixes my primecount bug because libdivide__count_leading_zeros32() uses the _BitScanReverse() intrinsic using MSVC x86 and also improves libdivide i.e. makes it faster and gets rid of one of the 2 dorky algorithms...

Your libdivide_test.exe program passes all tests using the new code and is even slightly faster (as expected):

# Old libdivde with dorky algorithm and MSVC 2010 x86
$ time ./libdivide_test.exe
Starting int32_t
Starting uint32_t
Starting sint64_t
Starting uint64_t

real    6m46.331s
user    0m0.000s
sys     0m0.015s
# New libdivde with _BitScanReverse() MSVC 2010 x86
$ time ./libdivide_test.exe
Starting int32_t
Starting uint32_t
Starting sint64_t
Starting uint64_t

real    6m44.298s
user    0m0.015s
sys     0m0.000s
kimwalisch commented 8 years ago

The other dorky algorithm that is used for counting the number of leading zeros on 32-bit systems:

// Dorky way to count leading zeros.
// Note that this hangs for val = 0!
int32_t result = 0;
while (! (val & (1U << 31))) {
    val <<= 1;
    result++;
}
return result;  

Could be modified like below in order not to hang on 0:

int32_t result = 0;
uint32_t hi = 1U << 31;

while (~val & hi) {
    hi >>= 1;
    result++;
}
return result;    

I think this is a good solution because this way libdivide would be consistent. Currently libdivide does not hang on 0 for all compilers except MSVC x86.

kimwalisch commented 8 years ago

I have now created a pull request https://github.com/ridiculousfish/libdivide/pull/25 to fix this issue using the algorithms described in this thread.

kimwalisch commented 8 years ago

Fix has been merged.