For the VC compiler on 32-bit platforms, libdivide__count_leading_zeros64 now uses the _BitScanReverse intrinsic instead of a while loop. Tested only on Windows CE 6.0 (ARM Cortex-A8), where libdivide::divider performs about 10-20% faster than with the original implementation. _BitScanReverse is not available under VC2008 for CE, so I implemented it in a separate file as 31 - _CountLeadingZeros.
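
A minimal sketch of the idea, not the actual patch: the 64-bit count is derived from a 32-bit bit scan applied to the high or low half. The names bit_scan_reverse32 and count_leading_zeros64 are hypothetical, a plain loop stands in where _BitScanReverse is not available, and returning 64 for a zero input is an assumption made here for safety.

```cpp
#if defined(_MSC_VER) && !defined(_WIN32_WCE)
#include <intrin.h>
#endif

// Hypothetical helper: index (0..31) of the highest set bit of a nonzero
// 32-bit value. Assumes unsigned int is 32 bits wide.
static inline int bit_scan_reverse32(unsigned int v) {
#if defined(_MSC_VER) && !defined(_WIN32_WCE)
    unsigned long index;
    _BitScanReverse(&index, v);   // desktop VC intrinsic
    return (int)index;
#else
    // Portable stand-in. On Windows CE the description above uses
    // 31 - _CountLeadingZeros(v) at this point instead.
    int index = 31;
    while (!(v & 0x80000000u)) {
        v <<= 1;
        --index;
    }
    return index;
#endif
}

// Leading zeros of a 64-bit value using only 32-bit scans.
static inline int count_leading_zeros64(unsigned long long v) {
    unsigned int hi = (unsigned int)(v >> 32);
    unsigned int lo = (unsigned int)(v & 0xFFFFFFFFu);
    if (hi != 0) {
        return 31 - bit_scan_reverse32(hi);
    }
    if (lo != 0) {
        return 32 + (31 - bit_scan_reverse32(lo));
    }
    return 64;  // assumption: zero input reports 64 leading zeros
}
```

Splitting the value into two halves keeps every scan at 32 bits, which is the case the intrinsic covers on a 32-bit target.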