ridiculousfish / libdivide

Official git repository for libdivide: optimized integer division
http://libdivide.com

Use __umulh(), __mulh() MSVC intrinsics on WIN64 #22

Closed kimwalisch closed 8 years ago

kimwalisch commented 8 years ago

This pull request implements the enhancement suggested in https://github.com/ridiculousfish/libdivide/issues/19. As the timings below show, I measured a small speedup with your test program (first run: this branch, second run: master):

kim@kim-PC /cygdrive/C/Users/kim/Desktop/libdivide-MSVC-intrinsics
$ time ./libdivide_test.exe
Starting int32_t
Starting uint32_t
Starting sint64_t
Starting uint64_t

real    1m44.433s
user    0m0.000s
sys     0m0.030s

kim@kim-PC /cygdrive/C/Users/kim/Desktop/libdivide-master
$ time ./libdivide_test.exe
Starting int32_t
Starting sint64_t
Starting uint32_t
Starting uint64_t

real    1m51.549s
user    0m0.000s
sys     0m0.000s
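
For readers who have not seen the intrinsics, here is a minimal sketch of the idea behind this change, not the actual libdivide code. `__umulh()` and `__mulh()` are the 64-bit MSVC intrinsics from `<intrin.h>` that return the high 64 bits of a 64x64-bit product. The helper names `mulhi_u64` and `mulhi_s64`, the `HAS_MSVC_MULH` macro, and the fallback paths are assumptions made for illustration only.

```c
#include <stdint.h>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>          /* declares __umulh() and __mulh() on 64-bit MSVC */
#define HAS_MSVC_MULH 1      /* hypothetical macro name, for this sketch only */
#endif

/* High 64 bits of the unsigned 128-bit product x * y. */
static uint64_t mulhi_u64(uint64_t x, uint64_t y) {
#if defined(HAS_MSVC_MULH)
    return __umulh(x, y);
#elif defined(__SIZEOF_INT128__)
    /* GCC/Clang: let the compiler do the 128-bit multiply. */
    return (uint64_t)(((unsigned __int128)x * y) >> 64);
#else
    /* Portable fallback: four 32x32->64 partial products. */
    uint64_t x0 = (uint32_t)x, x1 = x >> 32;
    uint64_t y0 = (uint32_t)y, y1 = y >> 32;
    uint64_t p00 = x0 * y0, p01 = x0 * y1, p10 = x1 * y0, p11 = x1 * y1;
    /* Sum of the terms that land in bits 32..63, including carries. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    return p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
#endif
}

/* High 64 bits of the signed 128-bit product x * y. */
static int64_t mulhi_s64(int64_t x, int64_t y) {
#if defined(HAS_MSVC_MULH)
    return __mulh(x, y);
#elif defined(__SIZEOF_INT128__)
    /* Relies on arithmetic right shift of negative values (GCC/Clang). */
    return (int64_t)(((__int128)x * y) >> 64);
#else
    /* Correct the unsigned high word for the sign of each operand. */
    uint64_t hi = mulhi_u64((uint64_t)x, (uint64_t)y);
    if (x < 0) hi -= (uint64_t)y;
    if (y < 0) hi -= (uint64_t)x;
    return (int64_t)hi;
#endif
}
```

On 64-bit MSVC both helpers reduce to a single intrinsic call, which is presumably where the measured gain comes from. The other branches only keep the sketch compilable elsewhere.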

kimwalisch commented 8 years ago

I want to release a new version of primecount this weekend and start using libdivide on Windows/MSVC as well, so I will merge this pull request now.

Here are some benchmarks for libdivide with the __mulh() and __umulh() intrinsics, built with MSVC 2015 x64:

# primecount without libdivide, slowest
> primecount.exe 1e16 --S2_easy -s

=== S2_easy(x, y) ===
Computation of the easy special leaves
x = 10000000000000000
y = 4117019
z = 2428941911
c = 6
alpha = 19.110
threads = 4

Status: 100%
S2_easy = 63933848726803
Seconds: 4.635

# primecount with libdivide, but without __mulh(), __umulh()
> primecount.exe 1e16 --S2_easy -s

=== S2_easy(x, y) ===
Computation of the easy special leaves
x = 10000000000000000
y = 4117019
z = 2428941911
c = 6
alpha = 19.110
threads = 4

Status: 100%
S2_easy = 63933848726803
Seconds: 3.905

# primecount with libdivide and with __mulh(), __umulh(), fastest
> primecount.exe 1e16 --S2_easy -s

=== S2_easy(x, y) ===
Computation of the easy special leaves
x = 10000000000000000
y = 4117019
z = 2428941911
c = 6
alpha = 19.110
threads = 4

Status: 100%
S2_easy = 63933848726803
Seconds: 3.175
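
To put the three timings above on a common scale, the relative gains can be computed with a small sketch like the one below. The function names `speedup` and `time_saved_percent` are hypothetical and not part of primecount or libdivide. The inputs are the "Seconds" values reported above.

```c
/* Speedup factor of a run that took `seconds` relative to `baseline_seconds`. */
static double speedup(double baseline_seconds, double seconds) {
    return baseline_seconds / seconds;
}

/* Percentage of the baseline run time that was saved. */
static double time_saved_percent(double baseline_seconds, double seconds) {
    return 100.0 * (baseline_seconds - seconds) / baseline_seconds;
}
```

With the numbers above, `speedup(4.635, 3.905)` is about 1.19 (libdivide alone), `speedup(4.635, 3.175)` is about 1.46 (libdivide with the intrinsics), and `speedup(3.905, 3.175)` is about 1.23, which is the part attributable to `__mulh()` and `__umulh()` in this single measurement.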