Closed zekyll closed 8 years ago
Cool! Did you happen to check how MSVC is currently compiling these - i.e. are the intrinsics actually faster?
It does compile to a single instruction on x64 at least, and it is significantly faster.
Oh I just noticed what the current implementation is. Yes, let's switch to these where available.
The functions libdivide__mullhi_u64/libdivide__mullhi_s64 can be implemented much faster on MSVC with the __umulh()/__mulh() intrinsics. They have exactly the same signatures, so it's a very easy modification, and they should be supported on all MSVC versions.
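For illustration, here is a rough sketch of what the switch could look like. The names mullhi_u64 and mullhi_s64 are hypothetical stand-ins, not the actual libdivide functions, and the non-MSVC branches are my own assumption about reasonable fallbacks (a 128-bit integer type on GCC/Clang, otherwise 32-bit partial products). The MSVC branch is guarded on _M_X64 because that is the target where the single-instruction result was observed above.

```c
#include <stdint.h>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>

/* MSVC x64: each intrinsic returns the high 64 bits of the 128-bit product. */
static inline uint64_t mullhi_u64(uint64_t x, uint64_t y) { return __umulh(x, y); }
static inline int64_t mullhi_s64(int64_t x, int64_t y) { return __mulh(x, y); }

#elif defined(__SIZEOF_INT128__)

/* GCC/Clang on 64-bit targets: widen to 128 bits and keep the upper half. */
static inline uint64_t mullhi_u64(uint64_t x, uint64_t y) {
    return (uint64_t)(((unsigned __int128)x * y) >> 64);
}

/* Relies on arithmetic right shift of negative values, which GCC and Clang provide. */
static inline int64_t mullhi_s64(int64_t x, int64_t y) {
    return (int64_t)(((__int128)x * y) >> 64);
}

#else

/* Portable fallback (assumed, not taken from libdivide): 32-bit partial products. */
static inline uint64_t mullhi_u64(uint64_t x, uint64_t y) {
    uint64_t x0 = (uint32_t)x, x1 = x >> 32;
    uint64_t y0 = (uint32_t)y, y1 = y >> 32;
    uint64_t p00 = x0 * y0, p01 = x0 * y1, p10 = x1 * y0, p11 = x1 * y1;
    /* This sum cannot overflow 64 bits. */
    uint64_t mid = p10 + (p00 >> 32) + (uint32_t)p01;
    return p11 + (mid >> 32) + (p01 >> 32);
}

/* Signed high word derived from the unsigned one, computed modulo 2^64. */
static inline int64_t mullhi_s64(int64_t x, int64_t y) {
    uint64_t hi = mullhi_u64((uint64_t)x, (uint64_t)y);
    if (x < 0) hi -= (uint64_t)y;
    if (y < 0) hi -= (uint64_t)x;
    return (int64_t)hi;
}

#endif
```

Since the wrappers take and return the same types as the intrinsics, callers would not need to change; only the function bodies differ per compiler.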