Open abonander opened 8 years ago
This crate tries to only provide what the CPU offers, and most CPUs don't offer `.powi` and `.cos`.
However, `abs` almost exists, in the form of a bitwise `&` with `(!0) >> 1` (i.e. clear the high bit), and many CPUs have FMA (fused multiply-add, which I guess is what you mean by multiply/accumulate), although I believe it's only in fairly recent x86-64 CPUs.
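For illustration, here is a minimal scalar sketch of that sign-bit trick in plain Rust. The function names are hypothetical and `[f64; 2]` is only a stand-in for `f64x2`; this is not the crate's API, and a real SIMD version would apply the same mask with a single vector AND.

```rust
/// Absolute value of an f64 computed by clearing the sign bit (the highest bit).
/// Hypothetical helper for illustration only.
fn abs_via_mask(x: f64) -> f64 {
    // !0u64 >> 1 == 0x7FFF_FFFF_FFFF_FFFF: every bit set except the sign bit.
    const MASK: u64 = !0u64 >> 1;
    f64::from_bits(x.to_bits() & MASK)
}

/// The same operation applied lane by lane; [f64; 2] stands in for f64x2.
fn abs_lanes(v: [f64; 2]) -> [f64; 2] {
    [abs_via_mask(v[0]), abs_via_mask(v[1])]
}
```

Because only the sign bit changes, `-0.0` becomes `+0.0` and a NaN stays a NaN.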
My recommended approach for these is to build on top of `simd` in an external crate, using the CPU's instructions when they're available.
Thoughts?
Reiterating from our conversation in IRC for the record.
`.abs()` is really the only method I need for my project. However, I didn't know it was that simple to implement, so I can hand-roll that if necessary. It'd probably be pretty useful in the lib though.
My CPU doesn't even support FMA instructions (Ivy Bridge/3770k), so that's a non-starter, though it might be a good feature to have for Haswell and up.
Since I do have AVX, I might want to expand to `f64x4`, but I don't see any impls for it. How much work would that be?
@huonw My laptop's CPU does support FMA (Haswell/i3-4030U) so I would like to try implementing that. How might I go about it?
The first step is making sure the compiler has support for the intrinsics https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=FMA which is done by adding the definitions to https://github.com/rust-lang/rust/tree/master/src/etc/platform-intrinsics (e.g. create `x86/fma3.json`, modeled on, say, `avx.json`) and then regenerating the compiler definitions like
python generator.py --format compiler-defs --info x86/info.json x86/{sse,sse2,sse3,ssse3,sse41,sse42,avx,avx2,fma3}.json -o ../../librustc_platform_intrinsics/x86.rs
Then you can create the required extern block with
python generator.py --format extern-block --info x86/info.json x86/fma3.json
Place this into `src/x86/fma3.rs` and add traits similar to what's in `sse3.rs` or `avx.rs`.
> Since I do have AVX, I might want to expand to `f64x4` but I don't see any impls for it. How much work would that be?
There are some impls for this; you just have to import them specifically, e.g. `use simd::x86::avx::*;` will bring in `f64x4` etc.
For a project I'm working on (DCT computation) I would really like to have `.powi()` and `.abs()` functions for `f64x2`. `.cos()` would be nice as well, but I can live with a Taylor series approximation if it doesn't benefit from SIMD enough to make it worth the effort.

I'd love to add these myself; the design of the library seems to be pretty straightforward, but I have absolutely no idea where to find what intrinsic functions to use, or how to figure out what their names would be.
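To make the request concrete, here is a rough scalar sketch of what lane-wise `powi` and a Taylor-series `cos` could look like. Everything here is an assumption for illustration: the function names are hypothetical, `[f64; 2]` stands in for `f64x2`, and the Taylor sum has no range reduction, so it is only reasonable for inputs near zero.

```rust
/// Lane-wise integer power by repeated squaring; [f64; 2] stands in for f64x2.
/// Hypothetical helper, not part of the simd crate.
fn powi_lanes(v: [f64; 2], mut n: u32) -> [f64; 2] {
    let mut base = v;
    let mut acc = [1.0, 1.0];
    while n > 0 {
        if n & 1 == 1 {
            // Multiply the current power of the base into the result.
            acc = [acc[0] * base[0], acc[1] * base[1]];
        }
        // Square the base for the next bit of the exponent.
        base = [base[0] * base[0], base[1] * base[1]];
        n >>= 1;
    }
    acc
}

/// Lane-wise cos(x) from the first `terms` terms of the Taylor series
/// 1 - x^2/2! + x^4/4! - ...  No range reduction is done.
fn cos_taylor_lanes(v: [f64; 2], terms: u32) -> [f64; 2] {
    let x2 = [v[0] * v[0], v[1] * v[1]];
    let mut term = [1.0, 1.0];
    let mut sum = [1.0, 1.0];
    for k in 1..terms {
        // Each term is the previous one times -x^2 / ((2k - 1) * 2k).
        let d = ((2 * k - 1) * (2 * k)) as f64;
        term = [-term[0] * x2[0] / d, -term[1] * x2[1] / d];
        sum = [sum[0] + term[0], sum[1] + term[1]];
    }
    sum
}
```

Both loops use only multiplies, divides and adds on whole lanes, so they should map onto the vector operations the crate already exposes.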
Addendum: my code might really benefit from a multiply/accumulate function as well, if that's not too hard to get on `f64x2`.
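As a sketch of what multiply/accumulate means here, the standard library's `f64::mul_add` computes `a * b + c` with a single rounding step. The helper names below are hypothetical and `[f64; 2]` again stands in for `f64x2`; whether this compiles down to a hardware FMA instruction depends on the target features, and without FMA support it falls back to a slower software path.

```rust
/// Lane-wise a * b + c using the standard library's fused multiply-add.
/// Hypothetical helper; [f64; 2] stands in for f64x2.
fn mul_add_lanes(a: [f64; 2], b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    [a[0].mul_add(b[0], c[0]), a[1].mul_add(b[1], c[1])]
}

/// Dot product written as a chain of multiply-accumulate steps.
fn dot(xs: &[f64], ys: &[f64]) -> f64 {
    xs.iter()
        .zip(ys)
        .fold(0.0, |acc, (x, y)| x.mul_add(*y, acc))
}
```

For example, `dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0])` accumulates 4 + 10 + 18.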