huonw / simd

[DEPRECATED] see:
http://github.com/rust-lang-nursery/simd
Apache License 2.0

How might one help add more functions? #13

Open abonander opened 8 years ago

abonander commented 8 years ago

For a project I'm working on (DCT computation) I would really like to have .powi() and .abs() functions for f64x2. .cos() would be nice as well but I can live with a Taylor series approximation if it doesn't benefit from SIMD enough to make it worth the effort.

I'd love to add these myself; the design of the library seems pretty straightforward, but I have absolutely no idea where to find which intrinsic functions to use, or how to figure out what their names would be.

Addendum: my code might really benefit from a multiply/accumulate function as well, if that's not too hard to get on f64x2.

huonw commented 8 years ago

This crate tries to only provide what the CPU offers, and most CPUs don't offer .powi and .cos.

However, abs almost exists, in the form of a bitwise & with (!0) >> 1 (i.e. clear the high bit, which is the sign bit), and many CPUs have FMA (fused multiply-add, which I guess is what you mean by multiply/accumulate), although I believe it's only in fairly recent x86-64 CPUs.
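To make the bit trick above concrete, here is a minimal sketch in plain Rust. It is not this crate's API: a `[f64; 2]` array stands in for `f64x2`, and the function name `abs_f64x2` is hypothetical. It only shows that masking each lane with `!0 >> 1` clears the sign bit.

```rust
/// Hypothetical helper: absolute value of two f64 lanes via a bit mask.
/// A `[f64; 2]` array stands in for the crate's `f64x2` (assumption).
pub fn abs_f64x2(v: [f64; 2]) -> [f64; 2] {
    // All ones shifted right by one: every bit set except the top (sign) bit.
    const MASK: u64 = !0u64 >> 1;
    [
        // Reinterpret each lane as raw bits, clear the sign bit, convert back.
        f64::from_bits(v[0].to_bits() & MASK),
        f64::from_bits(v[1].to_bits() & MASK),
    ]
}
```

With a real SIMD vector the same mask would be applied to both lanes in one bitwise AND; the scalar version here just spells out what happens per lane.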

My recommended approach for these is to build on top of simd in an external crate, using the CPU's instructions when they're available.
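As a rough illustration of "use the CPU's instructions when they're available", the sketch below checks for FMA at runtime and falls back to a separate multiply and add otherwise. It uses today's `std::arch` intrinsics rather than this crate, so treat it as an assumption about how an external helper could look; the names `mul_add_f64x2` and `fma_f64x2_hw` are hypothetical, and `[f64; 2]` again stands in for `f64x2`.

```rust
/// Hardware path: only compiled on x86-64, only called after a runtime check.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "fma")]
unsafe fn fma_f64x2_hw(a: [f64; 2], b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    use std::arch::x86_64::{_mm_fmadd_pd, _mm_loadu_pd, _mm_storeu_pd};
    let mut out = [0.0f64; 2];
    unsafe {
        // Unaligned loads of two f64 lanes each.
        let va = _mm_loadu_pd(a.as_ptr());
        let vb = _mm_loadu_pd(b.as_ptr());
        let vc = _mm_loadu_pd(c.as_ptr());
        // a * b + c per lane, with a single rounding step.
        _mm_storeu_pd(out.as_mut_ptr(), _mm_fmadd_pd(va, vb, vc));
    }
    out
}

/// Hypothetical helper: per-lane a * b + c, fused when the CPU supports FMA.
pub fn mul_add_f64x2(a: [f64; 2], b: [f64; 2], c: [f64; 2]) -> [f64; 2] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("fma") {
            // Safe to call: the feature was just detected at runtime.
            return unsafe { fma_f64x2_hw(a, b, c) };
        }
    }
    // Fallback: separate multiply and add (two roundings instead of one).
    [a[0] * b[0] + c[0], a[1] * b[1] + c[1]]
}
```

Note that the fallback rounds twice, so its results can differ from the fused path in the last bit; if identical results matter more than speed, `f64::mul_add` could be used in the fallback instead.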

Thoughts?

abonander commented 8 years ago

Reiterating from our conversation in IRC for the record.

.abs() is really the only method I need for my project. However, I didn't know it was that simple to implement, so I can hand-roll that if necessary. It'd probably be pretty useful in the lib though.

My CPU doesn't even support FMA instructions (Ivy Bridge/3770k), so that's a non-starter, though it might be a good feature to have for Haswell and up.

Since I do have AVX, I might want to expand to f64x4 but I don't see any impls for it. How much work would that be?

abonander commented 8 years ago

@huonw My laptop's CPU does support FMA (Haswell/i3-4030U) so I would like to try implementing that. How might I go about it?

huonw commented 8 years ago

The first step is making sure the compiler has support for the intrinsics (https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=FMA). This is done by adding the definitions to https://github.com/rust-lang/rust/tree/master/src/etc/platform-intrinsics (e.g. create x86/fma3.json, modeled on, say, avx.json) and then regenerating the compiler definitions like so:

python generator.py --format compiler-defs --info x86/info.json x86/{sse,sse2,sse3,ssse3,sse41,sse42,avx,avx2,fma3}.json -o ../../librustc_platform_intrinsics/x86.rs 

Then you can create the required extern block with

python generator.py --format extern-block --info x86/info.json x86/fma3.json

Place this into src/x86/fma3.rs and add traits similar to what's in sse3.rs or avx.rs.

huonw commented 8 years ago

> Since I do have AVX, I might want to expand to f64x4 but I don't see any impls for it. How much work would that be?

There are some impls for this; you just have to import them specifically, e.g. use simd::x86::avx::*; will bring in f64x4 etc.