rust-lang / portable-simd

The testing ground for the future of portable SIMD in Rust
Apache License 2.0

impl `trig` for `core::simd` #6

Open Lokathor opened 4 years ago

etemesi-ke commented 4 years ago

Hey, I could implement some of these and submit a PR. I was wondering if the appropriate way would be to use the Taylor series for the above trig functions, since I'm not sure what the Rust standard library uses.

calebzulawski commented 4 years ago

The Taylor series is probably not sufficient since it's only accurate near 0. This paper summarizes a variety of algorithms used for vectorized implementations of popular numeric functions, including trigonometry: https://arxiv.org/abs/2001.09258. The referenced SLEEF library is used by packed_simd, however I don't believe SLEEF itself can be used in this crate.
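
To make the accuracy problem concrete, a quick scalar sketch: truncating the sin series at the x^5 term matches std near 0 but is far off by x = 3.

fn taylor_sin(x: f64) -> f64 {
    // sin(x) ≈ x - x^3/3! + x^5/5!, only good near 0
    x - x.powi(3) / 6.0 + x.powi(5) / 120.0
}

fn main() {
    println!("{:.6} vs {:.6}", taylor_sin(0.1), 0.1f64.sin()); // 0.099833 vs 0.099833
    println!("{:.6} vs {:.6}", taylor_sin(3.0), 3.0f64.sin()); // 0.525000 vs 0.141120
}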

etemesi-ke commented 4 years ago

What algorithm did you have in mind then?

And would the impls look like:

use core::arch::x86_64::__m128;

trait trig {
    type Output;

    fn cos(&self) -> Self::Output;
}

impl trig for __m128 {
    type Output = __m128;

    // the real kernel would carry #[target_feature(enable = "sse")]
    fn cos(&self) -> __m128 {
        todo!() // code
    }
}
etemesi-ke commented 4 years ago

Sorry about the first part. After re-reading your comment, if I understand it right, you want to use the algorithms from the paper that SLEEF uses, but not necessarily the SLEEF library itself, i.e. rewrite them in Rust?

Lokathor commented 4 years ago

So, I don't know what the quality or speed is, but LLVM has built-in handling for vector trig same as with vector add, sub, etc.

Example from packed_simd: https://github.com/rust-lang/packed_simd/blob/master/src/codegen/math/float/cos.rs

If the LLVM versions are of sufficient quality, we should likely just use those.

bjorn3 commented 4 years ago

There are platform-intrinsic versions for sin and cos by the way, so no need to use LLVM intrinsics directly. Curiously enough there are no versions for other trigonometry functions, but that should be easy to add.

https://github.com/rust-lang/rust/blob/41507ed0d57eba71adc20a021a19b64322162f04/compiler/rustc_codegen_llvm/src/intrinsic.rs#L1080-L1085
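
For reference, a rough sketch of calling those from library code, assuming nightly's repr_simd and platform_intrinsics features and the generic declarations matching the rustc source linked above:

#![feature(repr_simd, platform_intrinsics)]

extern "platform-intrinsic" {
    fn simd_fsin<T>(x: T) -> T;
    fn simd_fcos<T>(x: T) -> T;
}

#[allow(non_camel_case_types)]
#[repr(simd)]
#[derive(Copy, Clone, Debug)]
struct f32x4(f32, f32, f32, f32);

fn main() {
    let v = f32x4(0.0, 1.0, 2.0, 3.0);
    // these lower to llvm.sin.v4f32 / llvm.cos.v4f32
    let (s, c) = unsafe { (simd_fsin(v), simd_fcos(v)) };
    println!("{:?} {:?}", s, c);
}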

etemesi-ke commented 4 years ago

Are the LLVM intrinsics useful for SIMD? From what I understand of their docs, their cos function, for example, operates on just one floating-point value.

bjorn3 commented 4 years ago

There are LLVM intrinsics with names like llvm.cos.v2f32 that operate on vectors.

etemesi-ke commented 4 years ago

@bjorn3 Thanks for clarifying that.

So the code might look like:

#![feature(link_llvm_intrinsics)]

extern {
    #[link_name = "llvm.cos.v2f32"]
    fn cos(a: [f64; 2]) -> [f64; 2];
}

/// A vector of two floating-point values
struct f64x2([f64; 2]);

impl trig for f64x2 {
    fn cos(&self) -> f64x2 {
        f64x2(unsafe { cos(self.0) })
    }
}
bjorn3 commented 4 years ago

More like

#![feature(link_llvm_intrinsics)]

extern {
    #[link_name = "llvm.cos.v2f32"]
    fn cos(a: f64x2) -> f64x2;
}

/// A vector of two floating-point values
#[repr(simd)] // lets rustc pass f64x2 to the intrinsic as a real vector
#[derive(Copy, Clone)]
struct f64x2([f64; 2]);

impl trig for f64x2 {
    fn cos(&self) -> f64x2 {
        unsafe { cos(*self) }
    }
}
etemesi-ke commented 4 years ago

Okay, that makes sin and cos easy.

And the rest, are we going to implement from the ground up?

emmanuelantony2000 commented 4 years ago

I think so, yes; tan and the others have to be implemented from the ground up. Is it implemented using sin and cos, or is there another way?

programmerjake commented 4 years ago

Note that llvm.cos.v2f32 means f32x2, if you want f64x2 you need to use llvm.cos.v2f64
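
So, keeping f64x2 as the element type, the declaration above would presumably need:

extern {
    #[link_name = "llvm.cos.v2f64"]
    fn cos(a: f64x2) -> f64x2;
}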

Lokathor commented 4 years ago

tan = sin / cos, so that one specifically is easy. Also, you can write sin_cos and merge 90% of the work.
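
A minimal sketch of that sharing, with std's scalar sin/cos standing in for the real vector polynomials; an actual sin_cos would do the range reduction once and run both polynomial evaluations on the reduced argument:

fn sin_cos(v: [f64; 2]) -> ([f64; 2], [f64; 2]) {
    // one pass over the lanes produces both results
    ([v[0].sin(), v[1].sin()], [v[0].cos(), v[1].cos()])
}

fn tan(v: [f64; 2]) -> [f64; 2] {
    let (s, c) = sin_cos(v);
    [s[0] / c[0], s[1] / c[1]]
}

fn main() {
    println!("{:?}", tan([0.5, 1.0]));
}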

We probably want to port a lot of the Agner Fog vector math stuff, https://github.com/vcoda/vectorclass (edit: wrong link, this is an older gpl3 version, see below for apache2)

Some of it is already ported in the wide crate.

calebzulawski commented 4 years ago

I think we need to be careful about "porting" libraries, both SLEEF and that library are incompatible with MIT/Apache. Additionally I think we need to be careful about our implementations, since naive approaches will likely have poor accuracy. I think our best bet is to find branchless algorithms (could be scalar) that are available under compatible licenses, and port them.

Lokathor commented 4 years ago

oh my oh my, that's the wrong link! Google search let me down.

Version 2 of the lib is here: https://github.com/vectorclass/version2. It is Apache-2.0 licensed, so we could write the Rust version and have it be Apache/MIT.

calebzulawski commented 4 years ago

Oh awesome, much better. I'm definitely okay with using this implementation.

workingjubilee commented 3 years ago

Ah, this was the issue I kept forgetting. This issue, roughly, makes a lot of this effort more complex: https://github.com/rust-lang/rust/issues/50145

fu5ha commented 3 years ago

Myself, @Lokathor, and one of the other wide contributors already ported several of these from the Agner Fog lib linked above to Rust for use in wide; it would probably not be very hard to port those Rust impls to stdsimd. I would be willing to do this if nobody else is working on it currently.

Lokathor commented 3 years ago

A contributor would be welcomed.

The current status appears to be that most of the library can't be in core anyway for other reasons, so I think it's easiest to make this work at all, and then we can (maybe!) make it work in core later.

thomcc commented 3 years ago

I think it's still an open question where to put it, though, since part of the motivation behind supporting these is to use dedicated instructions on platforms that have them. That probably means using simd_sin and the like...

Lokathor commented 3 years ago

I propose:

stage 1: we just code it in the library

stage 2: the library transitions to intrinsics
- when LLVM can't emit a direct call to a platform version, it calls a vector version of libm
- our vector version of libm is the new home of the code from stage 1

workingjubilee commented 3 years ago

Please do! And that sounds like a plan, @Lokathor!

simd_sin and simd_cos on x86 seem to degenerate to scalar libm calls even with -Ctarget-feature=+avx2 enabled, unfortunately (but with AVX vmovss instructions, just to make sure you know it's in fact using AVX!), so we can't really depend on LLVM emitting useful code for those, even if LLVM nominally supports them.

programmerjake commented 3 years ago

WIP implementation: https://github.com/rust-lang/stdsimd/issues/109
Code: https://salsa.debian.org/Kazan-team/vector-math

andy-thomason commented 3 years ago

If you are interested: I usually use a Newton polynomial on the Chebyshev points, pinned at the extrema, to avoid any conditional code.

This way sin and cos, for example, just become a single Horner-form polynomial over the range [0, 1) corresponding to 0..2π.

You can use combinations of CORDIC transforms to lower the initial range, and this is necessary for the inverse trig functions and exponentials, but for most cases there is no need: just adding a few more terms to the polynomial is sufficient to get accuracy over the range and exact values at the extrema.

You can choose your Newton polynomial points at points of interest to get exact values where you like.
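
A minimal sketch of that branchless shape, evaluating one Horner-form polynomial lane-wise (the coefficients below are made-up placeholders, not a real Newton/Chebyshev fit):

fn horner(x: [f64; 2], coeffs: &[f64]) -> [f64; 2] {
    // evaluates c[0] + c[1]*x + c[2]*x^2 + ... per lane, highest term first
    let mut acc = [0.0f64; 2];
    for &c in coeffs.iter().rev() {
        for lane in 0..2 {
            acc[lane] = acc[lane].mul_add(x[lane], c);
        }
    }
    acc
}

fn main() {
    // placeholder coefficients: x - x^3/6, a crude sin-like curve
    let coeffs = [0.0, 1.0, 0.0, -1.0 / 6.0];
    println!("{:?}", horner([0.1, 0.5], &coeffs));
}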

programmerjake commented 3 years ago

hmm, ok. For sin_cos_pi, I range-reduced to sin([-pi/4, pi/4]) and cos([-pi/4, pi/4]) and used a Taylor series up to x^16 for f64, x^9 for f32, and x^5 for f16. That gave me 2 ULP accuracy (tested against f64::sin/cos for f16 and f32, and against MPFR for f64). Using more terms didn't give more accuracy due to round-off error.

andy-thomason commented 3 years ago

I'll try to get some time to do some manual runs for you with carefully chosen Newton polynomials.

Sin and cos are very well behaved, as their derivatives are well behaved and the series converges rapidly. Even Taylor (Maclaurin) series, which are about the worst case because they only approximate the zero point, will converge.

Chebyshev approximation gives you a more consistent error profile, spreading the error out over the function, whereas Remez can suffer from instability.

doctor_syn, my fledgling computer algebra system based on syn, currently generates code based on something close to Chebyshev approximation. It uses Newton polynomials to build a polynomial that is exact in certain places.

By spreading the places where it is exact out along the valid range, you can get a much lower maximum error with fewer terms.

To get a good f64 result, you need to work with algebra at greater-than-f64 precision. This was traditionally f80 (legacy 8087), but there are a few extended-precision Rust libs out there. Rug is a popular choice, but it is a huge Frankenstein's monster wrapping GMP, another huge monster. You can also use BigRational, as long as you keep approximating the numerator/denominator to avoid outrageous run times.

There may be an f128 crate out there somewhere, too.

https://en.wikipedia.org/wiki/Newton_polynomial
https://en.wikipedia.org/wiki/Approximation_theory#Chebyshev_approximation

andy-thomason commented 3 years ago

Anyway, great work, @programmerjake! I'm quite excited!

I'll try to get to the next SIMD group meeting on Monday. I was doing a talk last night with the Cardiff Rust group and could not attend.

programmerjake commented 3 years ago

If you don't care about speed, you can use the algebraics crate (it implements real algebraic numbers), or the simple-soft-float crate, which uses algebraics (it implements IEEE 754 FP math with support for huge FP types, e.g. f65536). I wrote both of them.

andy-thomason commented 3 years ago

Cool!

programmerjake commented 3 years ago

It took me several months to figure out how to efficiently factor polynomials with bigint coefficients, which is used to implement real algebraic numbers.

programmerjake commented 3 years ago

modular fields of polynomials with modular integer coefficients are complicated!

andy-thomason commented 3 years ago

Tell Galois that :)

Good to see this work being done. I have slightly wider requirements, with the full gamut of stats functions in R, but I worked with SCEE on the PlayStation compilers for a long time. I used to use Maple to calculate approximations and factorise polynomials, but I prefer to do it myself now.

programmerjake commented 3 years ago

For higher precision in core::simd, I'm planning on using double-double arithmetic, since f64 SIMD is commonly supported.
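
A minimal sketch of the core double-double building block, assuming nothing beyond std: Knuth's two-sum splits a + b into the rounded sum and the exact rounding error, and a double-double value is then the unevaluated (hi, lo) pair.

fn two_sum(a: f64, b: f64) -> (f64, f64) {
    // s is the rounded sum; err recovers exactly what rounding discarded
    let s = a + b;
    let bb = s - a;
    let err = (a - (s - bb)) + (b - bb);
    (s, err)
}

fn main() {
    let (hi, lo) = two_sum(1.0, 1e-20);
    // 1e-20 is far below f64 precision at magnitude 1.0,
    // but the (hi, lo) pair preserves it exactly
    assert_eq!((hi, lo), (1.0, 1e-20));
}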