rust-lang / stdarch

Rust's standard library vendor-specific APIs and run-time feature detection
https://doc.rust-lang.org/stable/core/arch/
Apache License 2.0
601 stars 267 forks source link

Check asm of PartialEq on ppc64le #459

Closed gnzlbg closed 6 years ago

gnzlbg commented 6 years ago

PPC64LE has a vcmpeq instruction that returns a vector of bools (like on x86 and arm), but also sets the CR[6] to true if all elements of the vector of bools are true, which allows implementing == using a single instruction since the vector reduction is not necessary.

The current implementation of PartialEq for the portable vector types does an a.eq(b).all() which might be hard for llvm to lower correctly to vcmpeq because llvm.vector.reduce.and is pretty new. It might be worth it to check if this is generated correctly, and if not, figure out what to do. We need to add a codegen/ module for partial_eq and call into that on ppc instead of doing the generic a.eq(b).all() dance.

nemanjai commented 6 years ago

I'm not sure I follow exactly what the relationship between PartialEq and a.eq(b).all() is, but it seems to me that what is needed is an ability to do the equivalent of (from altivec.h):

vec_all_eq() // All elements equal
vec_any_eq() // At least some elements are equal

The first is achieved with just the == operator and in PPC code as you mention will just use bit zero of CR6 set by the corresponding vcmpeq variant. However, the latter is just the equivalent of not all elements are not equal. In PPC code, that's essentially a crnot of bit 2 of CR6 set by the corresponding vcmpeq variant. In IR, that's implemented by the llvm.ppc.altivec.vcmpequ[bhw].p intrinsic with the first operand denoting the relationship that is sought.

gnzlbg commented 6 years ago

I'm not sure I follow exactly what the relationship between PartialEq and a.eq(b).all() is

PartialEq is currently implemented as a.eq(b).all(). That's it.

it seems to me that what is needed is an ability to do the equivalent of

Yep, this needs using a different PartialEq implementation on ppc architectures. I checked and LLVM does not properly lower a.eq(b).all() (see: https://bugs.llvm.org/show_bug.cgi?id=36702#c6).

The problem remains: until that LLVM bug is solved, there is no way for a.eq(b).all() to lower to vcmpeq.

nemanjai commented 6 years ago

We know that our handling of vector reductions is generally lacking as it all naively lowers each of the operations that the reductions break down into. We're working on improving this handling in the back end in general. In the meantime, there's always the option of using the intrinsics that we use for vec_all_eq and vec_any_eq.

gnzlbg commented 6 years ago

In the meantime, there's always the option of using the intrinsics that we use for vec_all_eq and vec_any_eq.

This is the workaround proposed in this issue.

We need to add a codegen/ module for partial_eq and call into that on ppc instead of doing the generic a.eq(b).all() dance.

Basically, on ppc, instead of just calling a.eq(b).all() like we are currently doing, we would just implement PartialEq::{eq,neq} using vec_all_eq and vec_any_eq for the vector types that support it.

stdsimd does not implement vec_{all,any}_eq yet, but once it does, implementing this workaround shouldn't be too hard.

We're working on improving this handling in the back end in general.

That's good to hear! Note that we will probably implement the workaround for 128 bit wide vectors initially only, so that smaller and wider vectors might still be using a.eq(b).all() afterwards.

gnzlbg commented 6 years ago

Tracked in https://github.com/gnzlbg/packed_simd/issues/9 , we are currently not testing the machine code of any portable operations but this would be a good start.