Closed gnzlbg closed 6 years ago
I'm not sure I follow exactly what the relationship between PartialEq
and a.eq(b).all()
is, but it seems to me that what is needed is an ability to do the equivalent of (from altivec.h
):
vec_all_eq() // All elements equal
vec_any_eq() // At least some elements are equal
The first is achieved with just the ==
operator and in PPC code as you mention will just use bit zero of CR6
set by the corresponding vcmpeq
variant. However, the latter is just the equivalent of not all elements are not equal
. In PPC code, that's essentially a crnot
of bit 2 of CR6
set by the corresponding vcmpeq
variant.
In IR, that's implemented by the llvm.ppc.altivec.vcmpequ[bhw].p
intrinsic with the first operand denoting the relationship that is sought.
I'm not sure I follow exactly what the relationship between PartialEq and a.eq(b).all() is
PartialEq
is currently implemented as a.eq(b).all()
. That's it.
it seems to me that what is needed is an ability to do the equivalent of
Yep, this needs using a different PartialEq
implementation on ppc architectures. I checked and LLVM does not properly lower a.eq(b).all()
(see: https://bugs.llvm.org/show_bug.cgi?id=36702#c6).
The problem remains: until that LLVM bug is solved, there is no way for a.eq(b).all()
to lower to vcmpeq
.
We know that our handling of vector reductions is generally lacking as it all naively lowers each of the operations that the reductions break down into. We're working on improving this handling in the back end in general.
In the meantime, there's always the option of using the intrinsics that we use for vec_all_eq
and vec_any_eq
.
In the meantime, there's always the option of using the intrinsics that we use for vec_all_eq and vec_any_eq.
This is the workaround proposed in this issue.
We need to add a codegen/ module for
partial_eq
and call into that onppc
instead of doing the generic a.eq(b).all() dance.
Basically, on ppc, instead of just calling a.eq(b).all()
like we are currently doing, we would just implement PartialEq::{eq,neq}
using vec_all_eq
and vec_any_eq
for the vector types that support it.
stdsimd
does not implement vec_{all,any}_eq
yet, but once it does, implementing this workaround shouldn't be too hard.
We're working on improving this handling in the back end in general.
That's good to hear! Note that we will probably implement the workaround for 128 bit wide vectors initially only, so that smaller and wider vectors might still be using a.eq(b).all()
afterwards.
Tracked in https://github.com/gnzlbg/packed_simd/issues/9 , we are currently not testing the machine code of any portable operations but this would be a good start.
PPC64LE has a
vcmpeq
instruction that returns a vector of bools (like on x86 and arm), but also sets the CR[6] to true if all elements of the vector of bools are true, which allows implementing==
using a single instruction since the vector reduction is not necessary.The current implementation of
PartialEq
for the portable vector types does ana.eq(b).all()
which might be hard for llvm to lower correctly tovcmpeq
becausellvm.vector.reduce.and
is pretty new. It might be worth it to check if this is generated correctly, and if not, figure out what to do. We need to add acodegen/
module forpartial_eq
and call into that on ppc instead of doing the generica.eq(b).all()
dance.