Open p0nce opened 5 years ago
fcmp can do all I need! https://llvm.org/docs/LangRef.html#id297
Except that it returns a vector of bools, instead of masks. You'll need to check whether the IR you come up with yields the expected asm instructions.
Well, the `ldc.simd.cmpMask` template is using `fcmp` (+ `sext` to the appropriate vector type) for floating-point types, so all that'd be needed is supporting some more comparisons.
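For readers unfamiliar with the "compare, then sign-extend" pattern, here is a minimal scalar sketch in C of what an ordered less-than compare followed by sign-extension would compute on a 4-lane float vector. The name `lt_mask4` is hypothetical and only for illustration; it is not part of `ldc.simd`.

```c
#include <stdint.h>

/* Scalar model of an ordered "less than" compare followed by sign-extension,
   applied lane by lane to 4 floats.
   lt_mask4 is a hypothetical name, not an ldc.simd function. */
void lt_mask4(const float a[4], const float b[4], int32_t out[4])
{
    for (int i = 0; i < 4; ++i) {
        int bit = a[i] < b[i];   /* the boolean result of the compare: 0 or 1 */
        out[i] = -(int32_t)bit;  /* sign-extension: 0 stays 0, 1 becomes -1 (all bits set) */
    }
}
```

Each output lane is either all zeros or all ones, which is the mask form that the SSE compare instructions produce.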
Pretty much, and do not expose them for integers.
I ended up using inlineIR directly https://github.com/AuburnSounds/intel-intrinsics/blob/master/source/inteli/types.d#L337 since I'm trying to see if I can do without the x86 intrinsic for generating CMPSD too. (Epilogue: using it for portability, but IR doesn't generate optimal code there)
I'd have preferred an `ldc.simd` extension; reopening. ;)
To express all nuances you need 8 comparison modes at a minimum (the ones in the hardware CMPPS instruction), though it's confusing to get the operand inversion right, so I would advise supporting the 14 predicates needed for all nuances... (not sure what the use is for a "true" and "false" comparison mode (EDIT2: those exist in hardware only with VEX))
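To make the operand inversion concrete, here is a rough scalar model in C of the eight non-VEX CMPPS/CMPPD predicates, with the "greater" family derived by swapping operands. The enum and function names are made up for this sketch, and it only models the true/false result of one lane, not the mask or exception behavior.

```c
#include <math.h>
#include <stdbool.h>

/* The eight predicates selected by the non-VEX CMPPS/CMPPD immediate (0..7).
   All names below are hypothetical and only serve this sketch. */
enum cmp_pred { CMP_EQ, CMP_LT, CMP_LE, CMP_UNORD, CMP_NEQ, CMP_NLT, CMP_NLE, CMP_ORD };

bool cmp_scalar(enum cmp_pred p, double a, double b)
{
    switch (p) {
    case CMP_EQ:    return a == b;              /* false if either is NaN */
    case CMP_LT:    return a < b;               /* false if either is NaN */
    case CMP_LE:    return a <= b;              /* false if either is NaN */
    case CMP_UNORD: return isunordered(a, b);   /* true only if either is NaN */
    case CMP_NEQ:   return !(a == b);           /* true if either is NaN */
    case CMP_NLT:   return !(a < b);            /* true if either is NaN */
    case CMP_NLE:   return !(a <= b);           /* true if either is NaN */
    case CMP_ORD:   return !isunordered(a, b);  /* true only if neither is NaN */
    }
    return false;
}

/* The "greater" family has no predicate of its own: swap the operands. */
bool cmp_gt(double a, double b)  { return cmp_scalar(CMP_LT,  b, a); }
bool cmp_ge(double a, double b)  { return cmp_scalar(CMP_LE,  b, a); }
bool cmp_ngt(double a, double b) { return cmp_scalar(CMP_NLT, b, a); }
bool cmp_nge(double a, double b) { return cmp_scalar(CMP_NLE, b, a); }
```

For example, `cmp_nge(NAN, 1.0)` is true while `cmp_gt(NAN, 1.0)` is false, which is the kind of nuance that is easy to lose when only the ordered predicates are exposed.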
The `_mm_cmpnge_pd` and `_mm_cmplt_pd` intrinsics should generate different instructions on x86. https://godbolt.org/z/q0Jy38

With `clang`, this results in different variants of the CMPPD instruction: https://www.felixcloutier.com/x86/cmppd (see the results for NGE and LT, which agree for finite input but differ if one operand is NaN).

In `intel-intrinsics` for LDC I'm forced to implement the corresponding intrinsics this way:

But this is incorrect because for floating-point, "not greater or equal" is not equivalent to "less"; it can also be "unordered".
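As a small illustration of the difference, here is a scalar C sketch of the two lane-wise masks on a pair of doubles. The function names are hypothetical, and this models only the resulting masks, not what `intel-intrinsics` or clang actually emit.

```c
#include <math.h>    /* NAN macro, used when exercising the functions */
#include <stdint.h>

/* Lane-wise "a < b" (ordered): the lane is 0 when either input is NaN.
   Hypothetical helper for illustration. */
void lt_mask2(const double a[2], const double b[2], int64_t out[2])
{
    for (int i = 0; i < 2; ++i)
        out[i] = -(int64_t)(a[i] < b[i]);
}

/* Lane-wise "not (a >= b)": the lane is all ones when either input is NaN.
   Hypothetical helper for illustration. */
void nge_mask2(const double a[2], const double b[2], int64_t out[2])
{
    for (int i = 0; i < 2; ++i)
        out[i] = -(int64_t)!(a[i] >= b[i]);
}
```

With `a = {1.0, NAN}` and `b = {2.0, 1.0}`, both functions set lane 0 to all ones, but lane 1 is zero for `lt_mask2` and all ones for `nge_mask2`. Inverting a "greater or equal" mask gives the correct NGE result, at the price of an extra operation compared to a single compare instruction.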
In LDC we only have the most excellent `greaterMask!V` and `greaterOrEqualMask!V`, which are not sufficient for float comparison if we want both correctness and efficiency.

What would be a good solution?