ldc-developers / ldc

The LLVM-based D Compiler.
http://wiki.dlang.org/LDC

Need more ldc.simd comparison masks #2957

Open p0nce opened 5 years ago

p0nce commented 5 years ago

The _mm_cmpnge_pd and _mm_cmplt_pd intrinsics should generate different instructions on x86. https://godbolt.org/z/q0Jy38

With clang, these compile to different variants of the CMPPD instruction: https://www.felixcloutier.com/x86/cmppd (see the results for NGE and LT, which agree for finite input but differ if one operand is NaN).

In intel-intrinsics for LDC I'm forced to implement the corresponding intrinsics this way:

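    __m128d _mm_cmplt_pd (__m128d a, __m128d b) pure @safe
    {
        // a < b is expressed as b > a, since only "greater" masks are available
        return cast(__m128d) greaterMask!double2(b, a);
    }
    __m128d _mm_cmpnge_pd (__m128d a, __m128d b) pure @safe
    {
        // Incorrect for NaN: the result should be all-ones when a or b is NaN
        return _mm_cmplt_pd(a, b);
    }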

But this is incorrect: for floating-point, "not greater or equal" is not equivalent to "less", because it is also true when the operands are "unordered" (at least one of them is NaN).
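The difference can be seen without any SIMD. Below is a minimal scalar sketch in plain D; the function names `less` and `notGreaterOrEqual` are made up for this illustration and are not part of LDC or intel-intrinsics.

```d
/// Ordered "less than": false whenever either operand is NaN.
bool less(double a, double b)
{
    return a < b;
}

/// "Not greater or equal": true whenever either operand is NaN,
/// because a >= b is false for unordered operands.
bool notGreaterOrEqual(double a, double b)
{
    return !(a >= b);
}
```

For ordered operands the two functions always agree; they only disagree when `a` or `b` is NaN, which is exactly the case the LT and NGE variants of CMPPD distinguish.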

In LDC we only have the most excellent greaterMask!V and greaterOrEqualMask!V, which are not sufficient for floating-point comparison if we want both correctness and efficiency.

What would be a good solution?

p0nce commented 5 years ago

fcmp can do all I need! https://llvm.org/docs/LangRef.html#id297

kinke commented 5 years ago

Except that it returns a vector of bools, instead of masks. You'll need to check whether the IR you come up with yields the expected asm instructions.

kinke commented 5 years ago

Well, the ldc.simd.cmpMask template already uses fcmp (+ sext to the appropriate vector type) for floating-point types, so all that'd be needed is support for some more comparisons.
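As a rough sketch of the fcmp + sext pattern described above, an NGE mask for double2 could look like the following. It is LDC-only and assumes `ldc.simd.inlineIR` is available in the LDC version at hand; the helper name `notGreaterOrEqualMask` is hypothetical and is not an existing ldc.simd function. As noted earlier in the thread, whether this IR actually lowers to the expected CMPPD variant would still need to be checked in the generated asm.

```d
// Sketch only: requires LDC and a target with 128-bit SIMD vectors.
import core.simd : double2, long2;
import ldc.simd : inlineIR;

/// Hypothetical helper: per-lane "not greater or equal" mask.
/// `fcmp ult` is true when a < b OR when either operand is NaN.
long2 notGreaterOrEqualMask(double2 a, double2 b)
{
    enum ir = `
        %cmp = fcmp ult <2 x double> %0, %1
        %mask = sext <2 x i1> %cmp to <2 x i64>
        ret <2 x i64> %mask`;
    // sext turns each i1 into an all-ones (-1) or all-zeros (0) lane.
    return inlineIR!(ir, long2, double2, double2)(a, b);
}
```

Swapping `ult` for another fcmp condition code (for example `olt` for an ordered "less than") gives the other comparisons with the same structure.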

p0nce commented 5 years ago

Pretty much, and do not expose them for integers.

p0nce commented 5 years ago

I ended up using inlineIR directly: https://github.com/AuburnSounds/intel-intrinsics/blob/master/source/inteli/types.d#L337 since I'm trying to see if I can do without the x86 intrinsic for generating CMPSD too. (Epilogue: I'm using it for portability, but the IR doesn't generate optimal code there.)

kinke commented 5 years ago

I'd have preferred an ldc.simd extension; reopening. ;)

p0nce commented 5 years ago

To express all nuances you need at least 8 comparison modes (the ones in the hardware CMPPS instruction), though it's confusing to get the operand inversion right, so I would advise supporting the 14 predicates needed for all nuances... (not sure what the use is for the "true" and "false" comparison modes (EDIT2: which exist in hardware only with VEX))
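For reference, here is a hedged sketch of how the 8 legacy CMPPS/CMPPD modes line up with LLVM fcmp condition codes, together with scalar per-lane semantics. The names `SsePredicate`, `fcmpCode` and `compareLane` are invented for this example; only the predicate encodings (0 to 7) and the fcmp codes themselves come from the x86 and LLVM documentation.

```d
import std.math : isNaN;

/// The 8 predicates encoded in the immediate of non-VEX CMPPS/CMPPD.
enum SsePredicate : ubyte
{
    EQ = 0, LT = 1, LE = 2, UNORD = 3, NEQ = 4, NLT = 5, NLE = 6, ORD = 7
}

/// Matching LLVM fcmp condition codes, indexed by SsePredicate
/// ("o" prefix = ordered, "u" prefix = true if unordered).
immutable string[8] fcmpCode =
    ["oeq", "olt", "ole", "uno", "une", "uge", "ugt", "ord"];

/// Scalar reference semantics for a single lane.
bool compareLane(SsePredicate p, double a, double b)
{
    immutable unordered = isNaN(a) || isNaN(b);
    final switch (p)
    {
        case SsePredicate.EQ:    return a == b;
        case SsePredicate.LT:    return a < b;
        case SsePredicate.LE:    return a <= b;
        case SsePredicate.UNORD: return unordered;
        case SsePredicate.NEQ:   return !(a == b);
        case SsePredicate.NLT:   return !(a < b);
        case SsePredicate.NLE:   return !(a <= b);
        case SsePredicate.ORD:   return !unordered;
    }
}
```

The GT, GE, NGT and NGE comparisons are not in this list: with only these 8 modes they have to be built by swapping operands, e.g. NGE(a, b) is NLE(b, a). That is the operand inversion that is easy to get wrong, and exposing the full set of fcmp codes (`ogt`, `oge`, `ule`, `ult`, and so on) would avoid it.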