Support __fp16 vectors


Bugzilla Link	23305
Version	trunk
OS	All
Blocks	llvm/llvm-project#51454
CC	@ahmedbougacha,@tlively

Extended Description

__fp16 is a storage-only type, and there are two CodeGen variants:

- soften to i16 and promote using llvm.convert.to/from.fp16 (e.g., X86)
- when LangOptions::NativeHalfType or HalfArgsAndReturns is set, use the LLVM "half" type and promote using fpext/fptrunc (e.g., AArch64)

In both cases, we don't do the right thing for vectors.

On X86, this:

typedef __fp16 __attribute__((__ext_vector_type__(4))) v4f16;

void foo(v4f16 *a, v4f16 *b, v4f16 *c) {
  *c = *a + *b;
}

generates the very broken integer add of the raw half bit patterns:

  %3 = add <4 x i16> %1, %2

This is because Sema::UsualUnaryConversions doesn't apply to vector types (see Sema::CheckVectorOperands), so we never try to promote to v4f32 (as we would promote a scalar __fp16 to f32).

Even if we decide to reject that code and never do the implicit promotion, the alternative is also broken:

typedef __fp16 __attribute__((__ext_vector_type__(4))) v4f16;
typedef float __attribute__((__ext_vector_type__(4))) v4f32;

void foo(v4f16 *a, v4f16 *b, v4f16 *c) {
  *c = __builtin_convertvector(*a, v4f32);
}

This generates an integer-to-float conversion of the raw bit patterns:

  %2 = uitofp <4 x i16> %1 to <4 x float>
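
To make the problem with that uitofp concrete, here is a minimal sketch in portable C that models the i16 storage as uint16_t and handles one lane at a time. The helper names half_bits_to_float and int_convert are hypothetical, made up for this illustration; they are not clang or LLVM APIs. It assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits) and a 32-bit IEEE float.

```c
#include <stdint.h>
#include <string.h>

/* What the conversion should do: decode the 16 bits as an IEEE half. */
static float half_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {                 /* infinity or NaN */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {              /* normal: rebias exponent 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {             /* signed zero */
        bits = sign;
    } else {                            /* subnormal: renormalize for float */
        uint32_t e = 113u;
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            --e;
        }
        bits = sign | (e << 23) | ((mant & 0x3FFu) << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* What uitofp does: treat the same 16 bits as an unsigned integer. */
static float int_convert(uint16_t h) {
    return (float)h;
}
```

For example, the bit pattern 0x3C00 encodes the half value 1.0, so half_bits_to_float(0x3C00) returns 1.0f, while int_convert(0x3C00) returns 15360.0f.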

Even when "half" is used instead of i16 (AArch64, or after we migrate away from the convert intrinsics), we generate IR without the promotion:

  %3 = fadd <4 x half> %1, %2

This relies on the backend to do the promotion. However, this has slightly different semantics, because LLVM works at the instruction level, and clang at the expression level. Consider:

void foo(v4f16 *a, v4f16 *b, v4f16 *c) {
  *c = (*a + *b) + *c;
}

Doing the promotion in clang means the intermediate result is a v4f32. Doing it in LLVM means the intermediate result is truncated back to v4f16, before being extended again to v4f32.

This can give different results, and it's probably best to mirror the scalar clang behavior of promoting entire expressions.
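
The difference between the two promotion strategies can be sketched numerically for a single lane. The code below is an illustration in portable C, not what clang or LLVM emit: round_to_half_precision is a hypothetical helper that rounds a float to half's 11 significant bits (round to nearest, ties to even), and it assumes finite inputs within half's normal exponent range, so overflow, subnormals, and NaN are not modeled. The two add3_* functions are likewise made-up names for the two strategies.

```c
#include <stdint.h>
#include <string.h>

/* Round a float to the precision of a half: keep 10 of the 23 explicit
   mantissa bits, rounding to nearest with ties to even.
   Assumption: x is finite and within half's normal range. */
static float round_to_half_precision(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t lsb = (bits >> 13) & 1u;   /* lowest bit that survives */
    bits += 0x0FFFu + lsb;              /* round half to even */
    bits &= ~0x1FFFu;                   /* drop the low 13 bits */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Expression-level promotion: the whole expression is evaluated in
   float, with a single truncation when the result is stored. */
static float add3_expression_level(float a, float b, float c) {
    return round_to_half_precision((a + b) + c);
}

/* Instruction-level promotion: each add is promoted on its own, so
   the intermediate sum is truncated before the second add. */
static float add3_instruction_level(float a, float b, float c) {
    float tmp = round_to_half_precision(a + b);
    return round_to_half_precision(tmp + c);
}
```

With a = 2048, b = 1, c = 1 (all exactly representable as half), add3_expression_level returns 2050, while add3_instruction_level returns 2048: the intermediate 2049 is not representable as a half and rounds back to 2048 each time.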


Support __fp16 vectors #23679
