Codegen issue when using `core:simd/x86`

Drvi commented 9 months ago

Context

odin report:

        Odin: dev-2023-12:8943e94c
        OS:   Pop!_OS 22.04 LTS, Linux 6.5.6-76060506-generic
        CPU:  AMD Ryzen 9 7900 12-Core Processor             
        RAM:  63421 MiB

I've built Odin with LLVM 17.0.6 using this branch https://github.com/odin-lang/Odin/pull/3024 at https://github.com/odin-lang/Odin/commit/8943e94c37bd38a3bdd4a4f97a4b9baf8a2a2105. Without it, importing "core:simd/x86" wouldn't work.

Then calling `odin build src/repro.odin -file`, where `repro.odin` is:

package repro

import "core:simd"
import simdx86 "core:simd/x86"

main :: proc() {
    input := simd.from_slice(#simd[16]i8, transmute([]i8){0..<16 = ','})
    commas := simd.from_slice(#simd[16]i8, transmute([]i8){0..<16 = ','})
    matches := transmute(simdx86.__m128i)simd.lanes_eq(input, commas)
    xxx := simdx86._mm_movemask_epi8(matches) // this line causes LLVM codegen failure
}

results in what seems to be a codegen bug.

Expected Behavior

The program should compile successfully, and `xxx` should have 16 set bits (0xFFFF).
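
For reference, here is a minimal scalar sketch in C of what the repro should compute once it compiles. It only models the semantics: a 16-lane byte equality compare produces 0xFF for equal lanes, and PMOVMSKB (the instruction behind `_mm_movemask_epi8`) packs the top bit of each byte into the low 16 bits of the result. All function names below are hypothetical helpers made up for illustration; they are not part of `core:simd`, `core:simd/x86`, or any real library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of SSE2 PMOVMSKB (the operation behind _mm_movemask_epi8):
 * bit i of the result is the most significant bit of byte i. */
static uint32_t movemask_epi8_model(const uint8_t bytes[16]) {
    assert(bytes != NULL);
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++) {
        mask |= (uint32_t)(bytes[i] >> 7) << i;
    }
    return mask;
}

/* Scalar model of a 16-lane byte equality compare:
 * equal lanes become 0xFF, unequal lanes become 0x00. */
static void lanes_eq_model(const uint8_t a[16], const uint8_t b[16],
                           uint8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        out[i] = (a[i] == b[i]) ? 0xFF : 0x00;
    }
}

/* Hypothetical helper mirroring the repro: 16 commas compared to 16 commas. */
static uint32_t repro_expected_mask(void) {
    uint8_t input[16], commas[16], matches[16];
    for (int i = 0; i < 16; i++) {
        input[i] = ',';
        commas[i] = ',';
    }
    lanes_eq_model(input, commas, matches);
    return movemask_epi8_model(matches);
}
```

Since every lane matches, `repro_expected_mask()` returns `0xFFFF`, i.e. the 16 set bits described above.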

Current Behavior

LLVM CODE GEN FAILED FOR PROCEDURE: simd_x86._mm_movemask_epi8
; Function Attrs: alwaysinline
define internal i32 @simd_x86._mm_movemask_epi8({ <1 x i64>, <1 x double> } %0) #5 {
decls:
  %1 = alloca <2 x i64>, align 16
  %2 = alloca { <8 x i8>, <1 x double> }, align 16
  br label %entry

entry:                                            ; preds = %decls
  store { <1 x i64>, <1 x double> } %0, ptr %1, align 8
  %3 = load <2 x i64>, ptr %1, align 16
  %4 = bitcast <2 x i64> %3 to <16 x i8>
  store <16 x i8> %4, ptr %2, align 16
  %5 = load { <8 x i8>, <1 x double> }, ptr %2, align 8
  %6 = call i32 @llvm.x86.sse2.pmovmskb.128({ <8 x i8>, <1 x double> } %5) #2
  ret i32 %6
}

Failure Logs

repro.ll

Yawning commented 8 months ago

Looking into this a bit more, I'm pretty sure this is a SysV ABI issue and that the emitted { <8 x i8>, <1 x double> } is classified as RegClass_SSEInt8, RegClass_SSEUp.

This is an "educated guess" based on two observations: all the LLVM intrinsics are pulled in via the "c" calling convention, and messing around with lbAbiAmd64SysV::classify_with changes the failure.

This package might for the most part "just work" on Windows with my branch, but the code generation will still be utterly atrocious due to SROA being disabled (so all the force_inline intrinsics will spend a lot of time fucking around with loads/stores).

This might need a calling convention special case since these aren't really C functions but compiler builtins.

Yawning commented 8 months ago

Per laytanl, on Windows the error is:

LLVM CODE GEN FAILED FOR PROCEDURE: simd_x86._mm_movemask_epi8
; Function Attrs: alwaysinline
define internal i32 @simd_x86._mm_movemask_epi8(ptr %0) #5 {
decls:
  %1 = alloca <16 x i8>, align 16
  br label %entry

entry:                                            ; preds = %decls
  %2 = load <2 x i64>, ptr %0, align 16
  %3 = bitcast <2 x i64> %2 to <16 x i8>
  store <16 x i8> %3, ptr %1, align 16
  %4 = call i32 @llvm.x86.sse2.pmovmskb.128(ptr %1) #2
  ret i32 %4
}

Intrinsic has incorrect argument type!
ptr @llvm.x86.sse2.pmovmskb.128

So it doesn't "just work" on Windows either, but the reason for the breakage is the same: "the platform native C calling convention is incorrect for the llvm.x86 intrinsics".
