odin-lang / Odin

Odin Programming Language
https://odin-lang.org
BSD 3-Clause "New" or "Revised" License

Macos Intel crashes on calling `linalg.mul` #3762

Open powerc9000 opened 2 weeks ago

powerc9000 commented 2 weeks ago

Context

Odin: dev-2024-06:02f11dfde
OS: macOS Sonoma 14.4.1 (build: 23F79, kernel: 23.4.0)
CPU: Intel(R) Core(TM) i5-8500B CPU @ 3.00GHz
RAM: 16384 MiB
Backend: LLVM 18.1.6

Calling linalg.mul with a Matrix2 crashes with an EXC_I386_GPFLT (general protection fault). The crash only occurs with -o:none or -debug; -o:speed does not share the issue.

From discussion on the Discord, this is believed to be caused by bad code generation that results in a bad stack pointer.

Expected Behavior

Don't crash.

Current Behavior

Crash

Failure Information (for bugs)

Disassembly

main`linalg.matrix_mul_vector-8549:
    0x100007260 <+0>:   movaps %xmm1, -0x58(%rsp)
    0x100007265 <+5>:   movaps %xmm0, -0x48(%rsp)
    0x10000726a <+10>:  movaps %xmm2, -0x38(%rsp)
    0x10000726f <+15>:  movaps -0x38(%rsp), %xmm0
    0x100007274 <+20>:  movaps -0x58(%rsp), %xmm1
    0x100007279 <+25>:  movaps -0x48(%rsp), %xmm2
    0x10000727e <+30>:  movlpd %xmm2, -0x10(%rsp)
    0x100007284 <+36>:  movlpd %xmm1, -0x8(%rsp)
    0x10000728a <+42>:  movlpd %xmm0, -0x18(%rsp)
    0x100007290 <+48>:  movq   $0x0, -0x20(%rsp)
->  0x100007299 <+57>:  movaps -0x10(%rsp), %xmm1
    0x10000729e <+62>:  movsd  -0x8(%rsp), %xmm0
    0x1000072a4 <+68>:  movss  -0x18(%rsp), %xmm3
    0x1000072aa <+74>:  movss  -0x14(%rsp), %xmm2
    0x1000072b0 <+80>:  movsldup %xmm3, %xmm3 ; xmm3 = xmm3[0,0,2,2] 
    0x1000072b4 <+84>:  movsldup %xmm2, %xmm2 ; xmm2 = xmm2[0,0,2,2] 
    0x1000072b8 <+88>:  mulps  %xmm3, %xmm1
    0x1000072bb <+91>:  mulps  %xmm2, %xmm0
    0x1000072be <+94>:  addps  %xmm1, %xmm0
    0x1000072c1 <+97>:  movq   $0x0, -0x28(%rsp)
    0x1000072ca <+106>: movlpd %xmm0, -0x28(%rsp)
    0x1000072d0 <+112>: movss  -0x28(%rsp), %xmm0
    0x1000072d6 <+118>: movss  -0x24(%rsp), %xmm1
    0x1000072dc <+124>: movss  %xmm1, -0x1c(%rsp)
    0x1000072e2 <+130>: movss  %xmm0, -0x20(%rsp)
    0x1000072e8 <+136>: movsd  -0x28(%rsp), %xmm0
    0x1000072ee <+142>: retq   

Registers

General Purpose Registers:
       rax = 0x41a0000041a00000
       rbx = 0x0000000100601b90
       rcx = 0x00000001000060a0  main`runtime.default_logger_proc at core.odin:653
       rdx = 0x00007ff80db1aaf0  libsystem_m.dylib`_FE_DFL_DISABLE_SSE_DENORMS_ENV + 7552
       rdi = 0x00007ff7bfefed70
       rsi = 0x00007ff7bfefecc8
       rbp = 0x00007ff7bfefece0
       rsp = 0x00007ff7bfefec78
        r8 = 0x0000000100007bdb  "/odin/base/runtime/entry_unix.odin"
        r9 = 0x0000000000000001
       r10 = 0x0000000000000000
       r11 = 0x0000000000000088
       r12 = 0x00007ff7bfefee20
       r13 = 0x0000000000000000
       r14 = 0x0000000100007080  main`main at entry_unix.odin:50
       r15 = 0x00007ff7bfefefa0
       rip = 0x0000000100007299  main`linalg.matrix_mul_vector-8549 + 57 at general.odin:217:2
    rflags = 0x0000000000010246
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000
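
As a cross-check of the disassembly against the register dump, the sketch below computes the effective address of each displacement(%rsp) operand and tests it for 16-byte alignment, which movaps requires of a memory operand (a misaligned one raises a general protection fault). The helper names operand_address and is_aligned_16 are hypothetical and only illustrate the arithmetic; they are not part of Odin or LLVM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Effective address of a displacement(%rsp) memory operand.
   The displacement is signed; unsigned wraparound gives the right result. */
static uint64_t operand_address(uint64_t rsp, int64_t displacement) {
    return rsp + (uint64_t)displacement;
}

/* True when the address sits on a 16-byte boundary,
   as movaps requires for its memory operand. */
static bool is_aligned_16(uint64_t address) {
    return (address & 0xF) == 0;
}
```

With rsp = 0x00007ff7bfefec78 from the dump, the operands at -0x58, -0x48 and -0x38 land on 0x...ec20, 0x...ec30 and 0x...ec40, all 16-byte aligned. The operand at -0x10 lands on 0x00007ff7bfefec68, which is 8 bytes past a 16-byte boundary, and that is the operand of the movaps at +57 where the crash is reported.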

Steps to Reproduce

Sample program

package main

import "core:math/linalg"

main :: proc() {
    v1 : linalg.Vector2f32 = {1, 2}
    rot := linalg.matrix2_rotate(f32(20))

    res := linalg.mul(rot, v1)
}

laytan commented 6 days ago

This looks like an alignment issue with the amd64 SysV ABI. In the following snippet, the parameter is allocated with align 4 and then loaded as if it were align 16:

define internal void @main.foos(<{ <2 x float>, <2 x float> }> %0, ptr noalias nocapture nonnull %__.context_ptr) {
decls:
  %1 = alloca [4 x float], align 4
  %2 = alloca [4 x float], align 32
  %b = alloca [4 x float], align 32
  br label %entry

entry:                                            ; preds = %decls
  store <{ <2 x float>, <2 x float> }> %0, ptr %1, align 1
  %3 = load <4 x float>, ptr %1, align 16
  %4 = load <4 x float>, ptr %1, align 16