Open elliottslaughter opened 2 years ago
Not sure how useful this is, but here's some quick tests (using 23559ea), macOS on M1. Exporting both SDKROOT
and INCLUDE_PATH
, using /usr/bin/clang++
and LLVM 11 through 14 from Homebrew, I get the following:
98% tests passed, 10 tests failed out of 549
Total Test time (real) = 16.96 sec
The following tests FAILED: 27 - atomicrmw.t (Subprocess aborted) 46 - bug372.t (SEGFAULT) 48 - bug372b.t (SEGFAULT) 49 - bug372c.t (SEGFAULT) 60 - cconv_array.t (Failed) 65 - class.t (SEGFAULT) 67 - class3.t (SEGFAULT) 272 - fakeasm.t (Failed) 423 - printfarray.t (SEGFAULT) 533 - vararg.t (Subprocess aborted)
- LLVM 12
98% tests passed, 10 tests failed out of 549
Total Test time (real) = 16.90 sec
The following tests FAILED: 27 - atomicrmw.t (SEGFAULT) 46 - bug372.t (SEGFAULT) 48 - bug372b.t (SEGFAULT) 49 - bug372c.t (SEGFAULT) 60 - cconv_array.t (Failed) 65 - class.t (SEGFAULT) 67 - class3.t (SEGFAULT) 272 - fakeasm.t (Failed) 423 - printfarray.t (SEGFAULT) 533 - vararg.t (Subprocess aborted)
- LLVM 13
98% tests passed, 11 tests failed out of 549
Total Test time (real) = 16.96 sec
The following tests FAILED: 27 - atomicrmw.t (Subprocess aborted) 46 - bug372.t (SEGFAULT) 48 - bug372b.t (SEGFAULT) 49 - bug372c.t (SEGFAULT) 50 - bug372d.t (SEGFAULT) 60 - cconv_array.t (Failed) 65 - class.t (SEGFAULT) 67 - class3.t (SEGFAULT) 272 - fakeasm.t (Failed) 423 - printfarray.t (SEGFAULT) 533 - vararg.t (Subprocess aborted)
- LLVM 14
98% tests passed, 11 tests failed out of 549
Total Test time (real) = 15.97 sec
The following tests FAILED: 27 - atomicrmw.t (SEGFAULT) 46 - bug372.t (SEGFAULT) 48 - bug372b.t (SEGFAULT) 49 - bug372c.t (SEGFAULT) 51 - bug4.t (SEGFAULT) 60 - cconv_array.t (Failed) 65 - class.t (SEGFAULT) 67 - class3.t (SEGFAULT) 272 - fakeasm.t (Failed) 423 - printfarray.t (SEGFAULT) 533 - vararg.t (SEGFAULT)
Interesting, thanks. I guess the issue identified in the title here is really a Linux problem, so that means the issue on Apple M1 is really somewhere else.
Thanks for taking the time to test this.
Here's some more (hopefully useful) information gleaned from running each failing test individually in lldb
. As before, using commit 23559ea, macOS on M1, and LLVM 13 from Homebrew.
atomicrmw.t
:
54321
atomicrmw.t:69: terra atomic_add_and_return(x : &int32,y : int32) : int32
atomicrmw.t:70: return atomicrmw("add", x, y, { syncscope = "", ordering = "acq_rel", align = native, isvolatile = false })
atomicrmw.t:69: end
definition {&int32,int32} -> int32
define dso_local i32 @"$atomic_add_and_return"(i32* %0, i32 %1) {
entry:
%2 = atomicrmw add i32* %0, i32 %1 acq_rel, align 4
ret i32 %2
}
assembly for function at address 0x10978c000
0x10978c000(+0): ldaxr w8, [x0]
0x10978c004(+4): add w9, w8, w1
0x10978c008(+8): stlxr w10, w9, [x0]
0x10978c00c(+12): cbnz w10, #-12
0x10978c010(+16): mov x0, x8
0x10978c014(+20): ret
22
atomicrmw.t:91: terra atomic_fadd(x : &double,y : double) : {}
atomicrmw.t:92: atomicrmw("fadd", x, y, { syncscope = "", ordering = "monotonic", align = native, isvolatile = false })
atomicrmw.t:91: end
definition {&double,double} -> {}
define dso_local void @"$atomic_fadd"(double* %0, double %1) {
entry:
%2 = atomicrmw fadd double* %0, double %1 monotonic, align 8
ret void
}
assembly for function at address 0x109794000
0x109794000(+0): ldr d1, [x0]
0x109794004(+4): b #16
0x109794008(+8): fmov d1, x10
0x10979400c(+12): cmp x10, x9
0x109794010(+16): b.eq #40
0x109794014(+20): fadd d2, d1, d0
0x109794018(+24): fmov x8, d2
0x10979401c(+28): fmov x9, d1
0x109794020(+32): ldaxr x10, [x0]
0x109794024(+36): cmp x10, x9
0x109794028(+40): b.ne #-32
0x10979402c(+44): stlxr wzr, x8, [x0]
0x109794030(+48): cbnz wzr, #-16
0x109794034(+52): b #-44
0x109794038(+56): ret
21
atomicrmw.t:69: terra atomic_add_and_return(x : &int32,y : int32) : int32
atomicrmw.t:70: return atomicrmw("add", x, y, { syncscope = "", ordering = "acq_rel", align = native, isvolatile = false })
atomicrmw.t:69: end
definition {&int32,int32} -> int32
define dso_local i32 @"$atomic_add_and_return"(i32* %0, i32 %1) {
entry:
%2 = atomicrmw add i32* %0, i32 %1 acq_rel, align 4
ret i32 %2
}
assembly for function at address 0x10978c000
0x10978c000(+0): ldaxr w8, [x0]
0x10978c004(+4): add w9, w8, w1
0x10978c008(+8): stlxr w10, w9, [x0]
0x10978c00c(+12): cbnz w10, #-12
0x10978c010(+16): mov x0, x8
0x10978c014(+20): ret
LLVM ERROR: Cannot select: intrinsic %llvm.aarch64.stlxr
Process 49454 stopped
thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT frame #0: 0x00000001a314ad98 libsystem_kernel.dylib`__pthread_kill + 8
bug372.t
:
Process 49620 stopped
thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x3131676e69727450) frame #0: 0x00000001a3194864 libsystem_platform.dylib`_platform_strlen + 4
bug372b.t
:
Process 49681 stopped
thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x3131676e69727450) frame #0: 0x00000001a3194864 libsystem_platform.dylib`_platform_strlen + 4
bug372c.t
:
Expect this ouput:
Process 49745 stopped
thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x697274530b008c00) frame #0: 0x00000001a3194864 libsystem_platform.dylib`_platform_strlen + 4
bug372d.t
:
Expect this ouput:
Process 61876 stopped
thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xfff9000000000000) frame #0: 0x00000001a3194864 libsystem_platform.dylib`_platform_strlen + 4
cconv_array.t
:
running test for int8[0]
running test for int8[1]
running test for int8[2]
running test for int8[3]
running test for int8[4]
running test for int8[5]
running test for int8[6]
running test for int8[7]
running test for int8[8]
running test for int8[9]
cconv_array.t:53: NYI: cannot call this C function (yet)
stack traceback:
[C]: in function 'caller'
cconv_array.t:53: in function 'run_test_case'
cconv_array.t:64: in main chunk
Process 49838 exited with status = 1 (0x00000001)
class.t
:
metatype init of Animal
allocating slot 1 to method speak in type Animal
stub speak in Animal
metatype init of Dog
importing speak at slot 1 in type Dog
override slot 1 with method speak in type Dog
stub speak in Dog
metatype init of Cat
importing speak at slot 1 in type Cat
override slot 1 with method speak in type Cat
stub speak in Cat
meow! 1876945344
woof! 1876945344
metatype init of P
allocating slot 1 to method add in type P
stub add in P
metatype init of C$1
importing add at slot 1 in type C$1
allocating slot 2 to method sub in type C$1
stub add in C$1
stub sub in C$1
defining interface Add for type P
defining interface Add for type C$1
defining interface Sub for type C$1
interface xA�
Process 49941 stopped
thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0) frame #0: 0x00000001a3194864 libsystem_platform.dylib`_platform_strlen + 4
class3.t
:
metatype init of Leaf
allocating slot 1 to method print in type Leaf
stub print in Leaf
metatype init of Node
importing print at slot 1 in type Node
override slot 1 with method print in type Node
stub print in Node
defining interface Prints for type Leaf
defining interface Prints for type Node
Process 50072 stopped
thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xd50) frame #0: 0x00000001a3194864 libsystem_platform.dylib`_platform_strlen + 4
fakeasm.t
:
fakeasm.t:32: assertion failed!
stack traceback:
[C]: in function 'assert'
fakeasm.t:32: in main chunk
Process 50169 exited with status = 1 (0x00000001)
Adding a print('"' ... s ... '"')
before the assert
shows the string s
contains 1 3 4 6 what 1876945344\n
when inside lldb
, with the final integer varying seemingly randomly, except for the beginning 18
which seems to remain constant, when run outside of lldb
. Bear in mind this is far from exhaistive testing, and just observations from a few runs.
printfarray.t
:
Process 53567 stopped
thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8000000060) frame #0: 0x00000001a3194864 libsystem_platform.dylib`_platform_strlen + 4
vararg.t
:
Assertion failed: (Ty && "Invalid GetElementPtrInst indices for type!"), function checkGEPType, file Instructions.h, line 921.
Process 53786 stopped
thread #1, queue = 'com.apple.main-thread', stop reason = hit program assert frame #4: 0x0000000100038d1c terra`llvm::checkGEPType(Ty=0x0000000000000000) at Instructions.h:921:3 918 // message on bad indexes for a gep instruction. 919 // 920 inline Type checkGEPType(Type Ty) { -> 921 assert(Ty && "Invalid GetElementPtrInst indices for type!"); 922 return Ty; 923 } 924 Target 0: (terra) stopped.
cconv_array.t
seems to be straight up a case of something being not yet implemented in LuaJIT. I don't think there's anything we can do for that one except turn the test off for when we're on M1 macOS.
Based on the others, it looks like a lot of them are getting caught in strlen
, which perhaps indicates an issue with memory layout. But beyond that it's not obvious to me what the issue is. Maybe LLVM is doing an optimization that breaks type punning, but that's a random guess.
This is a shot in the dark, but what happens when you change {fastmath=false}
to false
in:
That should completely shut off optimizations in the JIT.
For these tests, I think what it will take to make progress is to minimize each test: i.e., delete lines from the test until the the failure behavior changes. Once we've got the smallest version of each test that fails with the same behavior as the original, then we can (hopefully) see what specifically it is about that test that is causing trouble, and then form a hypothesis.
Replacing {fastmath=false}
to just false
seems to have no effect - looks like the same tests are failing. Backtraces for the strlen
issues (in the below example, specifically running bug372d.t
) traces to a point outside of any loaded sections:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xfff9000000000000)
* frame #0: 0x00000001a3194864 libsystem_platform.dylib`_platform_strlen + 4
frame #1: 0x00000001a30436a0 libsystem_c.dylib`__vfprintf + 4544
frame #2: 0x00000001a30531d8 libsystem_c.dylib`vfprintf_l + 68
frame #3: 0x00000001a3070f38 libsystem_c.dylib`printf + 80
frame #4: 0x00000001095e80f8
(lldb) image lookup -va 0x00000001095e80f8
(lldb)
Looks like the code at frame 4 might be generated from Terra's JIT. Setting a breakpoint on the address shows it's at least written to by llvm::RuntimeDyldImpl::emitSection
:
(lldb) w s e 0x00000001095e80f8
Watchpoint created: Watchpoint 1: addr = 0x1095e80f8 size = 8 state = enabled type = w
(lldb) c
Process 62850 resuming
...
Process 62850 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=258, subcode=0x1095e80f0)
* frame #0: 0x00000001a3197184 libsystem_platform.dylib`_platform_memmove + 180
frame #1: 0x0000000101491784 terra`llvm::RuntimeDyldImpl::emitSection(llvm::object::ObjectFile const&, llvm::object::SectionRef const&, bool) + 756
frame #2: 0x0000000101490a08 terra`llvm::RuntimeDyldImpl::findOrEmitSection(llvm::object::ObjectFile const&, llvm::object::SectionRef const&, bool, std::__1::map<llvm::object::SectionRef, unsigned int, std::__1::less<llvm::object::SectionRef>, std::__1::allocator<std::__1::pair<llvm::object::SectionRef const, unsigned int> > >&) + 120
frame #3: 0x000000010148f704 terra`llvm::RuntimeDyldImpl::loadObjectImpl(llvm::object::ObjectFile const&) + 1952
frame #4: 0x00000001014a4570 terra`llvm::RuntimeDyldMachO::loadObject(llvm::object::ObjectFile const&) + 52
frame #5: 0x0000000101493824 terra`llvm::RuntimeDyld::loadObject(llvm::object::ObjectFile const&) + 800
frame #6: 0x000000010141e43c terra`llvm::MCJIT::generateCodeForModule(llvm::Module*) + 208
frame #7: 0x000000010141f128 terra`llvm::MCJIT::findSymbol(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) + 544
frame #8: 0x000000010141ed98 terra`llvm::MCJIT::getSymbolAddress(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) + 124
frame #9: 0x000000010141f3ac terra`llvm::MCJIT::getGlobalValueAddress(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 52
frame #10: 0x0000000100050c14 terra`JITGlobalValue(CU=0x0000600002c04100, gv=0x0000600002c10d88) at tcompiler.cpp:3812:24
frame #11: 0x0000000100009574 terra`terra_jit(L=0x000000010a800380) at tcompiler.cpp:3820:17
frame #12: 0x0000000103070388 terra`lj_BC_FUNCC + 44
frame #13: 0x0000000103017ddc terra`lua_pcall(L=<unavailable>, nargs=<unavailable>, nresults=<unavailable>, errfunc=<unavailable>) at lj_api.c:1145:12 [opt]
frame #14: 0x00000001000052dc terra`docall(L=0x000000010a800380, narg=0, clear=0) at main.cpp:339:14
frame #15: 0x0000000100004e84 terra`main(argc=2, argv=0x000000016fdfee90) at main.cpp:119:13
frame #16: 0x000000010936108c dyld`start + 520
(lldb) w delete 1
1 watchpoints deleted.
(lldb) c
Process 62850 resuming
...
Expect this ouput:
Process 62850 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xfff9000000000000)
* frame #0: 0x00000001a3194864 libsystem_platform.dylib`_platform_strlen + 4
frame #1: 0x00000001a30436a0 libsystem_c.dylib`__vfprintf + 4544
frame #2: 0x00000001a30531d8 libsystem_c.dylib`vfprintf_l + 68
frame #3: 0x00000001a3070f38 libsystem_c.dylib`printf + 80
frame #4: 0x00000001095e80f8
I think bug372.t
would be the best version of this test to debug, assuming it still fails, since it's the shortest.
Unfortunately, debug info does not seem to be functioning on ARM. (You can try with terra -g bug372.t
, but don't get your hopes up.)
So I think the next best thing we can do is "hardhat" debugging, which in this case probably is going to involve printf
on all of the pointers in the program.
Not sure if relevant, but e.g. in bug372d.t
, dumping the following addresses:
C.printf("Config: 0x%0.16lx\n", [long](&(config)))
C.printf("xBCLeftInflowProfile: 0x%0.16lx\n", [long](&(config.BC.xBCLeftInflowProfile.u.File.FileDir)))
C.printf("xBCLeftHeat: 0x%0.16lx\n", [long](&(config.BC.xBCLeftHeat.u.File.FileDir)))
Gives the following output:
Config: 0xfff9000000000000
xBCLeftInflowProfile: 0xfff9000000000000
xBCLeftHeat: 0xfff9000000000000
I'm far from even remotely familiar with Terra and its syntax, but is this expected? Shouldn't the FileDir
s be offset into the Config
struct?
Is a long
32 bits on M1? It may simply be that you're throwing away the relevant bits of the address.
Better to use %p
and pass the pointer:
C.printf("Config: 0x%0.16p\n", &(config))
C.printf("xBCLeftInflowProfile: 0x%0.16p\n", &(config.BC.xBCLeftInflowProfile.u.File.FileDir))
C.printf("xBCLeftHeat: 0x%0.16p\n", &(config.BC.xBCLeftHeat.u.File.FileDir))
Using %p
still outputs just 0x0xfff9000000000000
. Something doesn't seem right - clang
reports long
to be 8 bytes.
Ok, well there is definitely something wrong because those pointers should be different addresses. E.g. here's output on x86_64:
Config: 0x0x00007ff7b331cd08
xBCLeftInflowProfile: 0x0x00007ff7b331cf40
xBCLeftHeat: 0x0x00007ff7b331cd0c
Maybe try this test? This should show us the LLVM IR before and after optimizations, plus the assembly, so maybe we'll spot something in here.
local C = terralib.includecstring
[[
#include "stdbool.h"
#include "stdint.h"
#include "stdlib.h"
#include "stdio.h"
#include "string.h"
typedef int TempProfile;
#define TempProfile_File 0
typedef int InflowProfile;
#define InflowProfile_File 0
#define InflowProfile_SuctionAndBlowing 1
struct Config {
struct {
struct {
int32_t type;
union {
struct {
int8_t FileDir[256];
} File;
} u;
} xBCLeftHeat;
struct {
int32_t type;
union {
struct {
double addedVelocity;
int8_t FileDir[256];
} File;
struct {
double sigma;
struct {
uint32_t length;
double values[10];
} beta;
double Zw;
struct {
uint32_t length;
double values[10];
} A;
struct {
uint32_t length;
double values[10];
} omega;
} SuctionAndBlowing;
} u;
} yBCLeftInflowProfile;
struct {
int32_t type;
union {
struct {
double addedVelocity;
int8_t FileDir[256];
} File;
struct {
double sigma;
struct {
uint32_t length;
double values[10];
} beta;
double Zw;
struct {
uint32_t length;
double values[10];
} A;
struct {
uint32_t length;
double values[10];
} omega;
} SuctionAndBlowing;
} u;
} xBCLeftInflowProfile;
} BC;
};
]]
local Config = C.Config
terra main()
var config : Config
C.printf("Config: 0x%0.16p\n", &(config))
C.printf("xBCLeftInflowProfile: 0x%0.16p\n", &(config.BC.xBCLeftInflowProfile.u.File.FileDir))
C.printf("xBCLeftHeat: 0x%0.16p\n", &(config.BC.xBCLeftHeat.u.File.FileDir))
end
print(terralib.saveobj(nil, "llvmir", {main=main}, nil, nil, false))
main:disas()
Output from the above:
define dso_local void @"$main"() {
entry:
%config = alloca %struct.Config, align 8
%0 = call i32 (i8*, %struct.Config*, ...) bitcast (i32 (i8*, ...)* @printf to i32 (i8*, %struct.Config*, ...)*)(i8* getelementptr inbounds ([32 x i8], [32 x i8]* @"$string", i64 0, i64 0), %struct.Config* nonnull %config)
%1 = getelementptr inbounds %struct.Config, %struct.Config* %config, i64 0, i32 0, i32 2, i32 1, i32 0, i32 1
%2 = bitcast %struct.anon.8* %1 to [256 x i8]*
%3 = call i32 (i8*, [256 x i8]*, ...) bitcast (i32 (i8*, ...)* @printf to i32 (i8*, [256 x i8]*, ...)*)(i8* getelementptr inbounds ([32 x i8], [32 x i8]* @"$string.1", i64 0, i64 0), [256 x i8]* nonnull %2)
%4 = getelementptr inbounds %struct.Config, %struct.Config* %config, i64 0, i32 0, i32 0, i32 1, i32 0, i32 0
%5 = call i32 (i8*, [256 x i8]*, ...) bitcast (i32 (i8*, ...)* @printf to i32 (i8*, [256 x i8]*, ...)*)(i8* getelementptr inbounds ([32 x i8], [32 x i8]* @"$string.2", i64 0, i64 0), [256 x i8]* nonnull %4)
ret void
}
; ModuleID = 'terra'
source_filename = "terra"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-darwin21.6.0"
%struct.Config = type { %struct.anon.1 }
%struct.anon.1 = type { %struct.anon.2, %struct.anon.4, %struct.anon.4 }
%struct.anon.2 = type { i32, %union.anon }
%union.anon = type { %struct.anon.3 }
%struct.anon.3 = type { [256 x i8] }
%struct.anon.4 = type { i32, %union.anon.5 }
%union.anon.5 = type { %struct.anon.7 }
%struct.anon.7 = type { double, %struct.anon.8, double, %struct.anon.8, %struct.anon.8 }
%struct.anon.8 = type { i32, [10 x double] }
%struct.anon.6 = type { double, [256 x i8] }
@"$string" = private unnamed_addr constant [32 x i8] c"Config: 0x%0.16p\0A\00", align 1
@"$string.1" = private unnamed_addr constant [32 x i8] c"xBCLeftInflowProfile: 0x%0.16p\0A\00", align 1
@"$string.2" = private unnamed_addr constant [32 x i8] c"xBCLeftHeat: 0x%0.16p\0A\00", align 1
define dso_local void @main() {
entry:
%config = alloca %struct.Config, align 8
%0 = call i32 (i8*, %struct.Config*, ...) bitcast (i32 (i8*, ...)* @printf to i32 (i8*, %struct.Config*, ...)*)(i8* getelementptr inbounds ([32 x i8], [32 x i8]* @"$string", i32 0, i32 0), %struct.Config* %config)
%1 = getelementptr %struct.Config, %struct.Config* %config, i32 0, i32 0
%2 = getelementptr %struct.anon.1, %struct.anon.1* %1, i32 0, i32 2
%3 = getelementptr %struct.anon.4, %struct.anon.4* %2, i32 0, i32 1
%4 = getelementptr %union.anon.5, %union.anon.5* %3, i32 0, i32 0
%5 = bitcast %struct.anon.7* %4 to %struct.anon.6*
%6 = getelementptr %struct.anon.6, %struct.anon.6* %5, i32 0, i32 1
%7 = call i32 (i8*, [256 x i8]*, ...) bitcast (i32 (i8*, ...)* @printf to i32 (i8*, [256 x i8]*, ...)*)(i8* getelementptr inbounds ([32 x i8], [32 x i8]* @"$string.1", i32 0, i32 0), [256 x i8]* %6)
%8 = getelementptr %struct.Config, %struct.Config* %config, i32 0, i32 0
%9 = getelementptr %struct.anon.1, %struct.anon.1* %8, i32 0, i32 0
%10 = getelementptr %struct.anon.2, %struct.anon.2* %9, i32 0, i32 1
%11 = getelementptr %union.anon, %union.anon* %10, i32 0, i32 0
%12 = getelementptr %struct.anon.3, %struct.anon.3* %11, i32 0, i32 0
%13 = call i32 (i8*, [256 x i8]*, ...) bitcast (i32 (i8*, ...)* @printf to i32 (i8*, [256 x i8]*, ...)*)(i8* getelementptr inbounds ([32 x i8], [32 x i8]* @"$string.2", i32 0, i32 0), [256 x i8]* %12)
ret void
}
declare dso_local i32 @printf(i8*, ...)
definition {} -> {}
assembly for function at address 0x10d674000
0x10d674000(+0): stp x28, x27, [sp, #-48]!
0x10d674004(+4): stp x20, x19, [sp, #16]
0x10d674008(+8): stp x29, x30, [sp, #32]
0x10d67400c(+12): sub sp, sp, #848
0x10d674010(+16): adrp x0, #0
0x10d674014(+20): ldr x0, [x0, #120]
0x10d674018(+24): adrp x19, #0
0x10d67401c(+28): ldr x19, [x19, #112]
0x10d674020(+32): add x20, sp, #8
0x10d674024(+36): add x1, sp, #8
0x10d674028(+40): blr x19
0x10d67402c(+44): add x1, x20, #568
0x10d674030(+48): adrp x0, #0
0x10d674034(+52): ldr x0, [x0, #104]
0x10d674038(+56): blr x19
0x10d67403c(+60): orr x1, x20, #0x4
0x10d674040(+64): adrp x0, #0
0x10d674044(+68): ldr x0, [x0, #96]
0x10d674048(+72): blr x19
0x10d67404c(+76): add sp, sp, #848
0x10d674050(+80): ldp x29, x30, [sp, #32]
0x10d674054(+84): ldp x20, x19, [sp, #16]
0x10d674058(+88): ldp x28, x27, [sp], #48
0x10d67405c(+92): ret
So running the test from https://github.com/terralang/terra/issues/597#issuecomment-1221050915 gives:
Config: 0x0x0000000000000000
xBCLeftInflowProfile: 0x0x0000000000000000
xBCLeftHeat: 0x0x0000000000000000
Which is strange. Even stranger, re-adding the two memcpy
calls to copy the strings before printing addresses gives:
Config: 0x0x0000000000007ffc
xBCLeftInflowProfile: 0x0x0000000000007ffc
xBCLeftHeat: 0x0x0000000000007ffc
Removing the call to terralib.saveobj
and disas
seems to flip the above behaviour with 0x0000000000000000
/0x0000000000007ffc
. Not a clue what's going on.
Could this somehow be an issue with vararg behaviour on aarch64 macOS being different to aarch64 Linux/general ARMv8 or aarch64 ABI? https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms
Hm, maybe. I'd expect LLVM to handle most differences in the C ABI, at least when calling functions with scalar arguments (including, I believe, variadic functions).
But let's test that! We can run Clang to see what code it generates:
#include "stdbool.h"
#include "stdint.h"
#include "stdlib.h"
#include "stdio.h"
#include "string.h"
typedef int TempProfile;
#define TempProfile_File 0
typedef int InflowProfile;
#define InflowProfile_File 0
#define InflowProfile_SuctionAndBlowing 1
struct Config {
struct {
struct {
int32_t type;
union {
struct {
int8_t FileDir[256];
} File;
} u;
} xBCLeftHeat;
struct {
int32_t type;
union {
struct {
double addedVelocity;
int8_t FileDir[256];
} File;
struct {
double sigma;
struct {
uint32_t length;
double values[10];
} beta;
double Zw;
struct {
uint32_t length;
double values[10];
} A;
struct {
uint32_t length;
double values[10];
} omega;
} SuctionAndBlowing;
} u;
} yBCLeftInflowProfile;
struct {
int32_t type;
union {
struct {
double addedVelocity;
int8_t FileDir[256];
} File;
struct {
double sigma;
struct {
uint32_t length;
double values[10];
} beta;
double Zw;
struct {
uint32_t length;
double values[10];
} A;
struct {
uint32_t length;
double values[10];
} omega;
} SuctionAndBlowing;
} u;
} xBCLeftInflowProfile;
} BC;
};
int main() {
struct Config config;
printf("Config: %p\n", &(config));
printf("xBCLeftInflowProfile: %p\n", &(config.BC.xBCLeftInflowProfile.u.File.FileDir));
printf("xBCLeftHeat: %p\n", &(config.BC.xBCLeftHeat.u.File.FileDir));
return 0;
}
Then run:
clang bug372d.c -o bug372.ll -S -emit-llvm
Should give you the unoptimized LLVM IR, which we can compare to what Terra produces.
You can also run the program to sanity check it's behavior:
(On x86:)
$ clang bug372d.c -o bug372
$ ./bug372
Config: 0x7ff7babbb500
xBCLeftInflowProfile: 0x7ff7babbb738
xBCLeftHeat: 0x7ff7babbb504
For best results, it's important to use the same Clang that you used to build Terra.
Note in the C program I also used %p
instead of %0.16p
. It turns out the latter is undefined behavior for pointers. I guess we should try that out in Terra too in case it makes any difference.
We could potentially add back more of bug372d.t
, and write the corresponding pieces of bug372d.c
. By comparing the generated code, we should eventually be able to spot the problem. Not all differences are an issue (Terra intentionally does some of its code generation differently), but the problem will (likely) be contained in the differences in the generated code.
Another possible explanation is that we're doing something bad with the JIT, but if so I don't see the mechanism for how that would be going wrong so far.
bug372d.c
:
; ModuleID = 'bug372d.c'
source_filename = "bug372d.c"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-macosx12.0.0"
%struct.Config = type { %struct.anon } %struct.anon = type { %struct.anon.0, %struct.anon.2, %struct.anon.9 } %struct.anon.0 = type { i32, %union.anon } %union.anon = type { %struct.anon.1 } %struct.anon.1 = type { [256 x i8] } %struct.anon.2 = type { i32, %union.anon.3 } %union.anon.3 = type { %struct.anon.5 } %struct.anon.5 = type { double, %struct.anon.6, double, %struct.anon.7, %struct.anon.8 } %struct.anon.6 = type { i32, [10 x double] } %struct.anon.7 = type { i32, [10 x double] } %struct.anon.8 = type { i32, [10 x double] } %struct.anon.9 = type { i32, %union.anon.10 } %union.anon.10 = type { %struct.anon.12 } %struct.anon.12 = type { double, %struct.anon.13, double, %struct.anon.14, %struct.anon.15 } %struct.anon.13 = type { i32, [10 x double] } %struct.anon.14 = type { i32, [10 x double] } %struct.anon.15 = type { i32, [10 x double] } %struct.anon.11 = type { double, [256 x i8] }
@.str = private unnamed_addr constant [26 x i8] c"Config: %p\0A\00", align 1 @.str.1 = private unnamed_addr constant [26 x i8] c"xBCLeftInflowProfile: %p\0A\00", align 1 @.str.2 = private unnamed_addr constant [26 x i8] c"xBCLeftHeat: %p\0A\00", align 1
; Function Attrs: noinline nounwind optnone ssp uwtable define i32 @main() #0 { %1 = alloca i32, align 4 %2 = alloca %struct.Config, align 8 store i32 0, i32 %1, align 4 %3 = call i32 (i8, ...) @printf(i8 getelementptr inbounds ([26 x i8], [26 x i8] @.str, i64 0, i64 0), %struct.Config %2) %4 = getelementptr inbounds %struct.Config, %struct.Config %2, i32 0, i32 0 %5 = getelementptr inbounds %struct.anon, %struct.anon %4, i32 0, i32 2 %6 = getelementptr inbounds %struct.anon.9, %struct.anon.9 %5, i32 0, i32 1 %7 = bitcast %union.anon.10 %6 to %struct.anon.11 %8 = getelementptr inbounds %struct.anon.11, %struct.anon.11 %7, i32 0, i32 1 %9 = call i32 (i8, ...) @printf(i8 getelementptr inbounds ([26 x i8], [26 x i8] @.str.1, i64 0, i64 0), [256 x i8] %8) %10 = getelementptr inbounds %struct.Config, %struct.Config %2, i32 0, i32 0 %11 = getelementptr inbounds %struct.anon, %struct.anon %10, i32 0, i32 0 %12 = getelementptr inbounds %struct.anon.0, %struct.anon.0 %11, i32 0, i32 1 %13 = bitcast %union.anon %12 to %struct.anon.1 %14 = getelementptr inbounds %struct.anon.1, %struct.anon.1 %13, i32 0, i32 0 %15 = call i32 (i8, ...) @printf(i8 getelementptr inbounds ([26 x i8], [26 x i8] @.str.2, i64 0, i64 0), [256 x i8]* %14) ret i32 0 }
declare i32 @printf(i8*, ...) #1
attributes #0 = { noinline nounwind optnone ssp uwtable "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "probe-stack"="__chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+crc,+crypto,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+sha3,+sm4,+v8.5a,+zcm,+zcz" } attributes #1 = { "frame-pointer"="non-leaf" "no-trapping-math"="true" "probe-stack"="__chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+crc,+crypto,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+sha3,+sm4,+v8.5a,+zcm,+zcz" }
!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7, !8} !llvm.ident = !{!9}
!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 12, i32 3]} !1 = !{i32 1, !"wchar_size", i32 4} !2 = !{i32 1, !"branch-target-enforcement", i32 0} !3 = !{i32 1, !"sign-return-address", i32 0} !4 = !{i32 1, !"sign-return-address-all", i32 0} !5 = !{i32 1, !"sign-return-address-with-bkey", i32 0} !6 = !{i32 7, !"PIC Level", i32 2} !7 = !{i32 7, !"uwtable", i32 1} !8 = !{i32 7, !"frame-pointer", i32 1} !9 = !{!"Apple clang version 13.1.6 (clang-1316.0.21.2.5)"}
- Sanity check of `bug372d.c`:
```sh
$ clang bug372d.c -o bug372 && ./bug372
Config: 0x16daf2bc0
xBCLeftInflowProfile: 0x16daf2df8
xBCLeftHeat: 0x16daf2bc4
%p
in bug372d.t
:
Expect this ouput:
Config: 0xfff9000000000000
xBCLeftInflowProfile: 0xfff9000000000000
xBCLeftHeat: 0xfff9000000000000
zsh: segmentation fault ../build/bin/terra bug372d.t
So, been doing a bit of digging, and this does seem to be a calling convention issue.
Here's the disassembled _main
of the C version, bug372d.c
:
0000000100003ec4 <_main>:
100003ec4: f4 4f be a9 stp x20, x19, [sp, #-32]!
100003ec8: fd 7b 01 a9 stp x29, x30, [sp, #16]
100003ecc: fd 43 00 91 add x29, sp, #16
100003ed0: ff 83 0d d1 sub sp, sp, #864
100003ed4: 1f 20 03 d5 nop
100003ed8: 88 09 00 58 ldr x8, 0x100004008 <_printf+0x100004008>
100003edc: 08 01 40 f9 ldr x8, [x8]
100003ee0: a8 83 1e f8 stur x8, [x29, #-24]
100003ee4: f3 43 00 91 add x19, sp, #16
100003ee8: f3 03 00 f9 str x19, [sp]
100003eec: e0 03 00 10 adr x0, #124
100003ef0: 1f 20 03 d5 nop
100003ef4: 1a 00 00 94 bl 0x100003f5c <_printf+0x100003f5c>
100003ef8: 68 e2 08 91 add x8, x19, #568
100003efc: e8 03 00 f9 str x8, [sp]
100003f00: 00 04 00 50 adr x0, #130
100003f04: 1f 20 03 d5 nop
100003f08: 15 00 00 94 bl 0x100003f5c <_printf+0x100003f5c>
100003f0c: 68 02 7e b2 orr x8, x19, #0x4
100003f10: e8 03 00 f9 str x8, [sp]
100003f14: 40 04 00 10 adr x0, #136
100003f18: 1f 20 03 d5 nop
100003f1c: 10 00 00 94 bl 0x100003f5c <_printf+0x100003f5c>
100003f20: a8 83 5e f8 ldur x8, [x29, #-24]
100003f24: 1f 20 03 d5 nop
100003f28: 09 07 00 58 ldr x9, 0x100004008 <_printf+0x100004008>
100003f2c: 29 01 40 f9 ldr x9, [x9]
100003f30: 3f 01 08 eb cmp x9, x8
100003f34: c1 00 00 54 b.ne 0x100003f4c <_main+0x88>
100003f38: 00 00 80 52 mov w0, #0
100003f3c: ff 83 0d 91 add sp, sp, #864
100003f40: fd 7b 41 a9 ldp x29, x30, [sp, #16]
100003f44: f4 4f c2 a8 ldp x20, x19, [sp], #32
100003f48: c0 03 5f d6 ret
100003f4c: 01 00 00 94 bl 0x100003f50 <_printf+0x100003f50>
Note how it stores the first argument to printf
in x0
, and the second argument (the vararg one) on the stack (str x8, [sp]
). I was under the impression x0-x7
were used for variable arguments where possible, and overflow would be placed onto the stack, but I could be wrong.
Anyway, disassembly of the Terra-generated _main
(not _$main
), built from the LLVM IR output from https://github.com/terralang/terra/issues/597#issuecomment-1221050915, is as follows:
0000000100003e4c <_main>:
100003e4c: f4 4f be a9 stp x20, x19, [sp, #-32]!
100003e50: fd 7b 01 a9 stp x29, x30, [sp, #16]
100003e54: ff 43 0d d1 sub sp, sp, #848
100003e58: 00 05 00 10 adr x0, #160
100003e5c: 1f 20 03 d5 nop
100003e60: e1 23 00 91 add x1, sp, #8
100003e64: f3 23 00 91 add x19, sp, #8
100003e68: 21 00 00 94 bl 0x100003eec <_printf+0x100003eec>
100003e6c: 20 05 00 50 adr x0, #166
100003e70: 61 e2 08 91 add x1, x19, #568
100003e74: 1f 20 03 d5 nop
100003e78: 1d 00 00 94 bl 0x100003eec <_printf+0x100003eec>
100003e7c: 80 05 00 10 adr x0, #176
100003e80: 61 02 7e b2 orr x1, x19, #0x4
100003e84: 1f 20 03 d5 nop
100003e88: 19 00 00 94 bl 0x100003eec <_printf+0x100003eec>
100003e8c: ff 43 0d 91 add sp, sp, #848
100003e90: fd 7b 41 a9 ldp x29, x30, [sp, #16]
100003e94: f4 4f c2 a8 ldp x20, x19, [sp], #32
100003e98: c0 03 5f d6 ret
Note how it's using both x0
and x1
for passing arguments to printf
. It's even calculating the correct addresses: the 2nd and 3rd addresses should be 568 bytes and 4 bytes greater than &config
, respectively. However, there's no stack.
Running the first _main
gives the following output:
Config: 0x16f0de690
xBCLeftInflowProfile: 0x16f0de8c8
xBCLeftHeat: 0x16f0de694
And the Terra-generated version gives:
Config: 0x100b58310
xBCLeftInflowProfile: 0x100b58310
xBCLeftHeat: 0x100b58310
This looked strange initially as multiple runs of the C version always seemed to have the stack somewhere in the 0x16xxxxxxx
range. So, running the Terra-generated _main
through lldb
shows that x1
is having the correct address calculated:
...
(lldb)
Process 29276 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x0000000100003e68 bug372_terra`main + 28
bug372_terra`main:
-> 0x100003e68 <+28>: bl 0x100003eec ; symbol stub for: printf
0x100003e6c <+32>: adr x0, #0xa6 ; "xBCLeftInflowProfile: %p\n"
0x100003e70 <+36>: add x1, x19, #0x238
0x100003e74 <+40>: nop
Target 0: (bug372_terra) stopped.
(lldb) reg read x0 x1
x0 = 0x0000000100003ef8 "Config: %p\n"
x1 = 0x000000016fdfe648
...
(lldb)
Process 29276 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x0000000100003e78 bug372_terra`main + 44
bug372_terra`main:
-> 0x100003e78 <+44>: bl 0x100003eec ; symbol stub for: printf
0x100003e7c <+48>: adr x0, #0xb0 ; "xBCLeftHeat: %p\n"
0x100003e80 <+52>: orr x1, x19, #0x4
0x100003e84 <+56>: nop
Target 0: (bug372_terra) stopped.
(lldb) reg r x0 x1
x0 = 0x0000000100003f12 "xBCLeftInflowProfile: %p\n"
x1 = 0x000000016fdfe880
...
(lldb)
Process 29276 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x0000000100003e88 bug372_terra`main + 60
bug372_terra`main:
-> 0x100003e88 <+60>: bl 0x100003eec ; symbol stub for: printf
0x100003e8c <+64>: add sp, sp, #0x350
0x100003e90 <+68>: ldp x29, x30, [sp, #0x10]
0x100003e94 <+72>: ldp x20, x19, [sp], #0x20
Target 0: (bug372_terra) stopped.
(lldb) reg r x0 x1
x0 = 0x0000000100003f2c "xBCLeftHeat: %p\n"
x1 = 0x000000016fdfe64c
(lldb) x/2x $sp
0x16fdfe640: 0x00084310 0x00000001
So, Terra-generated IR is, for some reason, placing the printf
vararg into x1
, while the C-generated IR is placing it on the stack, which is where printf
is reading it from - note the last command run above, showing 0x100084310
on the stack.
I can see in the LLVM IR that Terra for some reason is defining a different set of arguments for printf
:
%7 = call i32 (i8*, [256 x i8]*, ...) bitcast (i32 (i8*, ...)* @printf to i32 (i8*, [256 x i8]*, ...)*)(i8* getelementptr inbounds ([26 x i8], [26 x i8]* @"$string.1", i32 0, i32 0), [256 x i8]* %6)
%9 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([26 x i8], [26 x i8]* @.str.1, i64 0, i64 0), [256 x i8]* %8)
I'm not very familiar with AAPCS64, however the docs at https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#id82 seem to imply that variadic arguments can first be passed in registers, as Terra is doing. However, Apple's documentation at https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms seems to differ in that it assigns variadic arguments to 8-byte stack slots instead of registers. This could explain the issues being seen here.
Looks like this issue is with Terra's generation of the printf
calls. Replacing the calls to i32 (i8*, %struct.Config*, ...)
/i32 (i8*, [256 x i8]*, ...)
with calls to i32 (i8*, ...)
allows LLVM to properly pass the varargs:
; ModuleID = 'terra'
source_filename = "terra"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-darwin21.6.0"
%struct.Config = type { %struct.anon.1 }
%struct.anon.1 = type { %struct.anon.2, %struct.anon.4, %struct.anon.4 }
%struct.anon.2 = type { i32, %union.anon }
%union.anon = type { %struct.anon.3 }
%struct.anon.3 = type { [256 x i8] }
%struct.anon.4 = type { i32, %union.anon.5 }
%union.anon.5 = type { %struct.anon.7 }
%struct.anon.7 = type { double, %struct.anon.8, double, %struct.anon.8, %struct.anon.8 }
%struct.anon.8 = type { i32, [10 x double] }
%struct.anon.6 = type { double, [256 x i8] }
@"$string" = private unnamed_addr constant [26 x i8] c"Config: %p\0A\00", align 1
@"$string.1" = private unnamed_addr constant [26 x i8] c"xBCLeftInflowProfile: %p\0A\00", align 1
@"$string.2" = private unnamed_addr constant [26 x i8] c"xBCLeftHeat: %p\0A\00", align 1
define dso_local void @main() {
entry:
%config = alloca %struct.Config, align 8
;%0 = call i32 (i8*, %struct.Config*, ...) bitcast (i32 (i8*, ...)* @printf to i32 (i8*, %struct.Config*, ...)*)(i8* getelementptr inbounds ([26 x i8], [26 x i8]* @"$string", i32 0, i32 0), %struct.Config* %config)
%0 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([26 x i8], [26 x i8]* @"$string", i32 0, i32 0), %struct.Config* %config)
%1 = getelementptr %struct.Config, %struct.Config* %config, i32 0, i32 0
%2 = getelementptr %struct.anon.1, %struct.anon.1* %1, i32 0, i32 2
%3 = getelementptr %struct.anon.4, %struct.anon.4* %2, i32 0, i32 1
%4 = getelementptr %union.anon.5, %union.anon.5* %3, i32 0, i32 0
%5 = bitcast %struct.anon.7* %4 to %struct.anon.6*
%6 = getelementptr %struct.anon.6, %struct.anon.6* %5, i32 0, i32 1
;%7 = call i32 (i8*, [256 x i8]*, ...) bitcast (i32 (i8*, ...)* @printf to i32 (i8*, [256 x i8]*, ...)*)(i8* getelementptr inbounds ([26 x i8], [26 x i8]* @"$string.1", i32 0, i32 0), [256 x i8]* %6)
%7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([26 x i8], [26 x i8]* @"$string.1", i32 0, i32 0), [256 x i8]* %6)
%8 = getelementptr %struct.Config, %struct.Config* %config, i32 0, i32 0
%9 = getelementptr %struct.anon.1, %struct.anon.1* %8, i32 0, i32 0
%10 = getelementptr %struct.anon.2, %struct.anon.2* %9, i32 0, i32 1
%11 = getelementptr %union.anon, %union.anon* %10, i32 0, i32 0
%12 = getelementptr %struct.anon.3, %struct.anon.3* %11, i32 0, i32 0
;%13 = call i32 (i8*, [256 x i8]*, ...) bitcast (i32 (i8*, ...)* @printf to i32 (i8*, [256 x i8]*, ...)*)(i8* getelementptr inbounds ([26 x i8], [26 x i8]* @"$string.2", i32 0, i32 0), [256 x i8]* %12)
%13 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([26 x i8], [26 x i8]* @"$string.2", i32 0, i32 0), [256 x i8]* %12)
ret void
}
declare dso_local i32 @printf(i8*, ...)
; definition {} -> {}
define dso_local void @"$main"() {
entry:
%config = alloca %struct.Config, align 8
%0 = call i32 (i8*, %struct.Config*, ...) bitcast (i32 (i8*, ...)* @printf to i32 (i8*, %struct.Config*, ...)*)(i8* getelementptr inbounds ([26 x i8], [26 x i8]* @"$string", i64 0, i64 0), %struct.Config* nonnull %config)
%1 = getelementptr inbounds %struct.Config, %struct.Config* %config, i64 0, i32 0, i32 2, i32 1, i32 0, i32 1
%2 = bitcast %struct.anon.8* %1 to [256 x i8]*
%3 = call i32 (i8*, [256 x i8]*, ...) bitcast (i32 (i8*, ...)* @printf to i32 (i8*, [256 x i8]*, ...)*)(i8* getelementptr inbounds ([26 x i8], [26 x i8]* @"$string.1", i64 0, i64 0), [256 x i8]* nonnull %2)
%4 = getelementptr inbounds %struct.Config, %struct.Config* %config, i64 0, i32 0, i32 0, i32 1, i32 0, i32 0
%5 = call i32 (i8*, [256 x i8]*, ...) bitcast (i32 (i8*, ...)* @printf to i32 (i8*, [256 x i8]*, ...)*)(i8* getelementptr inbounds ([26 x i8], [26 x i8]* @"$string.2", i64 0, i64 0), [256 x i8]* nonnull %4)
ret void
}
Result:
Config: 0x16f8c6678
xBCLeftInflowProfile: 0x16f8c68b0
xBCLeftHeat: 0x16f8c667c
Not sure why it's treating printf
as a i32 (i8*, %struct.Config*, ...)
/i32 (i8*, [256 x i8]*, ...)
function, however. Is this due to Terra's glue generating incorrect types for functions with varargs somehow?
I'd guess that the Linux test failures, which this issue was initially opened for, are unrelated to this, so maybe this belongs better in a separate issue?
Wow, thanks for doing all this digging.
Yes, I think this is a separate issue: why is Terra thinking that printf
is a i32 (i8*, T*, ...)
(where T
is the second argument) instead of i32 (i8*, ...)
. This could be in Terra's code generation, or it could be somewhere in the glue Terra uses to extract these definitions from Clang.
Just one more note: the place to look for issues is likely going to be EmitCall
, or the Classify
call that it immediately does. This is a bit tricky because it's not just a matter of taking the function as-is and calling it. (E.g., structs have to be decomposed into scalar/vector/array arguments in a way that's architecture-specific.) So that's probably why there's an opportunity for Terra to mess this up.
Hopefully it's just a matter of teaching Terra not to add arguments beyond those in the original function to the formal type it returns, but this may potentially go deeper into the compiler plumbing.
I think trying to track this down is beyond me at the moment, I'm not familiar with Terra's (or LLVM's) compilation process, but might be able to take a look at somepoint in the future. Probably worth opening a separate issue for it.
@shoe42 I have a potential fix for the varargs issue at #602. Can you please check if this works for you, and if so what do the test suite results look like after this?
Coming back to the issue of Linux AArch64, I see that the following tests are still failing:
98% tests passed, 9 tests failed out of 549
Total Test time (real) = 32.29 sec
The following tests FAILED:
121 - defercond.t (SEGFAULT)
209 - let2.t (SEGFAULT)
343 - simpleglobal.t (Failed)
396 - teststd.t (SEGFAULT)
412 - vars.t (SEGFAULT)
413 - vars2.t (SEGFAULT)
415 - vecarith.t (SEGFAULT)
424 - zeroreturn.t (SEGFAULT)
425 - zeroreturn2.t (SEGFAULT)
I've seen this pop up a couple of times, so I'm documenting it here so people know about it. On ARM (AArch64), LLVM versions >= 12 seem to experience higher test failure rates. I've seen this happen on both Linux (Graviton, NVIDIA Jetson) ~and macOS (Apple M1)~, so it seems to be a feature of ARM processors ~and not of a specific OS~.
Test pass rate on LLVM 14.0.0:
Test pass rate on LLVM 11.1.0:
This is with dcd2eff87224d878c6adb33f1b69453ccc0a4ddf on the following machine:
My best guess at a root cause is #485, because LLVM 12 is the version where we switched back to MCJIT after ORCv1 was removed. However, it's possible something else changed in LLVM that is breaking something, and we haven't accounted for it yet.
Right now the best workaround is to stick to LLVM 11 on ARM platforms.