terralang / terra

Terra is a low-level system programming language that is embedded in and meta-programmed by the Lua programming language.
terralang.org

setinlined not working in JIT mode on LLVM >= 17 #671

Open norcalli opened 1 month ago

norcalli commented 1 month ago

terra-Linux-x86_64-094c5ad (1.1.1)

❯ ~/works/3rd/terra-Linux-x86_64-094c5ad/bin/terra ./tests/ainline.t
definition      {} -> int32
define dso_local i32 @"$bar"() {
entry:
  %puts.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str, i64 0, i64 0))
  %puts1.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.1, i64 0, i64 0))
  %puts2.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.2, i64 0, i64 0))
  %puts3.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.3, i64 0, i64 0))
  %puts4.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.4, i64 0, i64 0))
  %puts5.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.5, i64 0, i64 0))
  %puts6.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.6, i64 0, i64 0))
  %puts7.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.7, i64 0, i64 0))
  %puts8.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.8, i64 0, i64 0))
  %puts9.i = tail call i32 @puts(i8* nonnull dereferenceable(1) getelementptr inbounds ([13 x i8], [13 x i8]* @str.9, i64 0, i64 0))
  ret i32 4
}
assembly for function at address 0x72665b91d000
0x72665b91d000(+0):             push    rbx
0x72665b91d001(+1):             movabs  rdi, 125783948509184
0x72665b91d00b(+11):            movabs  rbx, 125783943755728
0x72665b91d015(+21):            call    rbx
0x72665b91d017(+23):            movabs  rdi, 125783948509197
0x72665b91d021(+33):            call    rbx
0x72665b91d023(+35):            movabs  rdi, 125783948509210
0x72665b91d02d(+45):            call    rbx
0x72665b91d02f(+47):            movabs  rdi, 125783948509223
0x72665b91d039(+57):            call    rbx
0x72665b91d03b(+59):            movabs  rdi, 125783948509236
0x72665b91d045(+69):            call    rbx
0x72665b91d047(+71):            movabs  rdi, 125783948509249
0x72665b91d051(+81):            call    rbx
0x72665b91d053(+83):            movabs  rdi, 125783948509262
0x72665b91d05d(+93):            call    rbx
0x72665b91d05f(+95):            movabs  rdi, 125783948509275
0x72665b91d069(+105):           call    rbx
0x72665b91d06b(+107):           movabs  rdi, 125783948509288
0x72665b91d075(+117):           call    rbx
0x72665b91d077(+119):           movabs  rdi, 125783948509301
0x72665b91d081(+129):           call    rbx
0x72665b91d083(+131):           mov     eax, 4
0x72665b91d088(+136):           pop     rbx
0x72665b91d089(+137):           ret

terra-Linux-x86_64-cc543db (1.2.0)

❯ ~/works/3rd/terra-Linux-x86_64-cc543db/bin/terra ./tests/ainline.t
definition      {} -> int32
define dso_local i32 @"$bar"() {
entry:
  tail call void @"$foo"()
  ret i32 4
}
assembly for function at address 0x7781c8e9f000
0x7781c8e9f000(+0):             push    rax
0x7781c8e9f001(+1):             movabs  rax, 131399305261088
0x7781c8e9f00b(+11):            call    rax
0x7781c8e9f00d(+13):            mov     eax, 4
0x7781c8e9f012(+18):            pop     rcx
0x7781c8e9f013(+19):            ret

I noticed this in the disassembly of one of my personal projects: inlining doesn't seem to work. I checked against the existing test (tests/ainline.t), and it shows the same behavior.

norcalli commented 1 month ago

If I find the time, I can try to bisect it, but maybe it's just a result of the LLVM change.

elliottslaughter commented 1 month ago

Yeah, I was wondering if this was going to come bite us.

LLVM 17 completely removed the old optimization pipeline, so starting with that LLVM version I had to make a hard switch to the new optimization pipeline. In the process, I had to remove the manual inliner. Maybe it's possible to adapt it, but to be honest the new optimization pipeline is barely documented, and even the things I've tried to do so far have been pretty inscrutable.

There is some good news: this impacts JIT mode only. If you use terralib.saveobj you'll see the function get inlined as expected.

Add the following to the bottom of tests/ainline.t:

print(terralib.saveobj(nil, "llvmir", {bar=bar}))

Then you'll see it print out:

Output of running Terra on LLVM 18 with AOT mode

```
; ModuleID = 'terra'
source_filename = "terra"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin23.5.0"

@str.9 = private unnamed_addr constant [13 x i8] c"hello, world\00", align 1

; Function Attrs: nofree nounwind
define dso_local noundef i32 @bar() local_unnamed_addr #0 {
entry:
  %puts.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts1.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts2.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts3.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts4.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts5.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts6.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts7.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts8.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  %puts9.i = tail call i32 @puts(ptr nonnull dereferenceable(1) @str.9)
  ret i32 4
}

; Function Attrs: nofree nounwind
declare noundef i32 @puts(ptr nocapture noundef readonly) local_unnamed_addr #0

attributes #0 = { nofree nounwind }
```
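
For readers without the Terra source tree at hand, the JIT/AOT comparison described above can be reproduced with a small standalone script. This is only a sketch modeled on what tests/ainline.t appears to do; the function bodies here are assumptions (a single `puts` call instead of the ten seen in the IR above), not a copy of the actual test.

```terra
-- Sketch only: the bodies of foo and bar are assumed, not copied from
-- tests/ainline.t.
local C = terralib.includec("stdio.h")

terra foo()
  C.puts("hello, world")
end
foo:setinlined(true)  -- request that calls to foo be inlined

terra bar()
  foo()
  return 4
end

-- JIT path: prints the optimized LLVM IR and the machine code for bar.
-- Per the reports in this thread, LLVM >= 17 builds keep a call to foo here.
bar:disas()

-- AOT path: with a nil filename, saveobj returns the output as a string.
-- Per this thread, foo is inlined into bar in this output.
local ir = terralib.saveobj(nil, "llvmir", {bar = bar})
print(ir)
```

Comparing the two printouts should show whether the build in use is affected: a remaining call to `foo` in the `disas` output, alongside inlined `puts` calls in the `saveobj` output, matches the behavior reported here.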

So then there are two possible workarounds:

norcalli commented 1 month ago

Ah I see, that makes sense. I think I can live with that workaround, although it would probably be prudent to print a warning or something for setinlined until the JIT pipeline supports it.

On a tangential but related note, I was wondering whether there is an easy way to load or link against .o files directly from Terra, without first turning them into an archive or library. It seems more relevant now that saveobj produces different code than the JIT does.

For this case, though, I can produce LLVM bitcode and link that back in, which works (but has the disadvantage of linking into the global namespace?), like so:

local exports = {main=main}
local O = terralib.linkllvmstring(terralib.saveobj(nil, "bitcode", exports))
for k, v in pairs(exports) do
  print(O:extern(k, v.type))
end

elliottslaughter commented 1 month ago

I have produced a set of Linux binaries for 1.2.0 with LLVM 16, and attached them to the release with a note about this as a known issue. It looks ok on my end, but please check to see if this works for you:

https://github.com/terralang/terra/releases/download/release-1.2.0/terra-Linux-x86_64-cc543db-llvm16.tar.xz

The call terralib.linklibrary ultimately decomposes into llvm::sys::DynamicLibrary::LoadLibraryPermanently. Based on the documentation I would guess that it only works on shared objects, but you're welcome to test it out:

https://github.com/terralang/terra/blob/cc543dbcc85dbda84d5aec624d80f76642566940/src/tcompiler.cpp#L3991

https://llvm.org/doxygen/classllvm_1_1sys_1_1DynamicLibrary.html#a53d32d3b3baefdec31d3d94b0586d437
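
If `terralib.linklibrary` does indeed accept only shared objects, one possible way to get AOT-optimized code back into a running Terra session is to have `saveobj` emit a shared library directly and then load it. The following is a rough sketch under several assumptions: it targets Linux, the file name `libainline_aot.so` and the exported symbol `bar_aot` are hypothetical, and a working system linker is available for the `"sharedlibrary"` output type.

```terra
-- Sketch only: file and symbol names below are hypothetical.
local C = terralib.includec("stdio.h")

terra foo()
  C.puts("hello, world")
end
foo:setinlined(true)

terra bar()
  foo()
  return 4
end

-- Compile bar ahead of time (where inlining reportedly still works) and
-- export it under a different name to avoid clashing with the JIT copy.
local libpath = "./libainline_aot.so"  -- hypothetical path
terralib.saveobj(libpath, "sharedlibrary", {bar_aot = bar})

-- Load the shared library into the current process and declare the
-- exported symbol so that it can be called from Lua or Terra.
terralib.linklibrary(libpath)
local bar_aot = terralib.externfunction("bar_aot", {} -> int)

print(bar_aot())
```

This sidesteps loading `.o` files entirely. Like the bitcode approach, it places the exported symbols in the process-wide namespace, so distinct export names are advisable.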

norcalli commented 4 weeks ago

yup it looks like it works! thank you! I did briefly look into the pass system myself to see if I could figure it out but I gave up haha. llvm docs are as inscrutable as always. I tried reading the Rust crate docs too just to see if I could make a JIT engine with inlining.