terralang / terra

Terra is a low-level system programming language that is embedded in and meta-programmed by the Lua programming language.

'Illegal Instruction' when requiring terra built dynamic library in terra and lua5.1 #448

Closed jmacc93 closed 2 years ago

jmacc93 commented 4 years ago

Feel free to remove this if it's a bad issue or I'm being an idiot, but here is what I found:

The following code crashes and burns:

local C = terralib.includecstring [[ 
  #include <stdio.h> 
  #include <lua5.1/lua.h>
]]

-- lua.h: #define LUA_GLOBALSINDEX (-10002)
local LUA_GLOBALSINDEX = -10002

terra ret1(l: &C.lua_State) : int32
  C.lua_pushnumber(l, 1.0) -- doesn't work
  --C.lua_pushinteger(l, 1) -- works
  return 1
end

terra luaopen_clib(l : &C.lua_State) : int32
  C.lua_pushcclosure(l, ret1, 0)
  C.lua_setfield(l, LUA_GLOBALSINDEX, "ret1")
  return 1;
end

terralib.saveobj("clib.so", {luaopen_clib=luaopen_clib})

require "clib"

print(ret1())

Here is what's printed to stdout:

./clib.so(ret1+0x1) [0x7fc09f39a131]
[0x10662]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSt13basic_filebufIwSt11char_traitsIwEE22_M_convert_to_externalEPwl+0x112) [0x7fc09f000c02]
[0xc02]
[0xc02]
/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSt13basic_filebufIwSt11char_traitsIwEE22_M_convert_to_externalEPwl+0x112) [0x7fc09f000c02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0xc02]
[0x20c02]
[0xc02]
[0xc02]
[0x10c02]
[0x10c02]
Illegal instruction

With plain Lua 5.1, the only output is Illegal instruction.

Also, the same ret1 function works when returning constant strings and integers. Only doubles crash it for some reason.

I'm on 64-bit Debian Linux.

terra -v gives Release 1.0.0-beta2, which I got from the prebuilt binaries.

My intention was to replace some of my Lua libraries built in C with ones built in Terra. To be specific: I was trying to remake a library in Terra that I could call from Lua for performance-critical sections of my programs.

Great job with Terra, btw. It looks like such a great system and I am really excited about making stuff in it! And again, sorry if I just didn't do something correctly and the solution is obvious.

elliottslaughter commented 4 years ago

Hi there and sorry for the slow reply.

Would you be able to try a build from source? The binaries were built on Ubuntu 16.04, so it's not inconceivable that there's some sort of incompatibility there. We've seen it with other distros, at least.

jmacc93 commented 4 years ago

Slow replies are alright : )

I should have built it from source from the get-go, but I was intimidated by the size of clang's package. Anyway, I installed clang and built Terra, and I get the same result with the new binary.

I should mention that I'm an absolute amateur and don't really know what I'm doing lol. But I ran the new terra binary under gdb while it compiled and ran the original code I posted, and found it failed (I think) on the second instruction of ret1. I compiled a similar .so in pure C, passed both into objdump -d, and got the following for the range in question:

clib c ret1
0000000000001125 <lret1>:
    1125:   55                      push   %rbp
    1126:   48 89 e5                mov    %rsp,%rbp
    1129:   48 83 ec 10             sub    $0x10,%rsp
    112d:   48 89 7d f8             mov    %rdi,-0x8(%rbp)
    1131:   f2 0f 10 05 cf 0e 00    movsd  0xecf(%rip),%xmm0        # 2008 <_fini+0xe74>
    1138:   00 
    1139:   48 8b 45 f8             mov    -0x8(%rbp),%rax
    113d:   48 89 c7                mov    %rax,%rdi
    1140:   e8 0b ff ff ff          callq  1050 <lua_pushnumber@plt>
    1145:   b8 01 00 00 00          mov    $0x1,%eax
    114a:   c9                      leaveq 
    114b:   c3                      retq   

clib terra ret1
0000000000001130 <lret1>:
    1130:   50                      push   %rax
    1131:   c5 fb 10 05 c7 0e 00    vmovsd 0xec7(%rip),%xmm0        # 2000 <_fini+0xe80>
    1138:   00 
    1139:   e8 12 ff ff ff          callq  1050 <lua_pushnumber@plt>
    113e:   b8 01 00 00 00          mov    $0x1,%eax
    1143:   59                      pop    %rcx
    1144:   c3                      retq   
    1145:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
    114c:   00 00 00 00 

Again, I don't actually know what I'm doing, so I don't really know how to interpret this, or whether it's even important. It is very aesthetically pleasing, and interesting, though, that the same instructions in both files end up at the same offset. And if I'm interpreting gdb correctly, then ret1 is failing on the instruction at 1131 in the second block above (the vmovsd).

I will try building LLVM as well when I find time and see if that does anything.

elliottslaughter commented 4 years ago

What kind of machine are you building on? vmovsd seems to be an AVX instruction so I suppose if you're on an old enough machine it might not be supported.

https://docs.oracle.com/cd/E36784_01/html/E36859/gntbd.html

I'm not quite sure what else to suggest at this point. Hypothetically LLVM should detect what kind of machine you're on regardless of how it's built, but who knows.
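
For what it's worth, the difference between the two listings is visible in the raw bytes. The load in the C build starts with f2 0f 10 (the legacy SSE2 encoding of movsd), while the load in the Terra build starts with c5, which in 64-bit code introduces a 2-byte VEX prefix, the encoding used by AVX instructions. Below is a minimal C sketch of that check. The helper starts_with_vex_prefix is a hypothetical name made up for illustration (it is not part of Terra, LLVM, or objdump), and it only inspects the first byte rather than decoding the instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Terra or LLVM API): report whether an x86-64
   instruction begins with a VEX prefix. In 64-bit mode a first byte of
   0xC5 introduces a 2-byte VEX prefix and 0xC4 a 3-byte one. */
static int starts_with_vex_prefix(const unsigned char *insn, size_t len)
{
    /* A NULL pointer is only acceptable together with a zero length. */
    assert(insn != NULL || len == 0);
    if (len == 0)
        return 0;
    return insn[0] == 0xC5 || insn[0] == 0xC4;
}

/* First four bytes of the two loads, copied from the objdump listings. */
static const unsigned char c_build_load[]     = { 0xf2, 0x0f, 0x10, 0x05 }; /* movsd  */
static const unsigned char terra_build_load[] = { 0xc5, 0xfb, 0x10, 0x05 }; /* vmovsd */
```

Calling starts_with_vex_prefix on terra_build_load reports a VEX prefix, and on c_build_load it does not, which is consistent with only the Terra-built library faulting on a CPU without AVX.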

elliottslaughter commented 4 years ago

FYI, we had an example of this before with Skylake and LLVM 3.8 where LLVM did not correctly detect the CPU's feature set. As a test you could try building Terra with the flag DISABLE_AVX which should shut off AVX codegen in LLVM:

https://github.com/terralang/terra/commit/c907b75c50536df6ae3458455580aa46f515ec3e

And if that makes a difference then we can try to figure out what combination of processor/LLVM/etc. is causing things to go wrong.

jmacc93 commented 4 years ago

Hey hey! Using DISABLE_AVX worked! No crashes, and I am getting the expected result with C.lua_pushnumber(l, 1.0).

Interestingly, passing either the target {Triple="x86_64-unknown-unknown-unknown"} or the expected {Triple="x86_64-intel-linux-elf"} to terralib.saveobj also works when using the original binaries.

I'm not sure if it helps, but here is my lscpu output:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 142
Model name: Intel(R) Pentium(R) CPU 4417U @ 2.30GHz
Stepping: 10
CPU MHz: 900.061
CPU max MHz: 2300.0000
CPU min MHz: 400.0000
BogoMIPS: 4608.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 2048K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust smep erms invpcid mpx rdseed smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d

elliottslaughter commented 4 years ago

I guess this CPU doesn't have AVX? Your lscpu flags list sse4_1 and sse4_2 but no avx, which would certainly explain why those instructions crash.

https://ark.intel.com/content/www/us/en/ark/products/189269/intel-pentium-gold-processor-4417u-2m-cache-2-30-ghz.html
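
A quick way to confirm this from an lscpu Flags line is to look for avx as a whole token. The C sketch below does that under a few assumptions: has_cpu_flag is a hypothetical helper written for illustration (not part of Terra or LLVM), the flags are separated by single spaces, and reported_flags is only an excerpt of the Flags line posted above, not the full list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if `name` occurs in `flags` as a whole
   space-separated token, 0 otherwise. Whole-token matching matters
   because "avx" is a substring of "avx2". */
static int has_cpu_flag(const char *flags, const char *name)
{
    assert(flags != NULL && name != NULL);
    size_t n = strlen(name);
    if (n == 0)
        return 0;

    const char *p = flags;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == flags) || (p[-1] == ' ');
        int ends_token   = (p[n] == '\0') || (p[n] == ' ');
        if (starts_token && ends_token)
            return 1;
        p += 1; /* keep scanning past this partial match */
    }
    return 0;
}

/* Excerpt (not the full list) of the Flags line from the lscpu output. */
static const char reported_flags[] =
    "sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm";
```

With this excerpt, has_cpu_flag(reported_flags, "sse4_2") finds a match while has_cpu_flag(reported_flags, "avx") does not. A plain substring search would be enough here, but it would give a false positive on a flags line that lists avx2 without avx, so the sketch checks token boundaries.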

By any chance, have you tried newer LLVM versions? We support versions up to LLVM 9. It may be that the CPU auto-detection works better in those versions.

Otherwise we can add a workaround, but I'd prefer not to do this if we can avoid it.

elliottslaughter commented 2 years ago

I'm going to close this due to lack of recent replies, but before I do I just want to note that we now support up to LLVM 14 (default to 13 in the binaries), have upgraded LuaJIT (which has received substantial fixes), and have also put quite a bit of work into portability. If the issue you hit wasn't already fixed, odds are good it's gone at this point.

At any rate, feel free to reopen if I'm wrong about that.