terralang / terra

Terra is a low-level system programming language that is embedded in and meta-programmed by the Lua programming language.
terralang.org
Other
2.71k stars 197 forks source link

Miscompilation with LLVM > 6 #630

Closed elliottslaughter closed 1 year ago

elliottslaughter commented 1 year ago

This reproducer was minimized from a Regent program discussed in https://github.com/StanfordLegion/legion/issues/1385.

https://gist.github.com/elliottslaughter/c8bc2252c92639f8d3618b9bf3d33fb2

Running with Terra compiled with any LLVM version > 6 produces (in this case, with LLVM 13):

$ terra test3_min_terra.t
extern global gamma_table : double[700][18]
0   terra (JIT)                         0x000000010ecd8383 $g3d + 851

Running with Terra compiled with LLVM 6 produces:

$ terra-llvm6 seema_test3_min_terra.t
extern global gamma_table : double[700][18]

The issue seems to have something to do with the array we're bringing in via the header file.

I tested macOS x86_64. I believe the issue also occurs on Linux x86_64.

elliottslaughter commented 1 year ago

Adding getGammaTableArrayBig:setinlined(false) produces a slightly different error (presumably just moving the same error to a different site):

$ ./regent.py test3_min_terra.t
extern global gamma_table : double[700][18]
0   terra (JIT)                         0x000000010fc74271 $getGammaTableArrayBig + 33 
1   terra                               0x000000010aa72a45 lj_ccall_func + 2101
2   ???                                 0x000000000000001b 0x0 + 27
LLVM 13 disas (optimized) ``` extern global gamma_table : double[700][18] definition {int32,int32} -> double ; Function Attrs: noinline define dso_local double @"$getGammaTableArrayBig"(i32 %0, i32 %1) #0 { entry: %2 = sext i32 %0 to i64 %3 = getelementptr <{ [680 x double], [20 x double] }>, <{ [680 x double], [20 x double] }>* getelementptr inbounds (<{ <{ [680 x double], [20 x double] }>, <{ [658 x double], [42 x double] }>, <{ [639 x double], [61 x double] }>, <{ [624 x double], [76 x double] }>, <{ [610 x double], [90 x double] }>, <{ [598 x double], [102 x double] }>, <{ [587 x double], [113 x double] }>, <{ [577 x double], [123 x double] }>, <{ [569 x double], [131 x double] }>, <{ [560 x double], [140 x double] }>, <{ [553 x double], [147 x double] }>, <{ [546 x double], [154 x double] }>, <{ [540 x double], [160 x double] }>, <{ [534 x double], [166 x double] }>, <{ [528 x double], [172 x double] }>, <{ [523 x double], [177 x double] }>, <{ [518 x double], [182 x double] }>, [700 x double] }>, <{ <{ [680 x double], [20 x double] }>, <{ [658 x double], [42 x double] }>, <{ [639 x double], [61 x double] }>, <{ [624 x double], [76 x double] }>, <{ [610 x double], [90 x double] }>, <{ [598 x double], [102 x double] }>, <{ [587 x double], [113 x double] }>, <{ [577 x double], [123 x double] }>, <{ [569 x double], [131 x double] }>, <{ [560 x double], [140 x double] }>, <{ [553 x double], [147 x double] }>, <{ [546 x double], [154 x double] }>, <{ [540 x double], [160 x double] }>, <{ [534 x double], [166 x double] }>, <{ [528 x double], [172 x double] }>, <{ [523 x double], [177 x double] }>, <{ [518 x double], [182 x double] }>, [700 x double] }>* @gamma_table, i64 0, i32 0), i64 %2, i32 0 %4 = sext i32 %1 to i64 %gval.0..sroa_idx = getelementptr [680 x double], [680 x double]* %3, i64 %4, i64 0 %gval.0.copyload = load double, double* %gval.0..sroa_idx, align 16 ret double %gval.0.copyload } assembly for function at address 0x1133df000 0x1133df000(+0): movsxd rax, edi 0x1133df003(+3): imul rax, rax, 5600 0x1133df00a(+10): movsxd rcx, esi 0x1133df00d(+13): imul rcx, rcx, 5440 0x1133df014(+20): add rcx, rax 0x1133df017(+23): movabs rax, 4617797632 0x1133df021(+33): vmovsd xmm0, qword ptr [rax + rcx] 0x1133df026(+38): ret 0 terra (JIT) 0x00000001133df021 $getGammaTableArrayBig + 33 1 terra 0x000000010e1dda45 lj_ccall_func + 2101 2 ??? 0x000000000000001b 0x0 + 27 ```
LLVM 13 disas (unoptimized) ``` extern global gamma_table : double[700][18] definition {int32,int32} -> double ; Function Attrs: noinline optnone define dso_local double @"$getGammaTableArrayBig"(i32 %0, i32 %1) #0 { entry: %gval = alloca double, align 8 %index = alloca i32, align 4 %j = alloca i32, align 4 store i32 %0, i32* %j, align 4 store i32 %1, i32* %index, align 4 %2 = load i32, i32* %j, align 4 %3 = sext i32 %2 to i64 %4 = getelementptr <{ [680 x double], [20 x double] }>, <{ [680 x double], [20 x double] }>* getelementptr inbounds (<{ <{ [680 x double], [20 x double] }>, <{ [658 x double], [42 x double] }>, <{ [639 x double], [61 x double] }>, <{ [624 x double], [76 x double] }>, <{ [610 x double], [90 x double] }>, <{ [598 x double], [102 x double] }>, <{ [587 x double], [113 x double] }>, <{ [577 x double], [123 x double] }>, <{ [569 x double], [131 x double] }>, <{ [560 x double], [140 x double] }>, <{ [553 x double], [147 x double] }>, <{ [546 x double], [154 x double] }>, <{ [540 x double], [160 x double] }>, <{ [534 x double], [166 x double] }>, <{ [528 x double], [172 x double] }>, <{ [523 x double], [177 x double] }>, <{ [518 x double], [182 x double] }>, [700 x double] }>, <{ <{ [680 x double], [20 x double] }>, <{ [658 x double], [42 x double] }>, <{ [639 x double], [61 x double] }>, <{ [624 x double], [76 x double] }>, <{ [610 x double], [90 x double] }>, <{ [598 x double], [102 x double] }>, <{ [587 x double], [113 x double] }>, <{ [577 x double], [123 x double] }>, <{ [569 x double], [131 x double] }>, <{ [560 x double], [140 x double] }>, <{ [553 x double], [147 x double] }>, <{ [546 x double], [154 x double] }>, <{ [540 x double], [160 x double] }>, <{ [534 x double], [166 x double] }>, <{ [528 x double], [172 x double] }>, <{ [523 x double], [177 x double] }>, <{ [518 x double], [182 x double] }>, [700 x double] }>* @gamma_table, i32 0, i32 0), i64 %3 %5 = getelementptr <{ [680 x double], [20 x double] }>, <{ [680 x double], [20 x double] }>* %4, i32 0, i32 0 %6 = load i32, i32* %index, align 4 %7 = sext i32 %6 to i64 %8 = getelementptr [680 x double], [680 x double]* %5, i64 %7 %9 = load [680 x double], [680 x double]* %8, align 8 %10 = bitcast double* %gval to i8* %11 = bitcast [680 x double]* %8 to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %10, i8* align 8 %11, i64 5440, i1 false) %12 = load double, double* %gval, align 8 ret double %12 dead: ; No predecessors! ret double undef } assembly for function at address 0x10baad000 0x10baad000(+0): sub rsp, 24 0x10baad004(+4): mov dword ptr [rsp + 8], edi 0x10baad008(+8): mov dword ptr [rsp + 12], esi 0x10baad00c(+12): movsxd rax, dword ptr [rsp + 8] 0x10baad011(+17): imul rsi, rax, 5600 0x10baad018(+24): movabs rax, 4490715136 0x10baad022(+34): add rsi, rax 0x10baad025(+37): movsxd rax, dword ptr [rsp + 12] 0x10baad02a(+42): imul rax, rax, 5440 0x10baad031(+49): add rsi, rax 0x10baad034(+52): lea rdi, [rsp + 16] 0x10baad039(+57): movabs rax, 140703517103482 0x10baad043(+67): mov edx, 5440 0x10baad048(+72): call rax 0x10baad04a(+74): vmovsd xmm0, qword ptr [rsp + 16] 0x10baad050(+80): add rsp, 24 0x10baad054(+84): ret 0 libsystem_platform.dylib 0x00007ff817276b66 _platform_memmove$VARIANT$Haswell + 422 1 terra (JIT) 0x000000010baad04a $getGammaTableArrayBig + 74 2 ??? 0x3feffffafe1bfd20 0x0 + 4607182397293460768 ```
LLVM 6 disas (optimized) ``` extern global gamma_table : double[700][18] definition {int32,int32} -> double ; Function Attrs: noinline define double @"$getGammaTableArrayBig"(i32, i32) #0 { entry: %2 = sext i32 %0 to i64 %3 = sext i32 %1 to i64 %4 = getelementptr [18 x [700 x double]], [18 x [700 x double]]* @gamma_table, i64 0, i64 %2, i64 %3 %5 = load double, double* %4, align 8 ret double %5 } assembly for function at address 0x1093a3000 0x1093a3000(+0): movsxd rax, edi 0x1093a3003(+3): movabs rdx, 4449779712 0x1093a300d(+13): movsxd rcx, esi 0x1093a3010(+16): imul rax, rax, 5600 0x1093a3017(+23): lea rdx, [rdx + rax] 0x1093a301b(+27): vmovsd xmm0, qword ptr [rdx + 8*rcx] 0x1093a3020(+32): ret ```
LLVM 6 disas (unoptimized) ``` extern global gamma_table : double[700][18] definition {int32,int32} -> double ; Function Attrs: noinline optnone define double @"$getGammaTableArrayBig"(i32, i32) #0 { entry: %gval = alloca double %index = alloca i32 %j = alloca i32 store i32 %0, i32* %j store i32 %1, i32* %index %2 = load i32, i32* %j %3 = sext i32 %2 to i64 %4 = getelementptr [700 x double], [700 x double]* getelementptr inbounds ([18 x [700 x double]], [18 x [700 x double]]* @gamma_table, i32 0, i32 0), i64 %3 %5 = getelementptr [700 x double], [700 x double]* %4, i32 0, i32 0 %6 = load i32, i32* %index %7 = sext i32 %6 to i64 %8 = getelementptr double, double* %5, i64 %7 %9 = load double, double* %8 store double %9, double* %gval %10 = load double, double* %gval ret double %10 dead: ; No predecessors! ret double undef } assembly for function at address 0x1096f1000 0x1096f1000(+0): mov dword ptr [rsp - 16], edi 0x1096f1004(+4): mov dword ptr [rsp - 12], esi 0x1096f1008(+8): movsxd rax, dword ptr [rsp - 16] 0x1096f100d(+13): imul rax, rax, 5600 0x1096f1014(+20): movabs rcx, 4453244928 0x1096f101e(+30): add rcx, rax 0x1096f1021(+33): movsxd rax, dword ptr [rsp - 12] 0x1096f1026(+38): vmovsd xmm0, qword ptr [rcx + 8*rax] 0x1096f102b(+43): vmovsd qword ptr [rsp - 8], xmm0 0x1096f1031(+49): vmovsd xmm0, qword ptr [rsp - 8] 0x1096f1037(+55): ret ```
For posterity, the version of the reproducer used in this test ```terra local c = terralib.includec("stdlib.h") local gamma_header_big = terralib.includec("gamma_table_min.h", {"-I", "./"}) print(gamma_header_big.gamma_table) terra getGammaTableArrayBig(j : int, index : int) : double var gval: double gval = gamma_header_big.gamma_table[j][index] return gval end getGammaTableArrayBig:setinlined(false) getGammaTableArrayBig:setoptimized(false) getGammaTableArrayBig:disas() terra g3d(r_gamma_table : &double, N : int, M : int) for x = 0, N do for y = 0, M do r_gamma_table[x * M + y] = getGammaTableArrayBig(x, y) end end end terra main() var N = 18 var M = 700 var r_gamma_table_big = [&double](c.malloc(N*M*terralib.sizeof(double))) g3d(r_gamma_table_big, N, M) end main() ```

Edit: I also ran LLVM 7, and the results look very similar to LLVM 13.

Edit 2: all of this on macOS x86_64.

elliottslaughter commented 1 year ago

A couple places in the code possibly worth checking:

https://github.com/terralang/terra/blob/7ad77561ac470e5112266b7ac4fd9c57870c459b/src/tcompiler.cpp#L1295-L1299

https://github.com/terralang/terra/blob/7ad77561ac470e5112266b7ac4fd9c57870c459b/src/tcompiler.cpp#L2526-L2530

elliottslaughter commented 1 year ago

Confirmed that LLVM 7 is going through this code path:

https://github.com/terralang/terra/blob/7ad77561ac470e5112266b7ac4fd9c57870c459b/src/tcompiler.cpp#L2528-L2530

elliottslaughter commented 1 year ago

The root cause appears to be that starting in LLVM 7, Clang changed the way that it encodes certain global variables. In particular, arrays with trailing zeros appear to be reencoded as a struct containing two sub-arrays, one with the non-zeros and one with zeroes.

This is pretty clear if you dump the full LLVM IR for the complete examples:

main-llvm6.ll:

@gamma_table = local_unnamed_addr constant [18 x [700 x double]] ...

main-llvm7.ll:

@gamma_table = local_unnamed_addr constant <{ <{ [680 x double], [20 x double] }>, <{ [658 x double], [42 x double] }>, <{ [639 x double], [61 x double] }>, <{ [624 x double], [76 x double] }>, <{ [610 x double], [90 x double] }>, <{ [598 x double], [102 x double] }>, <{ [587 x double], [113 x double] }>, <{ [577 x double], [123 x double] }>, <{ [569 x double], [131 x double] }>, <{ [560 x double], [140 x double] }>, <{ [553 x double], [147 x double] }>, <{ [546 x double], [154 x double] }>, <{ [540 x double], [160 x double] }>, <{ [534 x double], [166 x double] }>, <{ [528 x double], [172 x double] }>, <{ [523 x double], [177 x double] }>, <{ [518 x double], [182 x double] }>, [700 x double] }> <{ <{ [680 x double], [20 x double] }> <{ [680 x double] ...

So then there's a question of what to do about it. Some options:

  1. Teach Terra to emit array access expressions for the above types. This seems like a bad idea, particularly because we're not in control of what Clang emits here.
  2. Cast the global to the right type before accessing. So far, all my attempts to do this fail internal asserts in LLVM.
  3. Copy the global, either into another global of the correct type, or an alloca. I'm not a fan of the alloca approach, but it may end up being easiest. It's not immediately obvious to me how to re-type the global if casting doesn't work either.
elliottslaughter commented 1 year ago

Nevermind, I was just doing the cast wrong. Fix in https://github.com/terralang/terra/pull/631.