llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.13k stars 11.62k forks source link

[ORC] Missing definition for inlining function after optimizations in IRTRansformLayer #64907

Open lucasreis1 opened 1 year ago

lucasreis1 commented 1 year ago

Faced this issue while dealing with an optimization pipeline that focuses on inlining.

If a function has the always_inline attribute and linkonce_odr linkage, it is registered as a symbol for materialization by the MU. After optimization, passes discard the definition in IR and no symbol is generated. The symbol request is then left dangling by the MU.

Quick module to reproduce:

; ModuleID = 'main.cpp'
source_filename = "main.cpp"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

$_Z7testFoov = comdat any

@.str = private unnamed_addr constant [8 x i8] c"hello!\0A\00", align 1

; Function Attrs: mustprogress norecurse uwtable
define dso_local noundef i32 @main() #0 {
  %1 = alloca i32, align 4
  store i32 0, ptr %1, align 4
  call void @_Z7testFoov()
  ret i32 0
}

; Function Attrs: alwaysinline mustprogress uwtable
define linkonce_odr dso_local void @_Z7testFoov() #1 comdat {
  %1 = call i32 (ptr, ...) @printf(ptr noundef @.str)
  ret void
}

declare i32 @printf(ptr noundef, ...) #2

attributes #0 = { mustprogress norecurse uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { alwaysinline mustprogress uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #2 = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.module.flags = !{!0, !1, !2, !3}
!llvm.ident = !{!4}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 8, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 2}
!4 = !{!"clang version 16.0.6 (https://github.com/llvm/llvm-project 7cbf1a2591520c2491aa35339f227775f4d3adf6)"}

I have not done in-depth testing, but I'm pretty sure any standard optimization pipeline done by the new pass manager would trigger this, as the function is marked as always_inline.

llvmbot commented 1 year ago

@llvm/issue-subscribers-orcjit

lucasreis1 commented 1 year ago

I've been doing more digging and was able to replicate this with different functions that do not include the always_inline attr. It seems that any function definition that the inlining pass removes after inilning will result in this issue. This includes functions with attributes such as alwaysinline, inlinehint, or even with no relevant attributes.

The common denominator here seems to come from the linkage type: linkonce_odr, which is very favorable for inling but still needs to be present in the symbol table from the perspective of the MU. What's the best approach here? Does the IRMU need to change its approach to generating symbols favorable for inlining, or should the optimization callee be responsible for removing definitions from the MU after inlining?

lhames commented 8 months ago

It's not usually safe to remove symbols from the interface of a MaterializationUnit. They can be omitted when the MU is created, but then they can't be searched for directly, which is a problem if anyone wants to look them up.

I think we should probably change linkonce_odr to weak_odr when the module is added to the JIT. This could result in extra memory consumption when linkonce_odrs that had been optimized away are emitted, but that's probably better than making them invisible.

lhames commented 8 months ago

Actually after talking to @ributzka I think the better option might be to leave linkonce_odr symbols out of the interface, since that's how TextAPI would treat them by default.

ORC has a provision for weak symbols that "show up late", e.g. weak definitions of constant pool entries produced during codegen -- we could use this to model linkonce_odr definitions that don't get fully inlined away.