llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.26k stars 11.67k forks source link

SROA causing invalid code generation #48068

Open andrewrk opened 3 years ago

andrewrk commented 3 years ago
Bugzilla Link 48724
Version trunk
OS All
Attachments test.ll
CC @aqjune,@nunoplopes

Extended Description

Hello, I am reporting this on behalf of LemonBoy from a downstream issue report:

https://github.com/ziglang/zig/issues/7325

The different behaviour can be observed by running:

opt-11 -sroa <.ll file> | lli-11 # The output is corrupted lli-11 <.ll file> # The output is fine

aqjune commented 3 years ago

The memcpy's semantics is a bit unclear w.r.t padding, and I think it can be clarified by checking whether !tbaa.struct is attached.

If memcpy has the metadata attached, it considers the padding. If memcpy doesn't have the metadata, it is just a bytewise copy.

With this model, SROA should have checked whether !tbaa.struct flag existed or not.

A nice thing is that clang frontend is already attaching !tbaa.struct to memcpy: https://godbolt.org/z/KqqTcj So fixing SROA won't cause performance degradation.

nunoplopes commented 3 years ago

Let me take a step back as the story is a bit more complicated. SROA tries to be smart in recovering the type information in the structures, but ends up missing bits that are memcpy'ed. This data is memcpy'ed from padding bits, but we don't have any rule saying it's illegal to memcpy to/from padding (we do have for load/stores, though). We may want to patch LangRef instead of fixing SROA?

Reduced test case:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu"

%CallArg = type { %"[]u8" } %"[]u8" = type { i8*, i64 } %ExpressionResult = type { i64, i1 }

@​0 = internal unnamed_addr constant [20 x i8] c"what happened to me\00", align 1 @​1 = internal unnamed_addr constant %CallArg { %"[]u8" { i8 getelementptr inbounds ([20 x i8], [20 x i8] @​0, i64 0, i64 0), i64 19 } }, align 8

declare void @​llvm.memcpy.p0i8.p0i8.i64(i8 noalias nocapture writeonly, i8 noalias nocapture readonly, i64, i1 immarg) #​0

define void @​main() { %v.i = alloca %"[]u8", align 8 %w.i = alloca %"[]u8", align 8 %result = alloca %ExpressionResult, align 8 %derp = alloca %ExpressionResult, align 8

; copy @​1 -> %v.1 %x3 = getelementptr inbounds %CallArg, %CallArg @​1, i32 0, i32 0 %x7 = bitcast %"[]u8" %x3 to i8 %x8 = bitcast %"[]u8" %v.i to i8 call void @​llvm.memcpy.p0i8.p0i8.i64(i8 %x8, i8* %x7, i64 16, i1 false)

; copy %v.1 -> %result %x14 = bitcast %ExpressionResult %result to i8 call void @​llvm.memcpy.p0i8.p0i8.i64(i8 %x14, i8 %x8, i64 16, i1 false)

; copy %result -> %w.i %x18 = bitcast %"[]u8" %w.i to i8 call void @​llvm.memcpy.p0i8.p0i8.i64(i8 %x18, i8 %x14, i64 16, i1 false)

; output %w.i call void @​std.os.write(i32 1, %"[]u8"* %w.i) ret void

SwitchProng1.i3: ; No predecessors! %x35 = getelementptr inbounds %ExpressionResult, %ExpressionResult %derp, i32 0, i32 1 store i1 false, i1 %x35, align 1 ret void

SwitchProng2.i4: ; No predecessors! %x37 = bitcast %ExpressionResult %result to i8 %x38 = bitcast %ExpressionResult %derp to i8 call void @​llvm.memcpy.p0i8.p0i8.i64(i8 %x38, i8 %x37, i64 24, i1 false) ret void }

declare void @​std.os.write(i32, %"[]u8"*)

nunoplopes commented 3 years ago

Not a bug; test case triggers UB. See github report for an analysis.