Open andrewrk opened 3 years ago
The memcpy's semantics is a bit unclear w.r.t padding, and I think it can be clarified by checking whether !tbaa.struct is attached.
If memcpy has the metadata attached, it considers the padding. If memcpy doesn't have the metadata, it is just a bytewise copy.
With this model, SROA should have checked whether !tbaa.struct flag existed or not.
A nice thing is that clang frontend is already attaching !tbaa.struct to memcpy: https://godbolt.org/z/KqqTcj So fixing SROA won't cause performance degradation.
Let me take a step back as the story is a bit more complicated. SROA tries to be smart in recovering the type information in the structures, but ends up missing bits that are memcpy'ed. This data is memcpy'ed from padding bits, but we don't have any rule saying it's illegal to memcpy to/from padding (we do have for load/stores, though). We may want to patch LangRef instead of fixing SROA?
Reduced test case:
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu"
%CallArg = type { %"[]u8" } %"[]u8" = type { i8*, i64 } %ExpressionResult = type { i64, i1 }
@0 = internal unnamed_addr constant [20 x i8] c"what happened to me\00", align 1 @1 = internal unnamed_addr constant %CallArg { %"[]u8" { i8 getelementptr inbounds ([20 x i8], [20 x i8] @0, i64 0, i64 0), i64 19 } }, align 8
declare void @llvm.memcpy.p0i8.p0i8.i64(i8 noalias nocapture writeonly, i8 noalias nocapture readonly, i64, i1 immarg) #0
define void @main() { %v.i = alloca %"[]u8", align 8 %w.i = alloca %"[]u8", align 8 %result = alloca %ExpressionResult, align 8 %derp = alloca %ExpressionResult, align 8
; copy @1 -> %v.1 %x3 = getelementptr inbounds %CallArg, %CallArg @1, i32 0, i32 0 %x7 = bitcast %"[]u8" %x3 to i8 %x8 = bitcast %"[]u8" %v.i to i8 call void @llvm.memcpy.p0i8.p0i8.i64(i8 %x8, i8* %x7, i64 16, i1 false)
; copy %v.1 -> %result %x14 = bitcast %ExpressionResult %result to i8 call void @llvm.memcpy.p0i8.p0i8.i64(i8 %x14, i8 %x8, i64 16, i1 false)
; copy %result -> %w.i %x18 = bitcast %"[]u8" %w.i to i8 call void @llvm.memcpy.p0i8.p0i8.i64(i8 %x18, i8 %x14, i64 16, i1 false)
; output %w.i call void @std.os.write(i32 1, %"[]u8"* %w.i) ret void
SwitchProng1.i3: ; No predecessors! %x35 = getelementptr inbounds %ExpressionResult, %ExpressionResult %derp, i32 0, i32 1 store i1 false, i1 %x35, align 1 ret void
SwitchProng2.i4: ; No predecessors! %x37 = bitcast %ExpressionResult %result to i8 %x38 = bitcast %ExpressionResult %derp to i8 call void @llvm.memcpy.p0i8.p0i8.i64(i8 %x38, i8 %x37, i64 24, i1 false) ret void }
declare void @std.os.write(i32, %"[]u8"*)
Not a bug; test case triggers UB. See github report for an analysis.
Extended Description
Hello, I am reporting this on behalf of LemonBoy from a downstream issue report:
https://github.com/ziglang/zig/issues/7325
The different behaviour can be observed by running:
opt-11 -sroa <.ll file> | lli-11 # The output is corrupted lli-11 <.ll file> # The output is fine