llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Worse zero struct init with = T{} than with memset #46412

Open davidbolvansky opened 4 years ago

davidbolvansky commented 4 years ago
Bugzilla Link 47068
Version trunk
OS Linux
CC @efriedma-quic,@zygoloid,@rotateright

Extended Description

class pt { int x; int y; };

class pt2 { int x; char y; };

void foo(pt *s) { *s = {}; }

void bar(pt2 *s) { *s = {}; }

For the foo case, codegen is fine. For the bar case, the padding is the problem.

Clang:

    bar(pt2*):
        mov DWORD PTR [rdi], 0
        mov BYTE PTR [rdi+4], 0
        ret

ICC:

    bar(pt2*):
        xor eax, eax                 #18.4
        mov QWORD PTR [rdi], rax     #18.8
        ret

So ideally we should have: mov QWORD PTR [rdi], 0

    define dso_local void @_Z3fooP2pt(%class.pt* nocapture %0) local_unnamed_addr #0 {
      %2 = bitcast %class.pt* %0 to i64*
      store i64 0, i64* %2, align 4
      ret void
    }

    define dso_local void @_Z3barP3pt2(%class.pt2* nocapture %0) local_unnamed_addr #1 {
      %2 = bitcast %class.pt2* %0 to i40*
      store i40 0, i40* %2, align 4, !tbaa.struct !2
      ret void
    }

Looking at the dumps, is SROA to blame?
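The comparison in the issue title can be sketched in source form. This is only an illustration, not code from the report: `zero_with_memset`, `zero_with_assign`, and `nonzero_bytes` are hypothetical names, and the byte check assumes `x` sits at offset 0 with `y` directly after it. The point is that memset is required to clear every byte of the object, padding included, while `*s = {}` only has to leave the members zero, which is why the compiler is not obviously free to widen the store.

```cpp
#include <cstddef>
#include <cstring>

class pt2 { int x; char y; };  // same class as in the report

// Clears the whole object representation, tail padding included.
// The cast to void* only avoids class-memaccess style warnings.
void zero_with_memset(pt2 *s) {
    std::memset(static_cast<void *>(s), 0, sizeof *s);
}

// Assigns a value-initialized pt2: the members become zero,
// the padding bytes are left unspecified.
void zero_with_assign(pt2 *s) { *s = {}; }

// Hypothetical helper: counts the nonzero bytes among the first n bytes of p.
std::size_t nonzero_bytes(const pt2 &p, std::size_t n) {
    unsigned char raw[sizeof(pt2)];
    std::memcpy(raw, &p, sizeof raw);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n && i < sizeof raw; ++i) {
        count += (raw[i] != 0) ? 1 : 0;
    }
    return count;
}
```

After `zero_with_memset`, all `sizeof(pt2)` bytes are zero. After `zero_with_assign`, only the first `sizeof(int) + sizeof(char)` bytes are guaranteed to be zero.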


With class pt3 { int x; int y; char z; };

Unoptimized:

    define dso_local void @_Z3bazP3pt3(%class.pt3* %0) #0 {
      %2 = alloca %class.pt3*, align 8
      %3 = alloca %class.pt3, align 4
      store %class.pt3* %0, %class.pt3** %2, align 8
      %4 = bitcast %class.pt3* %3 to i8*
      call void @llvm.memset.p0i8.i64(i8* align 4 %4, i8 0, i64 12, i1 false)
      %5 = load %class.pt3*, %class.pt3** %2, align 8
      %6 = bitcast %class.pt3* %5 to i8*
      %7 = bitcast %class.pt3* %3 to i8*
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %6, i8* align 4 %7, i64 9, i1 false)
      ret void
    }

Optimized:

    define dso_local void @_Z3bazP3pt3(%class.pt3* nocapture %0) local_unnamed_addr #2 {
      %2 = bitcast %class.pt3* %0 to i8*
      call void @llvm.memset.p0i8.i64(i8* nonnull align 4 dereferenceable(9) %2, i8 0, i64 9, i1 false)
      ret void
    }

In this case, we are doing a bad job when combining the memset and the memcpy in MemCpyOptimizer. It should be:

    call void @llvm.memset.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %2, i8 0, i64 12, i1 false)
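For reference, the 9 and 12 in the IR above appear to correspond to the bytes occupied by the members of pt3 and to sizeof(pt3) respectively. A minimal sketch of that arithmetic follows, assuming the members are laid out in declaration order with no padding between them. `round_up` and `member_bytes` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>

class pt3 { int x; int y; char z; };  // same class as in the report

// Hypothetical helper: rounds n up to the next multiple of align.
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// Bytes actually occupied by the members: two ints plus one char.
constexpr std::size_t member_bytes = 2 * sizeof(int) + sizeof(char);

// sizeof(pt3) adds tail padding so that arrays of pt3 stay aligned.
static_assert(sizeof(pt3) == round_up(member_bytes, alignof(pt3)),
              "unexpected pt3 layout");
```

With a 4-byte, 4-byte-aligned int this gives 9 member bytes rounded up to 12, so the memcpy covers only the members while the memset on the temporary covers the whole object.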

efriedma-quic commented 4 years ago

It isn't obvious to me that this transform is valid, at first glance. pt2 could have subclasses that store data in the padding that's overwritten by icc.
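The concern can be illustrated with a small sketch. `pt2_derived`, `z_offset`, and `wide_store_would_clobber_z` are hypothetical names, not part of the report. The assumption is an Itanium C++ ABI target such as x86-64 Linux, where a base class with private members is, as far as I understand, not treated as POD for layout purposes, so a derived class may place its own members in the base's tail padding.

```cpp
#include <cstddef>

class pt2 { int x; char y; };  // same class as in the report

// Hypothetical subclass whose member may land in pt2's tail padding.
class pt2_derived : public pt2 {
public:
    char z;
};

// Byte offset of z from the start of the complete object.
std::size_t z_offset() {
    pt2_derived d{};
    return static_cast<std::size_t>(
        reinterpret_cast<const char *>(&d.z) -
        reinterpret_cast<const char *>(&d));
}

// True when z lies inside the sizeof(pt2) bytes that a full-width
// store through a pt2* would overwrite.
bool wide_store_would_clobber_z() {
    return z_offset() < sizeof(pt2);
}
```

On such a target I would expect `wide_store_would_clobber_z()` to return true, meaning a `mov QWORD PTR [rdi], 0` emitted for bar would also zero `z` when bar is handed a pointer to the base subobject. The result is ABI-dependent. An ABI that does not reuse tail padding should return false.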