llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.4k stars 12.15k forks source link

Improve DSE with libcalls #46988

Open davidbolvansky opened 4 years ago

davidbolvansky commented 4 years ago
Bugzilla Link 47644
Version trunk
OS Windows NT
CC @fhahn

Extended Description

void f(const char *s) { char a[256]; __builtin_strcpy(a, s); // dead store }

void g(const char *s) { char a[256]; __builtin_memcpy(a, s, sizeof a); // dead store }

char d[256];

void h(const char *s) {

__builtin_strcpy(d, s); // dead store __builtin_memset(d, 0, sizeof d); }

Clang: f(char const): # @​f(char const) sub rsp, 264 mov rsi, rdi mov rdi, rsp call strcpy add rsp, 264 ret g(char const): # @​g(char const) ret h(char const): # @​h(char const) push rax mov rsi, rdi mov edi, offset d call strcpy xorps xmm0, xmm0 movaps xmmword ptr [rip + d], xmm0 movaps xmmword ptr [rip + d+16], xmm0 movaps xmmword ptr [rip + d+32], xmm0 movaps xmmword ptr [rip + d+48], xmm0 movaps xmmword ptr [rip + d+64], xmm0 movaps xmmword ptr [rip + d+80], xmm0 movaps xmmword ptr [rip + d+96], xmm0 movaps xmmword ptr [rip + d+112], xmm0 movaps xmmword ptr [rip + d+128], xmm0 movaps xmmword ptr [rip + d+144], xmm0 movaps xmmword ptr [rip + d+160], xmm0 movaps xmmword ptr [rip + d+176], xmm0 movaps xmmword ptr [rip + d+192], xmm0 movaps xmmword ptr [rip + d+208], xmm0 movaps xmmword ptr [rip + d+224], xmm0 movaps xmmword ptr [rip + d+240], xmm0 pop rax ret d: .zero 256

We should be able eliminate strcpy calls (based on deref. bytes info, memset overwrites the whole buffer)

fhahn commented 3 years ago

char d[256];

void h(const char *s, int n) {

__builtin_memmove(d, s, n); // dead store __builtin_memset(d, 0, n); }

The case above can be handled with the same approach as https://reviews.llvm.org/rG8d9b9c0edceb

It just needs adding a case for memmove.

fhahn commented 4 years ago

Ok, simple tests are optimized now.

This one remains:

char d[256];

void h(const char *s) {

__builtin_strcpy(d, s); // dead store __builtin_memset(d, 0, sizeof d); }

Hot sure how this should be done.

It would be interesting also to handle non-CST size.

Probably MemoryLocation's size (https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/Analysis/MemoryLocation.h#L66) would need to support arbitrary size values and at least BasicAA and some utilities in DSE need to be taught about that.

Handling the case where the sizes are the same exact value should probably be relatively straight forward. Others, like mixing dynamic and constant sizes will likely be trickier. But that should be something that could be done incrementally.

davidbolvansky commented 4 years ago

Ok, simple tests are optimized now.

This one remains:

char d[256];

void h(const char *s) {

__builtin_strcpy(d, s); // dead store __builtin_memset(d, 0, sizeof d); }

Hot sure how this should be done.

It would be interesting also to handle non-CST size.

char d[256];

void h(const char *s, int n) {

__builtin_memmove(d, s, n); // dead store __builtin_memset(d, 0, n); }

fhahn commented 4 years ago

Fixing BuildLibCalls to properly annotate strcpy fixed this case (patch coming soon): void f(const char *s) { char a[256]; __builtin_strcpy(a, s); // dead store }

But it did not help for: char d[256]; void h(const char *s) {

__builtin_strcpy(d, s); // dead store __builtin_memset(d, 0, sizeof d); }

Possibly a case of missing handling in MemoryLocation?

davidbolvansky commented 4 years ago

https://reviews.llvm.org/D88328

davidbolvansky commented 4 years ago

Fixing BuildLibCalls to properly annotate strcpy fixed this case (patch coming soon): void f(const char *s) { char a[256]; __builtin_strcpy(a, s); // dead store }

But it did not help for: char d[256]; void h(const char *s) {

__builtin_strcpy(d, s); // dead store __builtin_memset(d, 0, sizeof d); }

fhahn commented 4 years ago

I think at least for the strncpy case, we fail because AA thinks strcpy might also read a. We hit this exit: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp#L1977

I think the reason is that strcpy is missing nocapture writeonly on the first argument. Looks like libc on BSD systems claims at least that the arguments of strcpy cannot overlap. Both arguments are already marked as noalias, so presumably we already use this info.

I won't have time to tackle this in the near future, but it should be quite straight-forward to put up a patch.

davidbolvansky commented 4 years ago

strncpy testcase:

char d[256];

void h(const char *s) {

__builtin_strncpy(d, s, 8); // dead store __builtin_memset(d, 0, sizeof d); }

davidbolvansky commented 2 years ago

So all testcases are optimized except this one:

char d[256];

void h(const char *s) {

__builtin_strcpy(d, s); // dead store
__builtin_memset(d, 0, sizeof d);
}
nikic commented 2 years ago

IR test: https://llvm.godbolt.org/z/sTf6vrc5b

From a cursory look, this is probably due to https://github.com/llvm/llvm-project/blob/a43f7d6d76984ddae4a5e5e0bebf82ee2edebabb/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp#L1293