Open davidbolvansky opened 4 years ago
char d[256];
void h(const char *s, int n) {
__builtin_memmove(d, s, n); // dead store __builtin_memset(d, 0, n); }
The case above can be handled with the same approach as https://reviews.llvm.org/rG8d9b9c0edceb
It just needs adding a case for memmove.
Ok, simple tests are optimized now.
This one remains:
char d[256];
void h(const char *s) {
__builtin_strcpy(d, s); // dead store __builtin_memset(d, 0, sizeof d); }
Hot sure how this should be done.
It would be interesting also to handle non-CST size.
Probably MemoryLocation's size (https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/Analysis/MemoryLocation.h#L66) would need to support arbitrary size values and at least BasicAA and some utilities in DSE need to be taught about that.
Handling the case where the sizes are the same exact value should probably be relatively straight forward. Others, like mixing dynamic and constant sizes will likely be trickier. But that should be something that could be done incrementally.
Ok, simple tests are optimized now.
This one remains:
char d[256];
void h(const char *s) {
__builtin_strcpy(d, s); // dead store __builtin_memset(d, 0, sizeof d); }
Hot sure how this should be done.
It would be interesting also to handle non-CST size.
char d[256];
void h(const char *s, int n) {
__builtin_memmove(d, s, n); // dead store __builtin_memset(d, 0, n); }
Fixing BuildLibCalls to properly annotate strcpy fixed this case (patch coming soon): void f(const char *s) { char a[256]; __builtin_strcpy(a, s); // dead store }
But it did not help for: char d[256]; void h(const char *s) {
__builtin_strcpy(d, s); // dead store __builtin_memset(d, 0, sizeof d); }
Possibly a case of missing handling in MemoryLocation?
Fixing BuildLibCalls to properly annotate strcpy fixed this case (patch coming soon): void f(const char *s) { char a[256]; __builtin_strcpy(a, s); // dead store }
But it did not help for: char d[256]; void h(const char *s) {
__builtin_strcpy(d, s); // dead store __builtin_memset(d, 0, sizeof d); }
I think at least for the strncpy case, we fail because AA thinks strcpy might also read a. We hit this exit: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp#L1977
I think the reason is that strcpy is missing nocapture writeonly
on the first argument. Looks like libc on BSD systems claims at least that the arguments of strcpy cannot overlap. Both arguments are already marked as noalias, so presumably we already use this info.
I won't have time to tackle this in the near future, but it should be quite straight-forward to put up a patch.
strncpy testcase:
char d[256];
void h(const char *s) {
__builtin_strncpy(d, s, 8); // dead store __builtin_memset(d, 0, sizeof d); }
So all testcases are optimized except this one:
char d[256];
void h(const char *s) {
__builtin_strcpy(d, s); // dead store
__builtin_memset(d, 0, sizeof d);
}
IR test: https://llvm.godbolt.org/z/sTf6vrc5b
From a cursory look, this is probably due to https://github.com/llvm/llvm-project/blob/a43f7d6d76984ddae4a5e5e0bebf82ee2edebabb/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp#L1293
Extended Description
void f(const char *s) { char a[256]; __builtin_strcpy(a, s); // dead store }
void g(const char *s) { char a[256]; __builtin_memcpy(a, s, sizeof a); // dead store }
char d[256];
void h(const char *s) {
__builtin_strcpy(d, s); // dead store __builtin_memset(d, 0, sizeof d); }
Clang: f(char const): # @f(char const) sub rsp, 264 mov rsi, rdi mov rdi, rsp call strcpy add rsp, 264 ret g(char const): # @g(char const) ret h(char const): # @h(char const) push rax mov rsi, rdi mov edi, offset d call strcpy xorps xmm0, xmm0 movaps xmmword ptr [rip + d], xmm0 movaps xmmword ptr [rip + d+16], xmm0 movaps xmmword ptr [rip + d+32], xmm0 movaps xmmword ptr [rip + d+48], xmm0 movaps xmmword ptr [rip + d+64], xmm0 movaps xmmword ptr [rip + d+80], xmm0 movaps xmmword ptr [rip + d+96], xmm0 movaps xmmword ptr [rip + d+112], xmm0 movaps xmmword ptr [rip + d+128], xmm0 movaps xmmword ptr [rip + d+144], xmm0 movaps xmmword ptr [rip + d+160], xmm0 movaps xmmword ptr [rip + d+176], xmm0 movaps xmmword ptr [rip + d+192], xmm0 movaps xmmword ptr [rip + d+208], xmm0 movaps xmmword ptr [rip + d+224], xmm0 movaps xmmword ptr [rip + d+240], xmm0 pop rax ret d: .zero 256
We should be able eliminate strcpy calls (based on deref. bytes info, memset overwrites the whole buffer)