Open dotdash opened 1 year ago
@rustbot label +A-codegen +I-slow
I investigated this issue a bit a few weeks ago but never got around to reporting it. It reproduces with clang
as well (https://godbolt.org/z/8Yxnqrq6K):
#include <cstdint>

struct WithPadding {
    uint8_t a;
    uint16_t b;
};

void foo(WithPadding* p) {
    *p = {1, 2}; // Two `mov`s
}

void bar(WithPadding* p) {
    WithPadding t = {1, 2};
    *p = t; // One `mov`
}
AFAICT the issue is that early on in the optimization pipeline LLVM sees a `store undef` to the "value" byte of the `Option<u8>` (or to the padding in the C++ example), but that store gets optimized out at some point. Then, later, the `MemCpyOptPass` doesn't know that the "value" byte is undef / allowed to be written to, so it generates multiple stores instead of just one.
Here is a smaller Rust repro (https://godbolt.org/z/xohrPs1rT):
pub fn with_hole(x: &mut [Option<u8>; 2]) {
    *x = [None, Some(2)]; // Two `mov`s
}

pub fn one_word(x: &mut [Option<u8>; 2]) {
    *x = [Some(1), Some(2)]; // One `mov`
}
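To make the "hole" concrete, here is a minimal sketch that only inspects type layouts. It assumes `WithPadding` can be mirrored in Rust with `#[repr(C)]`; the helper function names are made up for illustration and are not from either repro. The size of `Option<u8>` is not guaranteed by the language, so treat those numbers as what current rustc happens to produce.

```rust
use std::mem::size_of;

/// Hypothetical Rust mirror of the C++ `WithPadding` struct.
/// `repr(C)` pins the field order and padding to the C layout.
#[repr(C)]
pub struct WithPadding {
    pub a: u8,
    pub b: u16,
}

/// Bytes of `WithPadding` that belong to no field (the padding "hole").
pub fn with_padding_hole_bytes() -> usize {
    size_of::<WithPadding>() - size_of::<u8>() - size_of::<u16>()
}

/// Size of one `Option<u8>`: a tag byte plus the "value" byte, since
/// `u8` has no spare bit pattern to encode `None` in.
pub fn option_u8_size() -> usize {
    size_of::<Option<u8>>()
}

/// Size of the array used by `with_hole` and `one_word`.
pub fn option_u8_pair_size() -> usize {
    size_of::<[Option<u8>; 2]>()
}
```

On common targets I would expect one padding byte in `WithPadding`, two bytes per `Option<u8>`, and four bytes for the array, so a single 32-bit store can cover the whole array as long as the compiler is allowed to write the "value" byte of a `None`.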
The two issues are:

1. The `InstCombine` pass eliminates all the `store undef` instructions before we reach `MemCpyOpt`.
2. `MemCpyOpt` folds preceding `undef` stores into `store 0`, but not succeeding ones (https://reviews.llvm.org/D140697).

The second issue can be easily fixed, but inserting a `MemCpyOpt` early enough in the pipeline to fix this issue causes regressions in some other LLVM tests.
Given a struct `S` like this:

Initializing this struct with all fields set to `None` gives non-optimal assembly code, as LLVM doesn't combine the stores into a memset, because the `u8` fields within the `Option` fields are not set. Initializing all fields to `Some(0)` and then overwriting them with `None` gives optimal results.

Gives