rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.14k stars 12.69k forks source link

Initializing structs with Option fields set to `None` produces non-optimal assembly #104290

Open dotdash opened 1 year ago

dotdash commented 1 year ago

Given a struct S like this:

pub struct S {
    pub f1: Option<u8>,
    pub f2: Option<u8>,
    pub f3: Option<u8>,
    pub f4: Option<u8>,
    pub f5: Option<u8>,
    pub f6: Option<u8>,
    pub f7: Option<u8>,
    pub f8: Option<u8>,
}

Initializing this struct with all fields set to None, gives non-optimal assembly code, as LLVM doesn't combine the stores into a memset, because the u8 fields within the Option fields are not set. Initializing all fields to Some(0) and then overwriting them with None gives optimal results.

pub struct S {
    pub f1: Option<u8>,
    pub f2: Option<u8>,
    pub f3: Option<u8>,
    pub f4: Option<u8>,
    pub f5: Option<u8>,
    pub f6: Option<u8>,
    pub f7: Option<u8>,
    pub f8: Option<u8>,
}

pub fn bad() -> S {
    S {
        f1: None,
        f2: None,
        f3: None,
        f4: None,
        f5: None,
        f6: None,
        f7: None,
        f8: None,
    }
}

pub fn good() -> S {
    let mut s = S {
        f1: Some(0),
        f2: Some(0),
        f3: Some(0),
        f4: Some(0),
        f5: Some(0),
        f6: Some(0),
        f7: Some(0),
        f8: Some(0),
    };

    s.f1 = None;
    s.f2 = None;
    s.f3 = None;
    s.f4 = None;
    s.f5 = None;
    s.f6 = None;
    s.f7 = None;
    s.f8 = None;

    s
}

Gives

    .text
    .file   "undefmemset.9474daa7-cgu.0"
    .section    .text._ZN11undefmemset3bad17h0bf7686eb09f8ffcE,"ax",@progbits
    .globl  _ZN11undefmemset3bad17h0bf7686eb09f8ffcE
    .p2align    4, 0x90
    .type   _ZN11undefmemset3bad17h0bf7686eb09f8ffcE,@function
_ZN11undefmemset3bad17h0bf7686eb09f8ffcE:
    .cfi_startproc
    movq    %rdi, %rax
    movb    $0, (%rdi)
    movb    $0, 2(%rdi)
    movb    $0, 4(%rdi)
    movb    $0, 6(%rdi)
    movb    $0, 8(%rdi)
    movb    $0, 10(%rdi)
    movb    $0, 12(%rdi)
    movb    $0, 14(%rdi)
    retq
.Lfunc_end0:
    .size   _ZN11undefmemset3bad17h0bf7686eb09f8ffcE, .Lfunc_end0-_ZN11undefmemset3bad17h0bf7686eb09f8ffcE
    .cfi_endproc

    .section    .text._ZN11undefmemset4good17hc418ddbabe9f4c4aE,"ax",@progbits
    .globl  _ZN11undefmemset4good17hc418ddbabe9f4c4aE
    .p2align    4, 0x90
    .type   _ZN11undefmemset4good17hc418ddbabe9f4c4aE,@function
_ZN11undefmemset4good17hc418ddbabe9f4c4aE:
    .cfi_startproc
    movq    %rdi, %rax
    xorps   %xmm0, %xmm0
    movups  %xmm0, (%rdi)
    retq
.Lfunc_end1:
    .size   _ZN11undefmemset4good17hc418ddbabe9f4c4aE, .Lfunc_end1-_ZN11undefmemset4good17hc418ddbabe9f4c4aE
    .cfi_endproc

    .section    ".note.GNU-stack","",@progbits
Rageking8 commented 1 year ago

@rustbot label +A-codegen +I-slow

TimNN commented 1 year ago

I investigated this issue a bit a few weeks ago but never got around to reporting it. It reproduces with clang as well (https://godbolt.org/z/8Yxnqrq6K):

#include <cstdint>

struct WithPadding {
    uint8_t a;
    uint16_t b;
};

void foo(WithPadding* p) {
    *p = {1, 2}; // Two `mov`s
}

void bar(WithPadding* p) {
    WithPadding t = {1, 2};
    *p = t; // One `mov`
}

AFAICT the issue is that early on in the optimization pipeline LLVM sees store undef to the "value" byte of the Option<u8> (or padding in the C++ example), but that gets optimized out at some point. Then, later, the MemCpyOptPass doesn't know that the "value" byte is undef / allowed to be written to, so generates multiple stores instead of just one.

Here is a smaller Rust repro (https://godbolt.org/z/xohrPs1rT):

pub fn with_hole(x: &mut [Option<u8>; 2]) {
    *x = [None, Some(2)]; // Two `mov`s
}

pub fn one_word(x: &mut [Option<u8>; 2]) {
    *x = [Some(1), Some(2)]; // One `mov`
}
clubby789 commented 1 year ago

The two issues are

The second issue can be easily fixed, but inserting a MemCpyOpt early enough in the pipeline to fix this issue causes regressions in some other LLVM tests