rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.54k stars 12.47k forks source link

Bad quality code for `Clone` on enum types #69174

Open reinerp opened 4 years ago

reinerp commented 4 years ago

For this Rust code, we get some pretty bad assembly:

#[derive(Clone)]
pub enum Foo {
    A(u8),
    B(bool),
}

#[derive(Clone)]
pub enum Bar {
    C(Foo),
    D(u8),
}

pub fn clone_foo(f: &Foo) -> Foo {
    f.clone()
}

pub fn clone_bar(b: &Bar) -> Bar {
    b.clone()
}

(Playground)

Assembly:

playground::clone_bar:
    movb    1(%rdi), %al
    cmpb    $1, (%rdi)
    jne .LBB1_2
    xorl    %ecx, %ecx
    movl    $1, %edx
    jmp .LBB1_3

.LBB1_2:
    movzbl  2(%rdi), %edx
    xorl    %ecx, %ecx
    testb   %dl, %dl
    setne   %cl
    cmpb    $1, %al
    sete    %al
    cmovnel %edx, %ecx
    shll    $16, %ecx
    xorl    %edx, %edx

.LBB1_3:
    orl %edx, %ecx
    movzbl  %al, %eax
    shll    $8, %eax
    orl %ecx, %eax
    retq

To see that it's possibly to do better, we can simply add a Copy instance to the types. Then the code even for Clone gets much better:

#[derive(Clone, Copy)]
pub enum Foo {
    A(u8),
    B(bool),
}

#[derive(Clone, Copy)]
pub enum Bar {
    C(Foo),
    D(u8),
}

pub fn clone_foo(f: &Foo) -> Foo {
    f.clone()
}

pub fn clone_bar(b: &Bar) -> Bar {
    b.clone()
}

(Playground)

Assembly:

playground::clone_bar:
    movzwl  (%rdi), %ecx
    movzbl  2(%rdi), %eax
    shll    $16, %eax
    orl %ecx, %eax
    retq

It's still not perfect (why are there two 2-byte loads instead of a single 4-byte load?) but it's much better.

reinerp commented 4 years ago

(Oh, on the last comment: I see why there's two loads: one of them was a 1-byte load, not a 2-byte load....)

nagisa commented 4 years ago

Clone is special-cased for cases where Copy is also implemented. The other case depends entirely on optimizer as derive(Clone) has to generate a fairly generic sequence of operations that would work for arbitrary enums.

Aaron1011 commented 4 years ago

I wonder if it would be worthwhile to try to detect and optimize this case. If a type 'could have been' Copy (it and all of its transitive fields are either #[derive(Copy)] or #[derive(Clone)] with no user-provided Clone impls), then the Clone implementation should be able to be the same as a Copy implementation.

For maximum effectiveness, this would probably have to be done during codegen (e.g. instead of codegenning the actual Clone implementation, we 'synthesize' a function in librustc_codegen_ssa), which would be quite hacky. However, it might result in some performance wins (less work for LLVM to do, and more efficient code generation).

nikic commented 4 years ago

It's still not perfect (why are there two 2-byte loads instead of a single 4-byte load?) but it's much better.

Because that could result in a fault. Note that Bar is 1-aligned, so there is no guarantee that byte 4 is dereferencable. Adding align(2) would be sufficient to guarantee that, in which case it will compile to a simple load.

Patryk27 commented 10 months ago

Btw, it looks like nowadays the compiler gets it right - at least in the release mode, the assembly ends up being same for Copy and non-Copy enums:

playground::clone_foo:
    movzbl  (%rdi), %eax
    movzbl  1(%rdi), %edx
    retq

playground::clone_bar:
    movzwl  (%rdi), %eax
    retq