rust-lang / rust


Missed AVX512 opportunity when dealing with arrays of 64 bytes #132909

Open c410-f3r opened 1 week ago

c410-f3r commented 1 week ago

This is probably an LLVM behavior that affects rustc.

The following snippet operates explicitly on 64-byte arrays. It was extracted from a WebSocket routine that unmasks frames received from an external party.

https://godbolt.org/z/bM8YPn4sz

pub fn stuff([a, b, c, d]: [u8; 4], slice: &mut [[u8; 64]]) {
    // The 4-byte masking key repeated 16 times to cover a whole 64-byte array.
    let mask: [u8; 64] = [
        a, b, c, d, a, b, c, d, a, b, c, d, a, b, c, d, a, b, c, d, a, b, c, d, a, b, c, d,
        a, b, c, d, a, b, c, d, a, b, c, d, a, b, c, d, a, b, c, d, a, b, c, d, a, b, c, d,
        a, b, c, d, a, b, c, d,
    ];
    for array in slice {
        for (array_elem, mask_elem) in array.iter_mut().zip(mask) {
            *array_elem ^= mask_elem;
        }
    }
}
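For context, WebSocket unmasking (RFC 6455, section 5.3) XORs payload byte `i` with masking-key byte `i % 4`. Because 64 is a multiple of 4, payload offset `64 * k + j` satisfies `(64 * k + j) % 4 == j % 4`, so a single 64-byte mask (the key repeated 16 times) lines up with every 64-byte chunk. That is why the snippet can reuse one `[u8; 64]` for all arrays. A minimal sketch of this reasoning follows; `unmask_reference` and `expand_mask` are hypothetical helper names, not part of the original code.

```rust
/// Hypothetical scalar reference for WebSocket unmasking (RFC 6455, section 5.3):
/// payload byte `i` is XORed with masking-key byte `i % 4`.
pub fn unmask_reference(key: [u8; 4], payload: &mut [u8]) {
    for (i, byte) in payload.iter_mut().enumerate() {
        *byte ^= key[i % 4];
    }
}

/// Builds the same 64-byte mask as the literal in the snippet:
/// element `j` is `key[j % 4]`, i.e. the key repeated 16 times.
pub fn expand_mask(key: [u8; 4]) -> [u8; 64] {
    std::array::from_fn(|j| key[j % 4])
}
```

XOR masking is its own inverse, so applying `unmask_reference` twice with the same key restores the original payload.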

Unfortunately, the code generated for `stuff` does not use zmm (512-bit AVX-512) registers. By contrast, a similar C implementation compiled with GCC produces much better output.

https://godbolt.org/z/M8va4x6n9

#include <stdint.h>
#include <stddef.h>

void stuff(uint8_t mask[4], uint8_t slice[][64], size_t slice_length) {
    uint8_t expanded_mask[64];
    for (size_t i = 0; i < 64; i++) {
        expanded_mask[i] = mask[i & 3];
    }
    for (size_t i = 0; i < slice_length; i++) {
        for (size_t j = 0; j < 64; j++) {
            slice[i][j] ^= expanded_mask[j];
        }
    }
}
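For comparison, the C function above can be transliterated into Rust almost line for line, using index loops and no iterator adapters. This is a sketch with a hypothetical name, `stuff_indexed`; whether it compiles to zmm code depends on the compiler version and on AVX-512 being enabled for the target, which this sketch does not control.

```rust
/// Hypothetical line-for-line port of the C `stuff`: build the expanded
/// mask with `i & 3` (equal to `i % 4` for unsigned `i`), then XOR every
/// row by index. Indices are always `< 64`, so the compiler can typically
/// prove the array accesses in bounds.
pub fn stuff_indexed(mask: [u8; 4], slice: &mut [[u8; 64]]) {
    let mut expanded_mask = [0u8; 64];
    for i in 0..64 {
        expanded_mask[i] = mask[i & 3];
    }
    for row in slice.iter_mut() {
        for j in 0..64 {
            row[j] ^= expanded_mask[j];
        }
    }
}
```

Unlike the original Rust snippet, the mask here never goes through `array::IntoIter`; each element is read by index.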

Here is another Rust version that also uses manual loops: https://godbolt.org/z/6fex3c4Eb

the8472 commented 1 week ago

This one optimizes well: https://godbolt.org/z/W7dMqzEse

I suspect the issue is that `array::IntoIter` does not optimize as well as a slice iterator, because it has to move more data around and has a more complicated ABI. Also, `TrustedRandomAccess` cannot be soundly implemented for that iterator, and `Zip` relies on that trait for its optimized path.
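A minimal sketch of the slice-iterator variant, assuming the fix amounts to zipping with `&mask` (a `core::slice::Iter<'_, u8>`) instead of `mask` (an `array::IntoIter<u8, 64>`). The linked Godbolt code is not reproduced here, and `stuff_slice_iter` is a hypothetical name.

```rust
/// Same computation as the issue's `stuff`, but zipping with `&mask`
/// (a slice iterator) instead of `mask` (an array iterator by value).
/// Slice iterators implement the standard library's internal, unstable
/// `TrustedRandomAccess` trait, which `Zip` can specialize on.
pub fn stuff_slice_iter([a, b, c, d]: [u8; 4], slice: &mut [[u8; 64]]) {
    let key = [a, b, c, d];
    // Equivalent to the 64-element literal: element `j` is `key[j % 4]`.
    let mask: [u8; 64] = std::array::from_fn(|j| key[j % 4]);
    for array in slice {
        // `&mask` iterates by reference, so each `mask_elem` is a `&u8`.
        for (array_elem, mask_elem) in array.iter_mut().zip(&mask) {
            *array_elem ^= *mask_elem;
        }
    }
}
```

The essential change from the original snippet is `.zip(&mask)` plus the dereference `*mask_elem`; `.zip(mask.iter())` would be equivalent. Whether the result uses zmm registers still depends on AVX-512 being enabled for the compilation target.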

c410-f3r commented 1 week ago

Hahahahahahahahahahahahaha! Apologies for the outburst, but it is really funny to see such a simple fix after several unsuccessful approaches.

Anyway, thank you very much for the tip!!!