Open c410-f3r opened 1 week ago
This one optimizes well: https://godbolt.org/z/W7dMqzEse
I suspect the issue is that `array::IntoIter` doesn't optimize as well as a slice iterator, because it has to move things around more and has a more complicated ABI.
Ah, and because `TrustedRandomAccess` can't be soundly implemented for that iterator, which is important for `Zip`.
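To make the distinction concrete, here is a minimal sketch of the two iterator shapes being compared. It is not the code behind the linked snippets; the function names `xor_by_ref` and `xor_by_value` are hypothetical and only illustrate zipping borrowed slice iterators versus zipping a by-value array (which goes through `array::IntoIter`).

```rust
/// XORs `mask` into `data` in place, zipping two borrowing slice iterators
/// (`slice::IterMut` and `slice::Iter`).
/// Hypothetical name, used only for illustration.
pub fn xor_by_ref(data: &mut [u8; 64], mask: &[u8; 64]) {
    for (d, m) in data.iter_mut().zip(mask.iter()) {
        *d ^= *m;
    }
}

/// Same operation, but `mask` is passed by value, so `zip` consumes it
/// through `array::IntoIter`, the iterator discussed above.
/// Hypothetical name, used only for illustration.
pub fn xor_by_value(data: &mut [u8; 64], mask: [u8; 64]) {
    for (d, m) in data.iter_mut().zip(mask) {
        *d ^= m;
    }
}
```

Both functions compute the same result. Whether they compile to the same machine code depends on the compiler version and target flags, so the generated assembly should be checked rather than assumed.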
Hahahahahahahahahahahahaha! My apologies for the mood but it is really funny to see such a simple fix after several different unsuccessful approaches.
Anyway, thank you very much for the tip!!!
This is probably an `LLVM` behavior that is affecting `rustc`.

The following snippet explicitly deals with arrays of 64 bytes and was extracted from a WebSocket procedure that unmasks frames received from an external party.
https://godbolt.org/z/bM8YPn4sz
Unfortunately the generated code is not making use of `zmm` registers. On the other hand, a similar C implementation compiled with gcc generates a much better output.
https://godbolt.org/z/M8va4x6n9
Here goes another Rust version that also uses manual loops -> https://godbolt.org/z/6fex3c4Eb
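
For readers without access to the links, the kind of operation under discussion can be sketched roughly as follows. This is an assumption-laden illustration, not the code from any of the snippets above: `expand_mask` and `unmask_block` are hypothetical names, and the only external fact relied on is that WebSocket masking XORs each payload byte with a repeating 4-byte key.

```rust
/// Repeats a 4-byte WebSocket masking key across a 64-byte block.
/// Hypothetical helper, not taken from the linked snippets.
pub fn expand_mask(key: [u8; 4]) -> [u8; 64] {
    let mut mask = [0u8; 64];
    for i in 0..64 {
        mask[i] = key[i % 4];
    }
    mask
}

/// Unmasks one 64-byte block in place with a manual index loop.
/// Both arrays have a fixed length of 64, so every index is in bounds.
/// Hypothetical helper, not taken from the linked snippets.
pub fn unmask_block(block: &mut [u8; 64], mask: &[u8; 64]) {
    for i in 0..64 {
        block[i] ^= mask[i];
    }
}
```

Because XOR is its own inverse, applying `unmask_block` twice with the same mask restores the original bytes. Whether the loop lowers to a single 64-byte vector operation depends on the compiler version and the enabled target features, which is not verified here.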