Open davidzeng0 opened 8 months ago
The behavior here is specific to array lengths that are not a power of 2. I think this is a consequence of https://github.com/rust-lang/rust/pull/115236, the code looks quite well-optimized if the array length is 4 or 8.
I'd say that's still a problem, because any moves
to stack shouldn't happen in the first place
I have the following structs
pub struct Packet {
buffer: BufferRef, /* 3 usizes */
data: *mut u8,
size: usize,
pub time_base: Rational, /* u32, u32 */
pub duration: u64,
pub timestamp: i64,
pub track_index: u32,
pub flags: BitFlags<PacketFlag>,
pub zero_padding: usize
}
pub struct Element {
pub id: ElementId, /* u64 */
pub size: u64,
pub offset: u64,
pub end: u64
}
and multiple non-inline and inline calls to read(&mut self) -> Result<Option<Packet / Element>>
I mainly use maybe_with_data
's match statement rather than as_mut().map()
(which doesn't work either) to reduce the 1 additional indentation in my code
Even though the function call is inlined, I see multiple mov reg, [rsp + 0x300]
and mov [rsp + 0x110], reg
instructions copying the data between different places on the stack, at my match
statement succeeding
Especially when my code does around 100 nanos per packet, those extra vectorized and nonvectorized moves add significant overhead that I shouldn't get
I'm only covering Option and Result in this issue, but it should also apply to enums and any function that invites rustc
to move a value, even when it's unnecessary. It seems to be related to https://github.com/rust-lang/rust/issues/99504
Here is clang, with similar code to an Option in C++ https://godbolt.org/z/aPv9nPbn6. There is no mention of rsp
at all, until you uncomment the inline never attribute in which case all optimizations go out the window. Gcc produces similar code to clang
Yes it's still a problem, but I would say it is a problem with arrays, not with map
.
https://godbolt.org/z/GjrE4GjhK
for all 3 maybe_with_data functions, the move of certain values on the stack is unnecessary.
This seems like a fundamental issue and probably won't be fixed soon, but it should make all existing code run slightly faster