virtio-win / kvm-guest-drivers-windows

Windows paravirtualized drivers for QEMU\KVM
https://www.linux-kvm.org/page/WindowsGuestDrivers
BSD 3-Clause "New" or "Revised" License
1.91k stars 377 forks source link

viogpu_queue: rearrange virtio_gpu_vbuffer struct to reduce size by 8 bytes #1106

Open sharkautarch opened 3 weeks ago

sharkautarch commented 3 weeks ago

Signed-off-by: sharkautarch 128002472+sharkautarch@users.noreply.github.com

Note that this change only makes a difference for x86_64 and ARM64 targets; the size of the struct will remain the same for x86.

This struct isn't packed, so padding/alignment of each member can increase the size. Reduces size of the struct from 72 bytes to 64 bytes. tested in godbolt: https://godbolt.org/z/M7abMvYhh

Rearrangement is based off of the idea of ordering the members of a struct from biggest to smallest. (I learned about that trick from a comment in this video about padding/alignment: https://www.youtube.com/watch?v=E0QhZ6tNoRg)

EDIT: I just realized that u32 is actually unsigned long (defined in VirtIO/linux/types.h), but it is still only 32 bits (same size as unsigned int because apparently long is 32 bits on 64bit windows). which I guess is why it is called u32 updated godbolt: https://godbolt.org/z/ffeGh4nvT clang now prints out 72 bytes for both, but the assembly for msvc x64 shows that the size is still reduced to 64 bytes

kostyanf14 commented 2 weeks ago

rerun tests

sharkautarch commented 2 weeks ago

update: I made a slight adjustment to this PR, where I've now made it so that the sizes for each of the three buffer pointers are still close to each respective pointer, while still being arranged so that the size is reduced to 64 bytes: see this godbolt link measuring the offset distances: https://godbolt.org/z/KqG8bxEs9

I wanted the offset distances for each pointer&size pairs to be the same, but doesn't seem possible to do so without the size increasing back to 72 bytes... I've gotten the distances to be 4 bytes for one pair, and 8 bytes for the other pairs

YanVugenfirer commented 2 weeks ago

Sorry, what's the motivation for this PR?

sharkautarch commented 2 weeks ago

Sorry, what's the motivation for this PR?

reducing the size of the struct to 64 bytes should allow it to theoretically all fit within a single 64 byte cache line (technically the struct would need to be allocated on a 64 byte-aligned address to guarantee this)

iirc, this struct is frequently fetched in and out of memory, so this could have some impact on performance. Haven’t tested this tho

and yes, the struct holds pointers to other buffers that also have to be fetched from memory, but I’m pretty sure that the struct has to be fetched from memory before the buffers can then be fetched from memory

Let me know if I got anything wrong on this