nutanix / libvfio-user

framework for emulating devices in userspace

Avoid bit fields in protocol message definitions #768

Open mnissler-rivos opened 11 months ago

mnissler-rivos commented 11 months ago

The vfio_user_header.flags field currently uses bit fields to specify various sub-fields: https://github.com/nutanix/libvfio-user/blob/149aa845b11bd13fca41dcf65b51283f83ac5520/include/vfio-user.h#L80-L93
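
For reference, the definition looks roughly like this (quoted from memory; see the permalink above for the exact code):

```c
struct vfio_user_header {
    uint16_t    msg_id;
    uint16_t    cmd;
    uint32_t    msg_size;
    struct {
        uint32_t    type     : 4;
#define VFIO_USER_F_TYPE_COMMAND    0
#define VFIO_USER_F_TYPE_REPLY      1
        uint32_t    no_reply : 1;
        uint32_t    error    : 1;
        uint32_t    resvd    : 26;
    } flags;
    uint32_t    error_no;
} __attribute__((packed));
```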

Bit field layout is implementation-defined, and thus may differ depending on compiler, ABI, machine byte order, etc.

I observed this problem when testing on a big-endian host. All vfio-user protocol fields are then in big-endian byte order (as expected per #615, IIUC), but the compiler did not lay out the flags sub-fields in reverse order, so they ended up in the wrong bit positions.

jlevon commented 11 months ago

It's host-endian, not big-endian: that is, we don't do, or attempt to do, any handling of different endianness at all. We certainly wouldn't be against supporting this, but it's not something we're planning to work on ourselves.

mnissler-rivos commented 11 months ago

To clarify my environment: I ran a qemu vfio-user client against a qemu vfio-user server on a big-endian host. So, both client and server are in big-endian byte order and hence expected to interoperate.

However, I found that server and client disagree on the bit field layout in the header flags field. The qemu client treats flags as a single uint32_t value in host byte order, so the flags field of a reply message is 0x00 0x00 0x00 0x01 in memory (which I consider correct per the spec, given that the flags field is supposed to be in host byte order, which here is big-endian). libvfio-user's bit field, however, places the type bits at the other end of the uint32_t, i.e. the flags field of a reply message is encoded as 0x10 0x00 0x00 0x00. Unsurprisingly, I didn't even get past version negotiation ;-)
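
A minimal standalone reproduction of the disagreement (the struct here is a hypothetical reduction of the header's flags field, not the actual libvfio-user definition):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reduction of the header's flags field, for illustration. */
struct flags_bf {
    uint32_t type     : 4;
    uint32_t no_reply : 1;
    uint32_t error    : 1;
    uint32_t resvd    : 26;
};

int main(void)
{
    struct flags_bf bf = { .type = 1 };  /* type = reply */
    uint32_t plain = 1;                  /* same logical value, no bit field */
    unsigned char a[sizeof(bf)], b[sizeof(plain)];

    memcpy(a, &bf, sizeof(a));
    memcpy(b, &plain, sizeof(b));

    printf("bit field: %02x %02x %02x %02x\n", a[0], a[1], a[2], a[3]);
    printf("plain u32: %02x %02x %02x %02x\n", b[0], b[1], b[2], b[3]);

    /*
     * On a little-endian host both lines print "01 00 00 00".
     * On a big-endian host the bit field prints "10 00 00 00" while the
     * plain uint32_t prints "00 00 00 01" -- the mismatch described above.
     */
    return 0;
}
```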

Then I researched what the net has to say about bit fields and byte order, and found that the standards leave this implementation-defined. In practice, compilers are forced to follow whatever widespread conventions exist (e.g. Linux relies on https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/ip.h?h=03702d4d29be4e2510ec80b248dbbde4e57030d9#n88). At the end of the day, it's IMO prudent to accept that bit field encoding is a minefield and instead rely on shifting and masking to pack/unpack the flags field.
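
For context, the kernel header linked above copes by reversing the declaration order depending on bit field endianness (quoted roughly from include/uapi/linux/ip.h):

```c
struct iphdr {
#if defined(__LITTLE_ENDIAN_BITFIELD)
	__u8	ihl:4,
		version:4;
#elif defined (__BIG_ENDIAN_BITFIELD)
	__u8	version:4,
		ihl:4;
#else
#error	"Please fix <asm/byteorder.h>"
#endif
	/* ... */
};
```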

jlevon commented 11 months ago

I see: because the client doesn't use bit fields, we get bitten by the difference on big-endian systems. Makes sense.

tmakatos commented 11 months ago

@mnissler-rivos are you planning to submit a patch?

mnissler-rivos commented 11 months ago

It's not a priority for me, given that none of my stuff runs on big-endian host systems (I found this while checking qemu behavior on a big-endian host in the context of a code review there). If you're all OK with replacing the bit field with #defines, I can put together a quick MR and test it later this week.
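
Roughly what I have in mind (a sketch only; the mask and accessor names are illustrative, not the final API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: names other than VFIO_USER_F_TYPE_* are made up here. */
#define VFIO_USER_F_TYPE_MASK     0xfU
#define VFIO_USER_F_TYPE_COMMAND  0x0U
#define VFIO_USER_F_TYPE_REPLY    0x1U
#define VFIO_USER_F_NO_REPLY      (0x1U << 4)
#define VFIO_USER_F_ERROR         (0x1U << 5)

/* flags becomes a plain uint32_t; bit positions are now fixed by the
 * masks above instead of depending on the compiler's bit field layout. */
static inline uint32_t
vfio_user_flags_type(uint32_t flags)
{
    return flags & VFIO_USER_F_TYPE_MASK;
}

static inline bool
vfio_user_flags_is_reply(uint32_t flags)
{
    return vfio_user_flags_type(flags) == VFIO_USER_F_TYPE_REPLY;
}
```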

tmakatos commented 11 months ago

I'm not sure there's any way around this other than using #defines, so I'd take such a PR. @jlevon?

jlevon commented 11 months ago

Yup, for sure, thanks.