nutanix / libvfio-user

framework for emulating devices in userspace
BSD 3-Clause "New" or "Revised" License

need to handle 2-byte write at offset PCI_MSIX_FLAGS + 1 #640

Closed: tmakatos closed this 2 years ago

tmakatos commented 2 years ago

When live migrating back to a host, QEMU does this:

/* load enable bit and maskall bit */
vfio_pci_write_config(pdev, pdev->msix_cap + PCI_MSIX_FLAGS + 1, offset, 2);

But we don't handle it:

[2022-01-26 13:38:50.125926] vfio_user.c:2071:vfio_user_log: *DEBUG*: /var/tmp/vfio-user//5d342f82-c067-4bd4-94d7-0eb9b2238d9a/nvme/5d342f82-c067-4bd4-94d7-0eb9b2238d9a: added PCI cap "MSI-X" size=0xc offset=0x84
...
/var/tmp/vfio-user//5d342f82-c067-4bd4-94d7-0eb9b2238d9a/nvme/5d342f82-c067-4bd4-94d7-0eb9b2238d9a: W 7 0x87-0x89
[2022-01-26 13:38:52.620816] vfio_user.c:2079:vfio_user_log: *ERROR*: /var/tmp/vfio-user//5d342f82-c067-4bd4-94d7-0eb9b2238d9a/nvme/5d342f82-c067-4bd4-94d7-0eb9b2238d9a: invalid MSI-X write offset 135
[2022-01-26 13:38:52.620936] vfio_user.c:2079:vfio_user_log: *ERROR*: /var/tmp/vfio-user//5d342f82-c067-4bd4-94d7-0eb9b2238d9a/nvme/5d342f82-c067-4bd4-94d7-0eb9b2238d9a: failed to write 0x87-0x88: Invalid argument
[2022-01-26 13:38:52.620974] vfio_user.c:2079:vfio_user_log: *ERROR*: /var/tmp/vfio-user//5d342f82-c067-4bd4-94d7-0eb9b2238d9a/nvme/5d342f82-c067-4bd4-94d7-0eb9b2238d9a: msg0x4c: cmd 10 failed: Invalid argument

IIUC this writes to the 2nd half (upper byte) of struct mxc and to the 1st byte of struct mtab (the MSI-X Table Offset/BIR register), which we didn't previously handle at all, as QEMU was handling MSI-X.
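To make the offsets in the log concrete, here is a minimal sketch that maps a byte of the capability to the register that holds it. It assumes the standard 12-byte MSI-X capability layout from the PCI spec; the enum names and the helper `msix_cap_reg_start` are hypothetical and are not part of libvfio-user.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard MSI-X capability layout (offsets relative to the capability start).
 * The names below are illustrative only, not libvfio-user identifiers. */
enum {
    MSIX_CAP_HDR_OFF   = 0,  /* capability ID + next pointer, 2 bytes */
    MSIX_CAP_FLAGS_OFF = 2,  /* Message Control, 2 bytes (PCI_MSIX_FLAGS) */
    MSIX_CAP_TABLE_OFF = 4,  /* Table Offset / Table BIR, 4 bytes */
    MSIX_CAP_PBA_OFF   = 8,  /* PBA Offset / PBA BIR, 4 bytes */
    MSIX_CAP_SIZE      = 12  /* matches size=0xc in the log above */
};

/* Hypothetical helper: return the start offset of the register that
 * contains byte `rel` of the capability. */
static size_t msix_cap_reg_start(size_t rel)
{
    assert(rel < MSIX_CAP_SIZE);
    if (rel < MSIX_CAP_FLAGS_OFF) {
        return MSIX_CAP_HDR_OFF;
    }
    if (rel < MSIX_CAP_TABLE_OFF) {
        return MSIX_CAP_FLAGS_OFF;
    }
    if (rel < MSIX_CAP_PBA_OFF) {
        return MSIX_CAP_TABLE_OFF;
    }
    return MSIX_CAP_PBA_OFF;
}
```

With the capability at 0x84, byte 0x87 is relative offset 3, which is byte 1 (the upper byte) of Message Control, and byte 0x88 is relative offset 4, which is byte 0 of the Table Offset/BIR register.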

tmakatos commented 2 years ago

This is done in function vfio_pci_load_config. Upstream QEMU doesn't seem to do this; the two implementations are quite different in that area.

According to section 6.8.2 ("MSI-X Capability and Table Structures") of the "PCI Local Bus Specification Revision 3.0", writing 2 bytes at offset pdev->msix_cap + PCI_MSIX_FLAGS + 1 means that we write the upper byte of "Message Control", plus "Table BIR" and the lower 5 bits (0-4) of "Table Offset".

The 1st byte writes to the upper 8 bits of "Message Control": first the 3 uppermost bits of "Table Size", which is RO so we can ignore them; next the 3 Reserved bits, which again we ignore; and then the "Function Mask" and "MSI-X Enable" bits, which we must set.

The 2nd byte writes to the Table BIR and the lower 5 bits of the Table Offset; both are RO so we can ignore them.
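The byte-by-byte breakdown above can be sketched as a small helper that applies such a write to a shadow copy of "Message Control". This is only an illustration under stated assumptions: the macro names and `msix_flags_apply_upper_write` are hypothetical, and this is not the code from the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions within the 16-bit Message Control register.
 * Macro names are illustrative, not libvfio-user identifiers. */
#define MSIX_MC_FUNCTION_MASK 0x4000u /* bit 14 */
#define MSIX_MC_ENABLE        0x8000u /* bit 15 */

/* Hypothetical helper: apply a 1- or 2-byte write that starts at
 * PCI_MSIX_FLAGS + 1 and return the new Message Control value.
 * buf[0] targets the upper byte of Message Control; buf[1], if present,
 * targets Table BIR and the low bits of Table Offset, which are read-only,
 * so it is dropped. */
static uint16_t msix_flags_apply_upper_write(uint16_t msg_ctrl,
                                             const uint8_t *buf, size_t count)
{
    const uint16_t writable = MSIX_MC_FUNCTION_MASK | MSIX_MC_ENABLE;
    uint16_t written;

    assert(buf != NULL);
    assert(count == 1 || count == 2);

    /* Place the written byte in bits 8-15 of the register. */
    written = (uint16_t)((unsigned int)buf[0] << 8);

    /* Keep the read-only bits (Table Size, Reserved) and take only
     * Function Mask and MSI-X Enable from the written data. */
    return (uint16_t)((msg_ctrl & (uint16_t)~writable) | (written & writable));
}
```

For example, with a Table Size field of 0x007, writing the bytes 0xC7 0xFF would set both Function Mask and MSI-X Enable and leave everything else unchanged.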

What's curious is why QEMU does a 2-byte write (where the 2nd byte touches RO registers so it doesn't have any effect) when a 1-byte write would suffice. I quickly looked at the code and my impression is that there might be a reason (e.g. minimum alignment) for writing at least a word (2 bytes) over the PCI bus.

jlevon commented 2 years ago

fixed?

tmakatos commented 2 years ago

Yes, by 2d1d87016133b6c2f38e4f6a5fca6be5b820653c.