xapi-project / xen-api

The Xapi Project's XenAPI Server
http://xenproject.org/developers/teams/xapi.html

Guest with 2 distinct VBD sharing the same `userdevice` - following race condition around `VM.revert`? #5849

Open ydirson opened 1 month ago

ydirson commented 1 month ago

For some reason one of my guests now has 2 CD VBDs with the same userdevice. I understand that this is not supposed to happen, with xe normally getting a DEVICE_ALREADY_EXISTS error from XAPI:

# xe vbd-create vm-uuid=8c222e47-1de3-86b5-d0fa-64c0964026fa device=3 type=CD mode=RO
A device with the name given already exists on the selected VM
device: 3

Despite this I now have VBDs on a guest violating this constraint:

# xe vbd-list vm-uuid=8c222e47-1de3-86b5-d0fa-64c0964026fa userdevice=3
uuid ( RO)             : 70f2e451-e4d9-6ba5-f121-3fab8595835b
          vm-uuid ( RO): 8c222e47-1de3-86b5-d0fa-64c0964026fa
    vm-name-label ( RO): YDI - XCPng 8.3
         vdi-uuid ( RO): 386fd9e9-3778-47d8-ba3c-2abdb5755830
            empty ( RO): false
           device ( RO): 

uuid ( RO)             : 6af8c1c2-44d4-7ece-508e-c6ef78811c74
          vm-uuid ( RO): 8c222e47-1de3-86b5-d0fa-64c0964026fa
    vm-name-label ( RO): YDI - XCPng 8.3
         vdi-uuid ( RO): <not in database>
            empty ( RO): true
           device ( RO): xvdd
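
For checking whether other guests are affected, here is a minimal Python sketch that groups VBD records by VM and `userdevice` and reports any group with more than one VBD. The record layout (plain dicts keyed `uuid`, `vm-uuid`, `userdevice`) and the function name are assumptions for illustration; the sample values are the two VBDs listed above.

```python
from collections import defaultdict


def find_duplicate_userdevices(vbds):
    """Return {(vm_uuid, userdevice): [vbd_uuid, ...]} for every pair shared by 2+ VBDs.

    `vbds` is an iterable of dicts with the (assumed) keys
    "uuid", "vm-uuid" and "userdevice".
    """
    groups = defaultdict(list)
    for vbd in vbds:
        groups[(vbd["vm-uuid"], vbd["userdevice"])].append(vbd["uuid"])
    # Keep only the groups that violate the one-VBD-per-userdevice expectation.
    return {key: uuids for key, uuids in groups.items() if len(uuids) > 1}


VM_UUID = "8c222e47-1de3-86b5-d0fa-64c0964026fa"

# The two records from the vbd-list output above, reduced to the relevant fields.
vbds = [
    {"uuid": "70f2e451-e4d9-6ba5-f121-3fab8595835b", "vm-uuid": VM_UUID, "userdevice": "3"},
    {"uuid": "6af8c1c2-44d4-7ece-508e-c6ef78811c74", "vm-uuid": VM_UUID, "userdevice": "3"},
]

duplicates = find_duplicate_userdevices(vbds)
for (vm_uuid, userdevice), uuids in duplicates.items():
    print(f"VM {vm_uuid}: userdevice {userdevice} is shared by {', '.join(uuids)}")
```

The function only inspects data it is given, so it can be fed from whatever source is at hand (parsed `xe` output, an API client, a database dump).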

xensource.log shows for this VBD creation:

Jul 15 19:02:05 R620-1 xapi: [debug||4729200 HTTPS X.X.X.X->:::80|VM.get_allowed_VBD_devices D:e2ec9afbc29d|audit] VM.get_allowed_VBD_devices: VM = '8c222e47-1de3-86b5-d0fa-64c0964026fa (YDI - XCPng 8.3)'
Jul 15 19:02:05 R620-1 xapi: [debug||4729200 HTTPS X.X.X.X->:::80|VBD.create R:2729074df84e|audit] VBD.create: VM = '8c222e47-1de3-86b5-d0fa-64c0964026fa (YDI - XCPng 8.3)'; VDI = '386fd9e9-3778-47d8-ba3c-2abdb5755830'
Jul 15 19:02:05 R620-1 xapi: [debug||4729200 HTTPS X.X.X.X->:::80|VBD.create R:2729074df84e|vbdops] Checking whether there's a migrate in progress...
Jul 15 19:02:05 R620-1 xapi: [debug||4729200 HTTPS X.X.X.X->:::80|VBD.create R:2729074df84e|vbdops] VBD.create (device = 3; uuid = 70f2e451-e4d9-6ba5-f121-3fab8595835b; ref = OpaqueRef:43accab4-4527-4277-a667-736f4e5a0511)

This matches the XO code that, when asked to insert a CD and after determining that the guest does not have a CD VBD yet, queries XAPI for allowed VBD devices and creates one. This seems to imply that on this months-old VM, on which I have used that CD VBD tens of times, XAPI this one time hallucinated the lack of a CD VBD for long enough to let the XO XAPI client create a new, conflicting one and insert a VDI into it.

The symptom for the user since then is that this 2nd CD drive, which had a VDI inserted at creation time, causes that VDI to be automatically inserted into the 1st CD drive every time the VM starts. I guess that falls into "undefined behavior", because we're in a state that's not supposed to be possible?

The log (and existing snapshot timestamps) show that this incident closely follows a VM.revert:

Jul 15 19:01:40 R620-1 xapi: [debug||4728907 HTTPS 172.16.210.100->|Async.VM.revert R:2384787d6e36|xapi_vm_snapshot] Cloning the snapshotted disks
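
To make the suspected interleaving concrete, here is a toy Python model of a check-then-create race against a non-transactional store. It is only a sketch under a loud assumption: that the revert removes the VM's VBDs and restores the snapshot's VBDs as two separate steps. Nothing above confirms that this is what XAPI does internally, and `ToyDb`, `revert_steps` and the VBD names are hypothetical.

```python
class ToyDb:
    """In-memory stand-in for a database without transactions (hypothetical)."""

    def __init__(self):
        self.vbds = []  # list of (name, userdevice) rows for a single VM

    def allowed_devices(self):
        # Devices not used by any row present right now.
        used = {device for _, device in self.vbds}
        return [str(n) for n in range(4) if str(n) not in used]

    def create_vbd(self, name, userdevice):
        # Uniqueness is only checked against the rows visible at call time.
        if any(device == userdevice for _, device in self.vbds):
            raise ValueError("DEVICE_ALREADY_EXISTS: " + userdevice)
        self.vbds.append((name, userdevice))


def revert_steps(db, snapshot_vbds):
    """ASSUMED revert: drop the current VBDs, then restore the snapshot's.

    Written as a generator so the caller can run other code between steps.
    """
    db.vbds.clear()
    yield "vbds-removed"  # window in which the VM appears to have no CD drive
    db.vbds.extend(snapshot_vbds)  # restored rows skip the create-time check
    yield "vbds-restored"


db = ToyDb()
db.create_vbd("old-cd", "3")

revert = revert_steps(db, [("restored-cd", "3")])
next(revert)  # revert is now halfway through

# Client: sees no CD drive, asks for allowed devices, creates a VBD on device 3.
if "3" in db.allowed_devices():
    db.create_vbd("new-cd", "3")

next(revert)  # revert finishes and restores the snapshot's CD VBD

print(db.vbds)  # two rows that both claim userdevice 3
```

Run sequentially (revert to completion, then the client), the same calls would hit the `DEVICE_ALREADY_EXISTS` check instead, which is why the duplicate only appears when the client call lands inside the window.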

A few additional tests through XO show that, when using "VM.reset with the 'snapshot before' option activated", there is enough time to request a CD insertion before the notification of VM.revert ending comes in, and this time the extra VBD gets a distinct userdevice:

[18:53 R620-1 ~]# xe vm-cd-list uuid=8c222e47-1de3-86b5-d0fa-64c0964026fa
CD 0 VBD:
uuid ( RO)             : d214eb3b-51cd-0fbe-7fcc-3f2dedcad6b5
    vm-name-label ( RO): YDI XCPng 8.3
            empty ( RO): false
       userdevice ( RW): 4

CD 0 VDI:
uuid ( RO)             : 386fd9e9-3778-47d8-ba3c-2abdb5755830
       name-label ( RW): xcp-ng-8.3.0-rc1+ydi7.iso
    sr-name-label ( RO): ISOs
     virtual-size ( RO): 649068544

CD 1 VBD:
uuid ( RO)             : cedaa7ab-4cc3-5153-fd1e-0913c16238e2
    vm-name-label ( RO): YDI XCPng 8.3
            empty ( RO): true
       userdevice ( RW): 3

I assume once the race condition is triggered, YMMV.

This is on XAPI 1.249.32 on XCP-ng 8.2.1.

edwintorok commented 1 month ago

Unfortunately XAPI doesn't have a transactional database or support for transactions, so these race conditions are always possible. In this particular case, 'allowed operations'-style locking could be used to forbid further changes to the VM while the revert is running.
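
As a rough illustration of that idea (not XAPI's actual implementation), the following Python sketch models a VM that records its in-flight operations and refuses any new operation while one is running. The class, the operation names and the exception are hypothetical; a real fix would have to plug into XAPI's own operation tracking.

```python
import threading


class OperationInProgress(Exception):
    """Raised when an operation is requested while it is not allowed."""


class ToyVM:
    ALL_OPERATIONS = {"revert", "vbd_create"}

    def __init__(self):
        self.current_operations = set()
        self.vbds = []
        self._lock = threading.Lock()  # makes check-and-register atomic

    def allowed_operations(self):
        # Toy policy: operations are mutually exclusive, so nothing is
        # allowed while any operation is in flight.
        return set() if self.current_operations else set(self.ALL_OPERATIONS)

    def begin(self, op):
        with self._lock:
            if op not in self.allowed_operations():
                raise OperationInProgress(op)
            self.current_operations.add(op)

    def end(self, op):
        with self._lock:
            self.current_operations.discard(op)

    def create_vbd(self, name, userdevice):
        self.begin("vbd_create")  # raises if a revert is running
        try:
            self.vbds.append((name, userdevice))
        finally:
            self.end("vbd_create")


vm = ToyVM()
vm.begin("revert")  # revert starts and registers itself
try:
    vm.create_vbd("new-cd", "3")
    rejected = False
except OperationInProgress:
    rejected = True  # the client has to retry once the revert is done
vm.end("revert")

vm.create_vbd("new-cd", "3")  # succeeds now that nothing is in flight
print(rejected, vm.vbds)
```

The trade-off is that clients see an error during the revert and must retry, rather than silently creating a conflicting VBD.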