oVirt / vdsm

The Virtual Desktop Server Manager
GNU General Public License v2.0

Start SPM Task failed after Attaching a Storage Domain #320

Open: ahadas opened this issue 1 year ago

ahadas commented 1 year ago

After a Storage Domain is attached to a Data Center, the SPM fails to start.

Version-Release number of selected component (if applicable): 4.4.10

How reproducible: This is generally a Disaster Recovery scenario, but it is easier to reproduce in a single environment with 1 VDSM host.

Steps to Reproduce:

  1. Configure a Data Center & Storage Domain. I also had 2 non-running VMs and a few disks (some attached and some not), but IMHO it's irrelevant.
  2. Put the Storage Domain in maintenance mode.
  3. Detach the Storage Domain.
  4. Attach the Storage Domain (the one detached in the previous step) back to the same Data Center. Note: as mentioned, this is a DR scenario, and when the domain is attached to a different environment the issue is less likely to occur. Having 2 VDSM hosts in the same environment also makes the bug less likely, because after the SD is attached the SPM somehow moves from one host to the other, so the race does not happen. To reproduce this bug, it is therefore better to work in a single environment with 1 VDSM host.
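Steps 2 to 4 can be scripted so the sequence is replayed the same way each time. The following is a minimal sketch, assuming the oVirt Python SDK (`ovirtsdk4`): `dc_sds_service` is assumed to behave like the SDK's attached-storage-domains service for one data center (`storage_domain_service`, `add`, and per-domain `deactivate`, `remove`, `get`). The helpers `wait_for_status` and `detach_and_reattach` are hypothetical names, not part of vdsm or the SDK, and the method names should be checked against the SDK version in use.

```python
import time


def wait_for_status(get_status, wanted, timeout=300.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns `wanted`; raise TimeoutError otherwise."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == wanted:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"status is {status!r}, expected {wanted!r}")
        sleep(interval)


def detach_and_reattach(dc_sds_service, sd_id, make_sd,
                        maintenance, active, **wait_kwargs):
    """Hypothetical helper replaying steps 2-4 against one data center.

    make_sd builds the object passed to add(); with ovirtsdk4 this would
    presumably be `lambda i: types.StorageDomain(id=i)`, and maintenance /
    active would be types.StorageDomainStatus.MAINTENANCE / ACTIVE.
    """
    sd_service = dc_sds_service.storage_domain_service(sd_id)

    # Step 2: put the storage domain into maintenance mode.
    sd_service.deactivate()
    wait_for_status(lambda: sd_service.get().status, maintenance, **wait_kwargs)

    # Step 3: detach the storage domain from the data center.
    sd_service.remove()

    # Step 4: attach it back to the same data center and wait for it.
    # In this bug the domain may report active for a few seconds before
    # going down, so one ACTIVE reading does not prove the attach worked.
    dc_sds_service.add(make_sd(sd_id))
    sd_service = dc_sds_service.storage_domain_service(sd_id)
    return wait_for_status(lambda: sd_service.get().status, active, **wait_kwargs)
```

Against a live engine, `dc_sds_service` would presumably come from `connection.system_service().data_centers_service().data_center_service(dc_id).storage_domains_service()`. Passing `sleep` and `clock` in keeps the polling testable without a real engine.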

Actual results: the Data Center and the Storage Domain go down. They are shown as up and running for a few seconds, but then turn red and the Storage Domain becomes locked. The engine and VDSM logs show errors. A task on the VDSM host (under /rhev/data-center///master/tasks/) is not removed:

    [root@vdsm1 ~]# cd /rhev/data-center/4dc0a377-4dd3-494c-ad18-7aa2008c43b1/ecac38cc-bd4b-47b1-be42-702adc810dd3/master/tasks/
    [root@vdsm1 tasks]# ll
    total 4
    drwxr-xr-x. 2 vdsm kvm 4096 Mar 21 11:38 b813ea20-1886-439b-8f85-bfb41256ba3b
    [root@vdsm1 tasks]# sudo tar -czvf b813ea20-1886-439b-8f85-bfb41256ba3b.tar.gz b813ea20-1886-439b-8f85-bfb41256ba3b
    b813ea20-1886-439b-8f85-bfb41256ba3b/
    b813ea20-1886-439b-8f85-bfb41256ba3b/b813ea20-1886-439b-8f85-bfb41256ba3b.recover.0
    b813ea20-1886-439b-8f85-bfb41256ba3b/b813ea20-1886-439b-8f85-bfb41256ba3b.job.0
    b813ea20-1886-439b-8f85-bfb41256ba3b/b813ea20-1886-439b-8f85-bfb41256ba3b.task
    b813ea20-1886-439b-8f85-bfb41256ba3b/b813ea20-1886-439b-8f85-bfb41256ba3b.result
    b813ea20-1886-439b-8f85-bfb41256ba3b/b813ea20-1886-439b-8f85-bfb41256ba3b.recover.1
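When checking whether a leftover task like this is present after each reproduction attempt, a small read-only helper can summarize the tasks directory. This is a hypothetical triage sketch, not part of vdsm: it assumes only the layout seen in the listing above (one sub-directory per task UUID, holding files named `<task_id>.<kind>`), and the function names are illustrative.

```python
from pathlib import Path


def list_leftover_tasks(tasks_dir):
    """Return {task_id: sorted file names} for each task sub-directory.

    Hypothetical helper: it only reads the directory tree and never
    modifies or removes task files.
    """
    tasks = {}
    root = Path(tasks_dir)
    if not root.is_dir():
        return tasks
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            tasks[entry.name] = sorted(p.name for p in entry.iterdir() if p.is_file())
    return tasks


def task_file_kinds(task_id, file_names):
    """Strip the '<task_id>.' prefix, e.g. '<task_id>.recover.0' -> 'recover.0'."""
    prefix = task_id + "."
    return sorted(n[len(prefix):] for n in file_names if n.startswith(prefix))
```

Run against the `master/tasks/` directory from the listing, this would report one task, `b813ea20-1886-439b-8f85-bfb41256ba3b`, with the kinds `job.0`, `recover.0`, `recover.1`, `result` and `task`. The helper deliberately does not delete anything, since the leftover files are evidence for the bug.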

Expected results: Data Center & Storage Domain should be green, up-and-running.

Additional info: Logs are attached to the BZ.

Original bz: https://bugzilla.redhat.com/2067173