rposudnevskiy / RBDSR

RBDSR - XenServer/XCP-ng Storage Manager plugin for CEPH
GNU Lesser General Public License v2.1

Stuck task: VDI.set_name_description #45

Closed. mdmeier closed this issue 7 years ago.

mdmeier commented 7 years ago

Hi there,

XenServer 7.1, Ceph Kraken, and CloudStack 4.9.2.0

Our pool master is getting stuck with a task that's preventing other disk events from happening:

uuid ( RO)             : 5f192420-7643-efe4-912b-a0c4edd34e48
name-label ( RO)       : VDI.set_name_description
name-description ( RO) :
status ( RO)           : pending
progress ( RO)         : 0.000

It seems to be something inside CloudStack's management that's firing this. After a period of activity I get one of these for every VDI in the xensource log:

May 27 14:26:20 cloud102-1 xapi: [debug|cloud102-1|1783 UNIX /var/lib/xcp/xapi|VDI.set_name_description R:0c1eadc11704|audit] VDI.set_name_description: VDI = 'a33d38cc-8a17-4998-b49d-a5ce4f750a31' name-description = ''
May 27 14:26:21 cloud102-1 xapi: [debug|cloud102-1|1985 UNIX /var/lib/xcp/xapi|VDI.set_name_description R:ec836d040bc1|audit] VDI.set_name_description: VDI = '38d3849d-dd17-4b95-b0bf-f4cf9fffacbd' name-description = ''

(there are about 70 of these, and they fire around 1 second apart) ...etc...

There's no indication of error, and the task doesn't go away even after 24 hours.

The problem is that VDI tasks like migration wait for this task to complete, so a backlog of activity eventually drives load up on the pool master.

I haven't found any specific VDI that's problematic.

Help would be appreciated. The issue started this week; we'd been running the RBD storage plugin successfully for the week before.

I'm going to try recreating the issue by running manual vdi-param-set commands on each uuid, but that's a fairly time-intensive process.
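
For illustration, a loop of this sort would exercise the same code path (hypothetical, with a placeholder SR uuid; not from the original thread):

    # Re-set the description on every VDI in the SR, roughly what the
    # CloudStack-triggered activity appears to be doing.
    for uuid in $(xe vdi-list sr-uuid=<sr-uuid> --minimal | tr ',' ' '); do
        xe vdi-param-set uuid="$uuid" name-description=""
    done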

Thanks!

rposudnevskiy commented 7 years ago

Hi, if you see these messages for all VDIs at roughly one-second intervals, it looks like an sr-scan operation. During sr-scan, RBDSR tries to update each VDI's label and description with information stored in the RBD image metadata. If you have many VDIs this can take some time, but it definitely shouldn't continue for 24 hours. Maybe CloudStack executes sr-scan repeatedly; I don't know, I haven't tested it with CloudStack.
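
If scans really are being queued faster than they finish, that should be visible from the CLI. For example (standard xe commands, not part of the original exchange; <sr-uuid> is a placeholder):

    xe task-list name-label=SR.scan   # list queued/running scan tasks
    xe sr-scan uuid=<sr-uuid>         # trigger one scan by hand to time it in isolation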

mdmeier commented 7 years ago

It appears you're correct. There are a number of SR.scan operations in the task list, creating quite a load on the server.

Is there a way to optimize the scan operation in RBDSR? CloudStack didn't have this problem with our previous FreeNAS/iSCSI setup, and there are a lot more VDIs set to move to Ceph.

From xe task-list:

uuid ( RO)               : a7cae627-7677-b457-7b9b-8b732fd2033b
name-label ( RO)         : SR.scan
name-description ( RO)   :
subtask_of ( RO)         :
subtasks ( RO)           :
resident-on ( RO)        : 6c5cc464-89b7-4cf6-acfd-673fb259a711
status ( RO)             : pending
progress ( RO)           : 0.000
type ( RO)               :
result ( RO)             :
created ( RO)            : 20170529T02:35:42Z
finished ( RO)           : 19700101T00:00:00Z
error_info ( RO)         :
allowed_operations ( RO) : Cancel
other-config (MRW)       :

uuid ( RO)               : d2b192e5-3f04-957e-98f3-844fb43fa6cf
name-label ( RO)         : SR.scan
name-description ( RO)   :
subtask_of ( RO)         :
subtasks ( RO)           :
resident-on ( RO)        : 6c5cc464-89b7-4cf6-acfd-673fb259a711
status ( RO)             : pending
progress ( RO)           : 0.000
type ( RO)               :
result ( RO)             :
created ( RO)            : 20170529T02:02:29Z
finished ( RO)           : 19700101T00:00:00Z
error_info ( RO)         :
allowed_operations ( RO) : Cancel
other-config (MRW)       :

uuid ( RO)               : 93ce3f2f-8123-b330-06e9-e365d1f1e190
name-label ( RO)         : SR.scan
name-description ( RO)   :
subtask_of ( RO)         :
subtasks ( RO)           :
resident-on ( RO)        : 6c5cc464-89b7-4cf6-acfd-673fb259a711
status ( RO)             : pending
progress ( RO)           : 0.000
type ( RO)               :
result ( RO)             :
created ( RO)            : 20170529T02:57:51Z
finished ( RO)           : 19700101T00:00:00Z
error_info ( RO)         :
allowed_operations ( RO) : Cancel
other-config (MRW)       :

uuid ( RO)               : ba43dd55-62ca-71ed-b9a2-0651cf226f9d
name-label ( RO)         : SR.scan
name-description ( RO)   :
subtask_of ( RO)         :
subtasks ( RO)           :
resident-on ( RO)        : 6c5cc464-89b7-4cf6-acfd-673fb259a711
status ( RO)             : pending
progress ( RO)           : 0.000
type ( RO)               :
result ( RO)             :
created ( RO)            : 20170529T03:19:59Z
finished ( RO)           : 19700101T00:00:00Z
error_info ( RO)         :
allowed_operations ( RO) : Cancel
other-config (MRW)       :

uuid ( RO)               : ebd9768f-e465-699c-f7f3-e788aac71205
name-label ( RO)         : SR.scan
name-description ( RO)   :
subtask_of ( RO)         :
subtasks ( RO)           :
resident-on ( RO)        : 6c5cc464-89b7-4cf6-acfd-673fb259a711
status ( RO)             : pending
progress ( RO)           : 0.000
type ( RO)               :
result ( RO)             :
created ( RO)            : 20170529T01:40:22Z
finished ( RO)           : 19700101T00:00:00Z
error_info ( RO)         :
allowed_operations ( RO) : Cancel
other-config (MRW)       :

uuid ( RO)               : 748ea7c6-3a25-ead8-8837-643df7b716c5
name-label ( RO)         : SR.scan
name-description ( RO)   :
subtask_of ( RO)         :
subtasks ( RO)           :
resident-on ( RO)        : 6c5cc464-89b7-4cf6-acfd-673fb259a711
status ( RO)             : pending
progress ( RO)           : 0.000
type ( RO)               :
result ( RO)             :
created ( RO)            : 20170529T02:13:33Z
finished ( RO)           : 19700101T00:00:00Z
error_info ( RO)         :
allowed_operations ( RO) : Cancel
other-config (MRW)       :

uuid ( RO)               : 745ad9fd-3494-03f3-0f6d-cdea682842d7
name-label ( RO)         : SR.scan
name-description ( RO)   :
subtask_of ( RO)         :
subtasks ( RO)           :
resident-on ( RO)        : 6c5cc464-89b7-4cf6-acfd-673fb259a711
status ( RO)             : pending
progress ( RO)           : 0.000
type ( RO)               :
result ( RO)             :
created ( RO)            : 20170529T02:24:37Z
finished ( RO)           : 19700101T00:00:00Z
error_info ( RO)         :
allowed_operations ( RO) : Cancel
other-config (MRW)       :

uuid ( RO)               : 376e724c-8310-5555-0b83-6cab30d16744
name-label ( RO)         : SR.scan
name-description ( RO)   :
subtask_of ( RO)         :
subtasks ( RO)           :
resident-on ( RO)        : 6c5cc464-89b7-4cf6-acfd-673fb259a711
status ( RO)             : pending
progress ( RO)           : 0.000
type ( RO)               :
result ( RO)             :
created ( RO)            : 20170529T01:51:25Z
finished ( RO)           : 19700101T00:00:00Z
error_info ( RO)         :
allowed_operations ( RO) : Cancel
other-config (MRW)       :

uuid ( RO)               : c9b40947-271d-00c6-8ad0-f8ef2e7620e9
name-label ( RO)         : SR.scan
name-description ( RO)   :
subtask_of ( RO)         :
subtasks ( RO)           :
resident-on ( RO)        : 6c5cc464-89b7-4cf6-acfd-673fb259a711
status ( RO)             : pending
progress ( RO)           : 0.000
type ( RO)               :
result ( RO)             :
created ( RO)            : 20170529T03:31:03Z
finished ( RO)           : 19700101T00:00:00Z
error_info ( RO)         :
allowed_operations ( RO) : Cancel
other-config (MRW)       :

uuid ( RO)               : 4e456ab6-7463-cda5-9168-5576c7488879
name-label ( RO)         : SR.scan
name-description ( RO)   :
subtask_of ( RO)         :
subtasks ( RO)           :
resident-on ( RO)        : 6c5cc464-89b7-4cf6-acfd-673fb259a711
status ( RO)             : pending
progress ( RO)           : 0.000
type ( RO)               :
result ( RO)             :
created ( RO)            : 20170529T02:46:47Z
finished ( RO)           : 19700101T00:00:00Z
error_info ( RO)         :
allowed_operations ( RO) : Cancel
other-config (MRW)       :

uuid ( RO)               : cd2f4f4a-472c-00f3-38b8-bc392f7f69d0
name-label ( RO)         : SR.scan
name-description ( RO)   :
subtask_of ( RO)         :
subtasks ( RO)           :
resident-on ( RO)        : 6c5cc464-89b7-4cf6-acfd-673fb259a711
status ( RO)             : pending
progress ( RO)           : 0.000
type ( RO)               :
result ( RO)             :
created ( RO)            : 20170529T03:08:55Z
finished ( RO)           : 19700101T00:00:00Z
error_info ( RO)         :
allowed_operations ( RO) : Cancel
other-config (MRW)       :
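
Since each of these tasks lists Cancel under allowed_operations, the backlog can at least be cleared by hand; a stopgap using standard xe usage, not suggested in the thread itself:

    # Cancel every pending SR.scan task
    for t in $(xe task-list name-label=SR.scan --minimal | tr ',' ' '); do
        xe task-cancel uuid="$t"
    done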

mdmeier commented 7 years ago

OK, I'm able to recreate this without CloudStack. The following command takes up a large amount of system resources and, when run from the command line, hangs:

rbd image-meta set VHD-3954d3aa-624b-4489-92ab-37da8ad7c049 VDI_LABEL ROOT-524 --pool RBD_XenStorage-2dd455e9-0de4-4ed8-af62-64e1a4ace678 --name client.admin

(the label and VHD will change, but the command will always hang after a couple of attempts)
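
One way to see what is holding the image while the command hangs (standard rbd commands; the pool/image names are taken from the command above):

    # Show clients watching the image, and any locks held on it
    rbd status RBD_XenStorage-2dd455e9-0de4-4ed8-af62-64e1a4ace678/VHD-3954d3aa-624b-4489-92ab-37da8ad7c049
    rbd lock list RBD_XenStorage-2dd455e9-0de4-4ed8-af62-64e1a4ace678/VHD-3954d3aa-624b-4489-92ab-37da8ad7c049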

rposudnevskiy commented 7 years ago

Hi, I will try to optimize the scan operation. At present it loads metadata for all RBD images and sets the label and description of every VDI in the xapi database each time you run sr-scan, which is not practical. I think it's possible to set the label and description only for VDIs that didn't exist before the scan and leave those that already exist untouched; that should be faster. In any case, storing the metadata with the RBD image is not critical and can be disabled entirely. It makes sense if you want to attach an existing RBD pool to newly installed XenServers, in which case the labels and descriptions can be restored from the RBD metadata.

As for your last comment, it appears the root of the problem is not in RBDSR but in Ceph itself. I think you should check the health of your Ceph cluster:

ceph -s
ceph osd tree

Maybe it is related to some network issue; I can't tell at the moment, it requires further investigation.

mdmeier commented 7 years ago

Correct. The problem is with Ceph functionality, but it appears to be related to locking on the VDIs. There are no problems with the OSDs; everything is active+healthy and the network links are good.

This command was stuck on the pool master:

rbd image-meta set VHD-fd2faba5-882e-4511-b489-3954fe78ca88 VDI_LABEL ROOT-506 --pool RBD_XenStorage-2dd455e9-0de4-4ed8-af62-64e1a4ace678 --name client.admin

Trying to run it elsewhere from the CLI, such as on the Ceph admin node, causes an easily replicated error.

What I found, however, was that "rbd lock list" showed an exclusive lock on this VDI:

VHD-fd2faba5-882e-4511-b489-3954fe78ca88:
There is 1 exclusive lock on this image.
Locker          ID                    Address
client.3973024  auto 140704471018672  192.168.103.115:0/936064466

The pool master is 192.168.101.112; 192.168.103.115 is a slave.

If I "rbd lock remove" the lock, I can then run "rbd image-meta set", but this causes file system errors on the VM, presumably because breaking the exclusive lock disconnects the client. The VM then needs to be rebooted and fsck'd.
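
For the record, the removal step takes the lock ID and locker exactly as printed by "rbd lock list"; using the values from the output above (with the filesystem-error caveat just mentioned):

    # Break the exclusive lock -- the attached VM will see IO errors (see above)
    rbd lock remove VHD-fd2faba5-882e-4511-b489-3954fe78ca88 "auto 140704471018672" client.3973024 \
        --pool RBD_XenStorage-2dd455e9-0de4-4ed8-af62-64e1a4ace678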

Further research shows that Ceph Jewel did not offer --exclusive with rbd-nbd, but Kraken does. Maybe this is related?

Thanks again

rposudnevskiy commented 7 years ago

It's strange, but Luminous, which I currently use, allows changing image metadata even with an exclusive lock set. Maybe it's a Kraken bug. If our problem is caused by the exclusive lock, then it can be fixed in two ways:

  1. Never update metadata on mapped images, or
  2. Issue tap-ctl pause to pause IO, unmap the image, update the metadata, then remap and unpause the image.

I think the second is more correct.
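
A rough sketch of that second sequence, with placeholders throughout (the plugin would have to discover the tapdisk pid/minor and nbd device itself):

    tap-ctl pause -p <tapdisk-pid> -m <minor>               # 1. quiesce IO from the guest
    rbd-nbd unmap /dev/nbd<N>                                # 2. unmap so the exclusive lock is released
    rbd image-meta set <image> VDI_LABEL <label> --pool <pool>   # 3. update metadata while nothing holds the image
    rbd-nbd map <pool>/<image>                               # 4. remap and resume IO
    tap-ctl unpause -p <tapdisk-pid> -m <minor>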

Please give me some time and I will fix the problem with the lock. I will also try to optimize the scan operation to reduce the number of calls to "rbd image-meta set".

Thank you.

mdmeier commented 7 years ago

Thank you! CloudStack has its own naming conventions for volumes, so option (1) would be ideal. Considering how often metadata is being updated, I don't see that (2) is viable without constantly pausing VMs.

I did ultimately stop the two VMs causing problems, and they came back without locks. Not all running VMs have asked for exclusive locks on their images, and there's no indication of when or why this happens. In the meantime I'm simply killing long-running "rbd image-meta set" processes so that other queued items (like VDI migration) can work. It's a hack, but it'll work until your code update is ready.
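
That stopgap can be as simple as the following (hypothetical; it kills any in-flight metadata update, so some labels may be left stale):

    # Kill stuck metadata updates so queued VDI operations can proceed
    pkill -f 'rbd image-meta set'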

rposudnevskiy commented 7 years ago

Hi, please check the updated version: ba3b020

Two optional device-config parameters have been added:

use-rbd-meta - store VDI params in RBD metadata if set to True (default=True)
vdi-update-existing - update params of existing VDIs on scan if set to True (default=True)

You can change the device-config args of the PBD, or change the defaults right in RBDSR.py:

USE_RBD_META_DEFAULT = True
VDI_UPDATE_EXISTING_DEFAULT = True
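
Since device-config is fixed at PBD creation time, changing it for an existing SR means recreating the PBD. A sketch with placeholder uuids, assuming the SR's existing device-config keys are repeated alongside the new one:

    # Hypothetical: recreate a host's PBD with vdi-update-existing disabled
    xe pbd-unplug uuid=<old-pbd-uuid>
    xe pbd-destroy uuid=<old-pbd-uuid>
    xe pbd-create host-uuid=<host-uuid> sr-uuid=<sr-uuid> \
        device-config:vdi-update-existing=False   # plus the SR's other device-config keys
    xe pbd-plug uuid=<new-pbd-uuid>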

mdmeier commented 7 years ago

I set:

VDI_UPDATE_EXISTING_DEFAULT = False

Just so the data would still be written on VDI.create/copy/clone. This appears to have drastically sped up our SR.scan process as well.

rposudnevskiy commented 7 years ago

Did the update help you to resolve the problem?

mdmeier commented 7 years ago

Looks like it. There's no longer a backlog of SR.scan/VDI.set_name_description tasks on the pool master and load is almost nil:

[root@cloud101-12 ~]# xe task-list
uuid ( RO)             : a2403851-88db-2e64-f556-8673b5584e9b
name-label ( RO)       : host.call_plugin
name-description ( RO) :
status ( RO)           : pending
progress ( RO)         : 0.000

uuid ( RO)             : 0dcf09fb-5084-2819-8955-161ae32147ea
name-label ( RO)       : SR.scan
name-description ( RO) :
status ( RO)           : pending
progress ( RO)         : 0.000

There used to be 5-10 SR.scan tasks queued at any given time.

I am having an unrelated issue: last night one of my Xen hosts wouldn't boot an RBD VDI for some reason. I could load the disk on another VM, and even boot off an ISO and view it, but the CentOS guest would always drop to grub rescue on boot. A reboot of the host resolved this, and I'll post a follow-up as a separate issue if it persists.

Thank you for your work on this. I see there's an unrelated ticket asking about commercial support; I'd be willing to contribute to that if you're looking to provide a more formal structure for the addon.