rposudnevskiy / RBDSR

RBDSR - XenServer/XCP-ng Storage Manager plugin for CEPH
GNU Lesser General Public License v2.1

Some fixes to make it work #1

Closed scpcom closed 8 years ago

scpcom commented 8 years ago

Thank you for creating RBDSR. I have been missing a solution like this, since we are running XenServer and Ceph. It seems to use Ceph's native functions as much as possible.

I tested RBDSR on our XenServer 7 with Ceph Jewel lab environment, and it did not work without fixing a few small things. First of all, PROVISIONING_DEFAULT was not defined, so I looked at some of the original Citrix scripts and copied it from there. I also had to remove all references to self.dconf['monitors'], because cephutils does not use them and does not expect them as a parameter, which caused errors. I imported some VMs and copied other VMs from Local Storage to the RBD SR. At this point most things worked, except for two tasks.

If a VM is already on the RBD SR and has no snapshots, I cannot copy or clone it, because snap_sm_config["snapshot-of"] is not defined, but the clone function requires it. The same error occurs if you convert the VM to a template and try to create new VMs from that template. This part of my patch is not a proper fix; it is currently more of a hack to make it work. There may be a better way to handle this.

For another issue I did not find a solution: after upgrading to Ceph Jewel, you get a permanent

 health HEALTH_WARN
        crush map has legacy tunables (require bobtail, min is firefly)

So you have to run at least ceph osd crush tunables firefly, or better, ceph osd crush tunables optimal.

Because the kernel of XenServer 7 is too old and its RBD kernel module does not support all the required features, RBD mapping then stops working. So you have to go back to ceph osd crush tunables bobtail.

That works stably, but you are left with the permanent HEALTH_WARN again.
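
For reference, here are the CRUSH tunables commands from the two paragraphs above collected in one place; they only touch cluster-wide CRUSH settings, no pool or image names are involved:

    # raise the tunables profile after the Jewel upgrade (clears the HEALTH_WARN)
    ceph osd crush tunables firefly   # minimum
    ceph osd crush tunables optimal   # or go all the way

    # if the XenServer 7 kernel client can no longer map RBDs, roll back
    ceph osd crush tunables bobtail   # the HEALTH_WARN returns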

--- a/RBDSR.py  2016-06-30 13:26:46.000000000 +0200
+++ b/RBDSR.py  2016-07-19 23:58:28.000000000 +0200
@@ -27,7 +27,8 @@
 import xml.dom.minidom
 import blktap2

-CAPABILITIES = ["VDI_CREATE","VDI_DELETE","VDI_ATTACH","VDI_DETACH","VDI_CLONE","VDI_SNAPSHOT", "VDI_RESIZE", "VDI_RESIZE_ONLINE", "ATOMIC_PAUSE", "VDI_UPDATE"
+CAPABILITIES = ["VDI_CREATE","VDI_DELETE","VDI_ATTACH","VDI_DETACH",
+                "VDI_CLONE","VDI_SNAPSHOT", "VDI_RESIZE", "VDI_RESIZE_ONLINE", "ATOMIC_PAUSE", "VDI_UPDATE",
                 "SR_SCAN","SR_UPDATE","SR_ATTACH","SR_DETACH","SR_PROBE"]
 CONFIGURATION = []
 DRIVER_INFO = {
@@ -45,7 +46,10 @@

 class RBDSR(SR.SR):
     """Shared memory storage repository"""
-    
+
+    PROVISIONING_TYPES = ["thin", "thick"]
+    PROVISIONING_DEFAULT = "thick"
+ 
     def _loadvdis(self):
         """Scan the location directory."""
         if self.vdis:
@@ -160,7 +164,7 @@

     def probe(self):
         util.SMlog("RBDSR.probe for %s" % self.uuid)
-        return cephutils.srlist_toxml(cephutils.scan_srlist(self.dconf['monitors']))
+        return cephutils.srlist_toxml(cephutils.scan_srlist())

     def load(self, sr_uuid):
         """Initialises the SR"""
@@ -168,7 +172,7 @@
             raise xs_errors.XenError('ConfigDeviceMissing',)

         self.sr_vditype = 'rbd'
-        self.provision = PROVISIONING_DEFAULT
+        self.provision = self.PROVISIONING_DEFAULT
         self.uuid = sr_uuid

@@ -190,8 +194,8 @@
     def scan(self, sr_uuid):
         """Scan"""
         self.sr_vditype = 'rbd'
-        self.provision = PROVISIONING_DEFAULT
-        RBDPOOLs = cephutils.scan_srlist(self.dconf['monitors'])
+        self.provision = self.PROVISIONING_DEFAULT
+        RBDPOOLs = cephutils.scan_srlist()
         self.physical_size = cephutils._get_pool_info(RBDPOOLs[sr_uuid],'size')
         self.physical_utilisation = cephutils._get_pool_info(RBDPOOLs[sr_uuid],'used')
         RBDVDIs = cephutils.scan_vdilist(RBDPOOLs[self.uuid])
@@ -213,11 +217,11 @@
         valloc = int(self.session.xenapi.SR.get_virtual_allocation(self.sr_ref))
         self.virtual_allocation = valloc + int(virtAllocDelta)
         self.session.xenapi.SR.set_virtual_allocation(self.sr_ref, str(self.virtual_allocation))
-        RBDPOOLs = cephutils.scan_srlist(self.dconf['monitors'])
+        RBDPOOLs = cephutils.scan_srlist()
         self.session.xenapi.SR.set_physical_utilisation(self.sr_ref, str(cephutils._get_pool_info(RBDPOOLs[sr_uuid],'used')))

     def _isSpaceAvailable(self, sr_uuid, size):
-        RBDPOOLs = cephutils.scan_srlist(self.dconf['monitors'])
+        RBDPOOLs = cephutils.scan_srlist()
         sr_free_space = cephutils._get_pool_info(RBDPOOLs[sr_uuid],'size') - cephutils._get_pool_info(RBDPOOLs[sr_uuid],'used')
         if size > sr_free_space:
             return False
@@ -398,7 +402,12 @@

         snap_vdi_ref = self.session.xenapi.VDI.get_by_uuid(snap_uuid)
         snap_sm_config = self.session.xenapi.VDI.get_sm_config(snap_vdi_ref)
-        old_base_uuid = snap_sm_config["snapshot-of"]
+        if snap_sm_config.has_key("snapshot-of"):
+            old_base_uuid = snap_sm_config["snapshot-of"]
+        else:
+            snapVDI = self._snapshot(sr_uuid, snap_uuid)
+            return self.clone(sr_uuid, snapVDI.uuid)
+
         base_uuid = None

         vdis = self.session.xenapi.SR.get_VDIs(self.sr.sr_ref)
@@ -454,6 +463,9 @@
             return cloneVDI.get_params()

     def snapshot(self, sr_uuid, vdi_uuid):
+        return self._snapshot(self, sr_uuid, vdi_uuid).get_params()
+
+    def _snapshot(self, sr_uuid, vdi_uuid):
         util.SMlog("RBDVDI.snapshot for %s" % (vdi_uuid))

         secondary = None
@@ -494,7 +506,7 @@

         blktap2.VDI.tap_unpause(self.session, sr_uuid, vdi_uuid, secondary)

-        return snapVDI.get_params()
+        return snapVDI

     def resize(self, sr_uuid, vdi_uuid, size):
         """Resize the given VDI to size <size> MB. Size can
rposudnevskiy commented 8 years ago

Thank you very much for your tests. I tested it on XenServer 6.6 with Ceph Infernalis only. Tests with XenServer 7 and Jewel were only planned, so thank you again for your help.

As for the XenServer 7 kernel, I had similar problems with XenServer 6.6. The default RBD kernel module didn't support Ceph with cache tiering enabled, and RBD mapping didn't work. I managed to fix it by rebuilding the libceph.ko and rbd.ko modules. I took the sources of these modules from kernel-3.10.0-229 of CentOS 7.0-1503, put them into the kernel-3.10.83 source tree of XenServer 6.6 and rebuilt the modules. Using these rebuilt modules I could map RBDs with cache tiering enabled. I guess this method can help in your case too. Anyway, I don't think this method is a good one. I'm not an expert in the Linux kernel and I am not sure that these rebuilt modules won't cause other problems. It would be much better if the XenServer developers could upgrade the kernel.
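
In case it helps, a very rough sketch of that kind of out-of-tree module rebuild; the directory names below are purely illustrative and the exact file set (and any Makefile/Kconfig adjustments) may differ:

    # copy the newer Ceph client sources from the CentOS kernel tree (paths illustrative)
    cp -r linux-3.10.0-229/net/ceph            linux-3.10.83-xs/net/
    cp    linux-3.10.0-229/drivers/block/rbd.c linux-3.10.83-xs/drivers/block/
    cp -r linux-3.10.0-229/include/linux/ceph  linux-3.10.83-xs/include/linux/

    # rebuild only the two modules inside the (already configured) XenServer kernel tree
    cd linux-3.10.83-xs
    make modules_prepare
    make M=net/ceph modules
    make M=drivers/block modules

    # install the rebuilt modules and reload them
    cp net/ceph/libceph.ko drivers/block/rbd.ko /lib/modules/$(uname -r)/extra/
    depmod -a
    modprobe -r rbd libceph; modprobe rbd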

rposudnevskiy commented 8 years ago

Hi. You can make the warning

 health HEALTH_WARN
        crush map has legacy tunables (require bobtail, min is firefly)

go away without making any changes to CRUSH by adding the following option to your ceph.conf [mon] section:

mon warn on legacy crush tunables = false
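
For context, this is how the setting sits in ceph.conf; injecting it at runtime, as sketched on the last line, is my assumption and should be checked against your Ceph version:

    [mon]
        mon warn on legacy crush tunables = false

    # assumed runtime alternative, without restarting the monitors:
    ceph tell mon.* injectargs '--mon-warn-on-legacy-crush-tunables=false'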

You can read about it here http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables

There you can also find a compatibility matrix between Linux kernel versions and CRUSH tunables profiles.

scpcom commented 8 years ago

Yes, I found this option too. I also found that you don't have to disable all features; you can go up to the firefly tunables and disable only one:

sudo ceph osd crush tunables firefly
sudo ceph osd getcrushmap -o /tmp/crush
sudo crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
sudo ceph osd setcrushmap -i /tmp/crush.new

Other features can be disabled on the fly:

sudo rbd feature disable RBD_XenStorage-985b47a3-5c0a-40b1-9149-da96c917f7bc/VHD-8b00e4e6-7865-47cd-aafd-20f7f883433b exclusive-lock object-map fast-diff deep-flatten

You can check which features are enabled on an RBD VHD by calling:

sudo rbd info RBD_XenStorage-985b47a3-5c0a-40b1-9149-da96c917f7bc/VHD-8b00e4e6-7865-47cd-aafd-20f7f883433b

The only enabled feature should be "layering".

I also found an incorrect line in my patch: the line return self._snapshot(self, sr_uuid, vdi_uuid).get_params() must be corrected to return self._snapshot(sr_uuid, vdi_uuid).get_params().
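
With that correction applied, the wrapper inside the VDI class of the patch above reads:

    def snapshot(self, sr_uuid, vdi_uuid):
        # thin wrapper: delegate to _snapshot and return the new VDI's parameters
        return self._snapshot(sr_uuid, vdi_uuid).get_params()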

djmoonshine commented 8 years ago

I have also tried this on XenServer 7. Thanks so much, Roman, for writing this. I had the same issues as mentioned in this thread and fixed some of them, but I didn't manage to fix the clone issue, so thanks for the patch for that, scpcom.

I also have a suggestion for a code change to work around the issue of the kernel being too old; unfortunately, I think the kernel will always lag behind in a product like this. There is an RBD client that uses NBD to map the device into the kernel, with the RBD code living in userspace. I think the code would stay about the same, because the only difference when mapping is that we would need the rbd-nbd package and use the rbd-nbd map command instead of rbd map, and the device would be something like /dev/nbd0. I think all other commands would be the same. I think I am capable of writing the code myself, and I am willing to do it and provide a patch. The only requirement is that the nbd module is present in the kernel; I don't have access to my XenServer machine to check that right now, but I will check later. If it is present, do you want me to write the code for it? I guess the best way would be to make this a configurable option, so the kernel RBD client stays the default and this becomes an optional alternative. That would be a bit harder for me, since I'm not that familiar with this kind of code, but I will try if you want me to.
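
A minimal sketch of the mapping difference being proposed, using made-up image names; whether the rbd-nbd package and the nbd kernel module are available on XenServer 7 is exactly the open question above:

    # current approach: kernel RBD client (needs recent CRUSH/RBD feature support in the kernel)
    rbd map RBD_XenStorage-<sr_uuid>/VHD-<vdi_uuid>        # -> /dev/rbdN

    # proposed alternative: userspace client exposed through NBD
    modprobe nbd                                           # only the generic nbd module is needed
    rbd-nbd map RBD_XenStorage-<sr_uuid>/VHD-<vdi_uuid>    # -> /dev/nbdN
    rbd-nbd unmap /dev/nbdN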

rposudnevskiy commented 8 years ago

Hi, Issues #1 and #2 have been fixed. In 'fuse' mode it should support all of Jewel's features.