hep-gc / cloud-scheduler

Automatically boot VMs for your HTC jobs
http://cloudscheduler.org
Apache License 2.0

Add support for OpenStack instances that require boot volumes #485

Open berghaus opened 4 years ago

berghaus commented 4 years ago

The OpenStackNative cloud type now accepts two parameters, boot_volume and boot_volume_gb_per_core. Setting boot_volume instructs cloudscheduler to create a boot volume when it creates an instance; boot_volume_gb_per_core controls the size of that volume.
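As a rough illustration of the sizing rule, here is a minimal sketch. The function name `boot_volume_size_gb` and the constant name are hypothetical, not part of cloudscheduler; the 20 GB fallback mirrors the `bv_size = 20` branch in openstackcluster.py.

```python
# Hypothetical helper, for illustration only; not cloudscheduler's actual API.
DEFAULT_BOOT_VOLUME_GB = 20  # fallback used when no per-core size is configured


def boot_volume_size_gb(cpu_cores, boot_volume_gb_per_core=None):
    """Return the boot volume size in GB for a VM with `cpu_cores` cores."""
    if boot_volume_gb_per_core:
        # Scale the volume with the number of cores requested for the VM.
        return boot_volume_gb_per_core * cpu_cores
    # Unset (None) or zero falls back to the fixed default size.
    return DEFAULT_BOOT_VOLUME_GB


print(boot_volume_size_gb(8, 10))  # per-core sizing
print(boot_volume_size_gb(8))      # default sizing
```

With 10 GB per core, an 8-core VM would get an 80 GB boot volume; with the option unset it would get the 20 GB default.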

berghaus commented 4 years ago

Where did the conflict come from .... hmmm.

berghaus commented 4 years ago

I must have forked the repo earlier and was therefore missing a few of the most recent changes.

rseuster commented 4 years ago

The first VM from cloudscheduler v1 ran jobs; however, two more files required changes:

# diff -u cloudconfig.py.orig cloudconfig.py
--- cloudconfig.py.orig 2019-08-06 14:05:02.868418103 -0700
+++ cloudconfig.py      2019-08-06 14:04:37.156218534 -0700
@@ -89,7 +89,7 @@
     :param name: The name of cloud to operate on
     :return: True if conf good, False if problem detected
     """
-    valid_option_names = {'access_key_id', 'auth_dat_file', 'auth_url', 'blob_url', 'boot_timeout', 'cacert',
+    valid_option_names = {'access_key_id', 'auth_dat_file', 'auth_url', 'blob_url', 'boot_timeout', 'boot_volume', 'boot_volume_gb_per_core', 'cacert',
                           'cloud_type', 'contextualization', 'cpu_archs', 'cpu_cores', 'host',
                           'image_attach_device', 'key_name', 'keycert', 'max_vm_mem', 'max_vm_storage', 'memory',
                           'networks', 'password', 'placement_zone', 'port', 'priority', 'project_id', 'project_domain_name',

and

# diff -u openstackcluster.py.orig openstackcluster.py
--- openstackcluster.py.orig    2019-08-06 13:39:28.135490782 -0700
+++ openstackcluster.py 2019-08-06 13:49:10.896025215 -0700
@@ -70,6 +70,8 @@
         self.cacert = cacert
         self.user_domain_name = user_domain_name if user_domain_name is not None else "Default"
         self.project_domain_name = project_domain_name if project_domain_name is not None else "Default"
+        self.boot_volume = boot_volume
+        self.boot_volume_gb_per_core = boot_volume_gb_per_core
         self.session = None
         try:
             authsplit = self.auth_url.split('/')
@@ -116,9 +118,9 @@
         import novaclient.exceptions
         use_cloud_init = use_cloud_init or config.use_cloud_init
         nova = self._get_creds_nova_updated()
-        if boot_volume:
+        if self.boot_volume:
             cinder = self._get_creds_cinder()
-            from cinderclient import exceptions as ccexceptions
+        from cinderclient import exceptions as ccexceptions
         if len(securitygroup) != 0:
             sec_group = []
             for group in securitygroup:
@@ -249,7 +251,7 @@
         if name:
             log.info("Trying to create VM on %s: " % self.name)
             try:
-                if not boot_volume:
+                if not self.boot_volume:
                     instance = nova.servers.create(name=name,
                                                    image=imageobj,
                                                    flavor=flavor,
@@ -262,8 +264,8 @@
                     bdm = None
                     log.debug("creating boot volume")
                     bv_name = "vol-{}".format(name)
-                    if boot_volume_gb_per_core:
-                        bv_size = boot_volume_gb_per_core * cpu_cores
+                    if self.boot_volume_gb_per_core:
+                        bv_size = self.boot_volume_gb_per_core * cpu_cores
                     else:
                         bv_size = 20
                     cv = cinder.volumes.create(name=bv_name,

It all worked. The created volumes also disappeared after the jobs finished and the VM retired!
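For context on why cloudconfig.py needed the change: the config check only accepts option names that appear in the valid_option_names set. A minimal sketch of that kind of check follows; the function `unknown_options` and the shortened name set are assumptions for illustration, not the actual cloudconfig.py code.

```python
# Illustrative subset of option names; the real set in cloudconfig.py is longer.
VALID_OPTION_NAMES = {
    "auth_url", "boot_timeout", "boot_volume",
    "boot_volume_gb_per_core", "cacert", "cloud_type",
}


def unknown_options(options, valid_names=VALID_OPTION_NAMES):
    """Return the sorted option names that are not in the set of valid names."""
    return sorted(set(options) - set(valid_names))


# Hypothetical cloud section with one misspelled option name.
cloud_section = {"boot_volume": "true", "boot_volume_gb_per_cor": "10"}
print(unknown_options(cloud_section))
```

Any name returned by such a check, including a misspelled one, would be reported as a configuration problem, so the names in the set have to match the names used in the cloud configuration exactly.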

I also had to update the Python bindings for OpenStack: cinderclient wasn't previously installed on the machine where I tested (verifycs.heprc).
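A quick way to check a machine for the required bindings before enabling boot volumes might look like the sketch below. The helper `module_available` is hypothetical; `novaclient` and `cinderclient` are the module names imported in openstackcluster.py.

```python
def module_available(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


# Modules the boot-volume code path imports.
required = ("novaclient", "cinderclient")
missing = [name for name in required if not module_available(name)]

if missing:
    print("missing OpenStack bindings: {}".format(", ".join(missing)))
else:
    print("all OpenStack bindings found")
```

This only confirms that the modules import; it does not check their versions.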

rseuster commented 4 years ago

BTW - the running code is on verifycs in /usr/local/lib/python2.7/site-packages/cloudscheduler

berghaus commented 4 years ago

The last commit should have addressed the missing points. I'll try it at CERN.

berghaus commented 4 years ago

So that last push is what is running on the CERN CS.