
Beef up the PLM adapter #413

Closed mraygalaxy closed 2 years ago

mraygalaxy commented 3 years ago

We've been using the PLM adapter for a few months now with multiple hypervisors simultaneously and have hardened the adapter to better accommodate this kind of multi-hypervisor usage:

  1. VMCs are now supported in the PLM adapter. Each hypervisor is now represented as a VMC and is scheduled using the pre-existing scheduling algorithms available to all the other cloud adapters. We no longer use the purely random choice() function on HOST-level discovery for VM scheduling. Randomness is still the default (in the VMC_DEFAULTS section); it just happens outside of the adapter instead of inside it. DISCOVER_HOSTS is still supported, but scheduling now happens on "VMCs" instead of hosts, so all of the round-robin scheduling improvements are available to the PLM adapter (see the first sketch after this list).
  2. Some thread-safety issues under parallel operation were fixed.
  3. The maximum storage assigned to qcow2 files was reduced to 100GB. (It was previously in the terabytes, which was overkill and would easily result in out-of-space issues as the sparse files filled up during long-running experiments.) A volume-creation sketch follows this list.
  4. The libvirt endpoint parsing was tweaked to be more flexible for libvirt installations that are not in traditional locations at the host level (sketched below).
  5. Network overlays are now supported (I use the term loosely). What this means is that you can use a single libvirt DNSMASQ instance for all of the hypervisors that are communicating with each other. You then go to the other machines and disable DNSMASQ so that all the VMs share the same L2 network. Once that is done, all the VMs pull their DHCP addresses from a single DHCP server, like they would in a typical cloud environment. CloudBench then queries ALL of the hypervisors to find out which IP address was assigned on each provisioning request, because there is no way to know in advance which machine is hosting the DNSMASQ service (it depends on how you set up the XML in libvirt). See the lease-lookup sketch after this list.
  6. Cleanup operations were broken in a couple of places, especially when using multiple hypervisors; these have been fixed.
  7. Additional whitespace fixes/logging statements.
  8. The cloud-init ISO generation function was only used by the PLM adapter, so we moved it out of the shared location into the adapter itself (sketched below).
  9. I have left some commented-out code to demonstrate how to enable hugepages at the QEMU level, which we were toying with. I'll come up with a "proper" implementation later (see the illustration below).
  10. virtio-net multiqueue was enabled, with the queue count matching the # of VCPUs chosen for the Virtual Machine being provisioned (see the interface XML sketch below). Let me know if this is not ideal, and I can make it optional instead of the default.
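
To make item 1 concrete, here is a minimal sketch of the placement change, assuming each hypervisor is modeled as a VMC and the default policy stays random but lives in the shared scheduling layer rather than inside the adapter. The names (`Vmc`, `pick_vmc`) are illustrative, not cbtool's actual classes:

```python
# Illustrative sketch of item 1: hypervisors as VMCs, with random placement as
# the default (per VMC_DEFAULTS) and round-robin available as an alternative.
import random
from dataclasses import dataclass

@dataclass
class Vmc:
    name: str
    endpoint: str   # e.g. a libvirt URI for the hypervisor behind this VMC

def pick_vmc(vmcs, policy="random"):
    """Return the VMC that should host the next VM."""
    if policy == "random":
        return random.choice(vmcs)                  # default behavior
    if policy == "roundrobin":
        pick_vmc._rr = getattr(pick_vmc, "_rr", -1) + 1
        return vmcs[pick_vmc._rr % len(vmcs)]
    raise ValueError("unknown placement policy: " + policy)

vmcs = [Vmc("hv1", "qemu+ssh://root@hv1/system"),
        Vmc("hv2", "qemu+ssh://root@hv2/system")]
print(pick_vmc(vmcs).name)
```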
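For item 3, the 100GB cap is a virtual-size ceiling: `qemu-img create` produces a sparse qcow2, so space is only consumed as the guest writes. A minimal sketch, with an illustrative path and helper name:

```python
# Sketch of item 3: create a qcow2 boot volume capped at 100 GB virtual size.
import subprocess

def create_boot_volume(path, size_gb=100):
    # qcow2 files are sparse; size_gb is an upper bound, not immediate usage.
    subprocess.check_call(["qemu-img", "create", "-f", "qcow2",
                           path, "%dG" % size_gb])

create_boot_volume("/var/lib/libvirt/images/cb-vm-001.qcow2")
```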
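For item 4, a rough sketch of the kind of flexibility meant: accept either a bare hostname or an already-formed libvirt URI (including installs whose daemon socket lives in a non-standard path). The helper name and the exact URI forms are illustrative, not the adapter's actual parsing code:

```python
# Sketch of item 4: tolerate both bare hostnames and full libvirt URIs.
def normalize_libvirt_endpoint(endpoint, user="root"):
    if "://" in endpoint:
        # Already a full URI, e.g. a non-standard socket location:
        # qemu+unix:///system?socket=/opt/libvirt/run/libvirt-sock
        return endpoint
    # Bare hostname/IP: build the conventional remote URI.
    return "qemu+ssh://%s@%s/system" % (user, endpoint)

print(normalize_libvirt_endpoint("hv1"))
print(normalize_libvirt_endpoint("qemu+unix:///system?socket=/opt/libvirt/run/libvirt-sock"))
```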
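For item 5, here is a minimal sketch of the lease lookup, assuming the libvirt-python bindings and a libvirt network named "default"; the endpoint list and function name are illustrative. Because only one node actually runs DNSMASQ, every hypervisor is polled for a lease matching the new VM's MAC:

```python
# Sketch of item 5: poll all hypervisors for the DHCP lease of a given MAC.
import libvirt  # provided by the libvirt-python package

def find_ip_for_mac(endpoints, mac, network="default"):
    for uri in endpoints:
        conn = libvirt.open(uri)
        try:
            try:
                net = conn.networkLookupByName(network)
            except libvirt.libvirtError:
                continue        # this node does not define the network
            for lease in net.DHCPLeases():
                if lease["mac"].lower() == mac.lower():
                    return lease["ipaddr"]
        finally:
            conn.close()
    return None

endpoints = ["qemu+ssh://root@hv1/system", "qemu+ssh://root@hv2/system"]
print(find_ip_for_mac(endpoints, "52:54:00:12:34:56"))
```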
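For item 8, the moved function builds a standard cloud-init NoCloud seed: an ISO labeled `cidata` containing `user-data` and `meta-data`. A self-contained sketch (paths, hostnames, and the use of genisoimage are illustrative, not the adapter's exact code):

```python
# Sketch of item 8: generate a NoCloud cloud-init seed ISO.
import os
import subprocess
import tempfile

def make_cloudinit_iso(iso_path, user_data, meta_data):
    with tempfile.TemporaryDirectory() as tmp:
        for name, contents in (("user-data", user_data), ("meta-data", meta_data)):
            with open(os.path.join(tmp, name), "w") as f:
                f.write(contents)
        subprocess.check_call(["genisoimage", "-output", iso_path,
                               "-volid", "cidata", "-joliet", "-rock",
                               os.path.join(tmp, "user-data"),
                               os.path.join(tmp, "meta-data")])

make_cloudinit_iso("/tmp/seed.iso",
                   "#cloud-config\nhostname: cb-vm-001\n",
                   "instance-id: cb-vm-001\n")
```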
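For item 9, an illustration of what the commented-out hugepages experiment would turn on in the domain XML: a `<memoryBacking><hugepages/></memoryBacking>` element asking libvirt/QEMU to back guest RAM with hugepages. The string manipulation below is a hand-written toy, not the adapter's XML generation:

```python
# Illustration of item 9: the libvirt domain XML fragment for hugepage backing.
HUGEPAGES_FRAGMENT = """
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
"""

def add_hugepages(domain_xml):
    # Naive insertion right after <domain ...>; real code would use an XML parser.
    head, sep, tail = domain_xml.partition(">")
    return head + sep + HUGEPAGES_FRAGMENT + tail

print(add_hugepages("<domain type='kvm'><name>cb-vm-001</name></domain>"))
```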
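Finally, for item 10, a sketch of the multiqueue wiring: libvirt enables virtio-net multiqueue via `<driver name='vhost' queues='N'/>` on the interface, and here `N` is set to the VM's vCPU count. The template and argument names are illustrative; the adapter builds its domain XML elsewhere:

```python
# Sketch of item 10: virtio-net interface XML with queues matched to vCPUs.
def interface_xml(bridge, mac, vcpus):
    return """
  <interface type='bridge'>
    <source bridge='%s'/>
    <mac address='%s'/>
    <model type='virtio'/>
    <driver name='vhost' queues='%d'/>
  </interface>""" % (bridge, mac, vcpus)

print(interface_xml("br0", "52:54:00:12:34:56", 4))
```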