xchem / xchem_it

Issues for XChem IT work

Recreate VMs in K8S clusters to standardise OS and use server groups #4

Open tdudgeon opened 3 years ago

tdudgeon commented 3 years ago

Because the clusters were created at different times while the base image was evolving, three or four different operating system flavours are in use, which makes management overly complex.

Also, the VMs should be rotated at suitable periods so that security updates are picked up.

When redeploying the VMs we should use OpenStack's 'Server groups' functionality to define appropriate affinity and anti-affinity. Most important is anti-affinity among the worker nodes and the control plane nodes, so that if a hypervisor goes down we do not lose multiple VMs, as happened once last year.
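A minimal sketch of what this could look like with the plain OpenStack CLI (not Rancher). The function names, the `DRY_RUN` switch and the `FLAVOR`, `IMAGE` and `NETWORK` variables are hypothetical placeholders, not taken from our deployment; only `openstack server group create --policy anti-affinity` and the `--hint group=<uuid>` option of `openstack server create` are standard CLI features.

```shell
#!/bin/sh
# Sketch only: names and variables below are illustrative assumptions.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a server group whose members must be scheduled on different hypervisors.
# $1 = server group name
create_anti_affinity_group() {
    run openstack server group create --policy anti-affinity "$1"
}

# Boot a VM as a member of an existing server group.
# $1 = server group UUID, $2 = VM name
# FLAVOR, IMAGE and NETWORK must be set by the caller (hypothetical values).
create_vm_in_group() {
    run openstack server create --hint "group=$1" \
        --flavor "$FLAVOR" --image "$IMAGE" --network "$NETWORK" "$2"
}
```

Setting `DRY_RUN=1` prints the commands for review before anything is created. One group per node role (for example one for workers, one for control plane nodes) would keep the members of each role apart.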

This needs doing in the DEV, PROD and Rancher clusters.

reskyner commented 3 years ago

The particular case where there's a semi-dead node, and the anti-affinity issue, are priority 1; everything else is priority 2/3.

tdudgeon commented 3 years ago

Server groups do not seem to be implemented in Rancher. See: https://github.com/rancher/rancher/issues/16696 https://github.com/rancher/rancher/issues/24696

tdudgeon commented 3 years ago

Servers in the dev cluster have been recreated. All now run Ubuntu 20.04. Currently there are:

Rancher does not seem to support server groups, so the OpenStack API was used to determine the hypervisor on which each VM is running. Two of the etcd nodes were running on the same hypervisor, so a new one was created and one of the two sharing a hypervisor (xch-dev-etcd-a1) was removed. All nodes now run on different hypervisors. The hypervisor for each VM was identified using:

openstack server show <vm_id> | grep hostId
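To repeat this check across many VMs, something like the following could help. It is a sketch, not what was actually run: the function name `find_shared_hosts` and the input format (one `vm_name host_id` pair per line) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: reads "vm_name host_id" pairs on stdin and prints each host_id
# that carries more than one VM, followed by the names of the VMs on it.
find_shared_hosts() {
    awk '
        NF >= 2 {
            count[$2]++
            # Append the VM name to the list kept for this host.
            names[$2] = (names[$2] == "" ? $1 : names[$2] " " $1)
        }
        END {
            for (h in count)
                if (count[h] > 1)
                    print h ": " names[h]
        }
    ' | sort
}

# Possible usage (VM names are placeholders; the hostId column name is taken
# from the grep above and may differ between client versions):
#
#   for vm in vm-a vm-b vm-c; do
#       printf '%s %s\n' "$vm" "$(openstack server show "$vm" -f value -c hostId)"
#   done | find_shared_hosts
```

Empty output means no two of the listed VMs reported the same hostId.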