Closed cristicalin closed 2 years ago
Upon further diagnosis it looks like the container manager detection logic may be at fault when an old container_manager has not been properly cleaned up. In my case I had docker.service files lying around from a previous install. I think a service check should also be performed to see whether the service is actually running.
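To illustrate the suggestion above, here is a minimal, hypothetical Ansible sketch (the task names and the `detected_container_manager` fact are invented for illustration and are not the actual Kubespray role code): treat docker as the active engine only when the service is really running, not merely when stale unit files exist.

```yaml
# Hypothetical sketch -- not the actual validate-container-engine role.
- name: Gather service facts
  ansible.builtin.service_facts:

- name: Detect docker only when the service is actually running
  ansible.builtin.set_fact:
    detected_container_manager: docker
  when:
    - "'docker.service' in ansible_facts.services"
    - ansible_facts.services['docker.service'].state == 'running'
```

Using `service_facts` distinguishes a unit file that merely exists from one whose state is `running`, which is exactly the gap described here.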
Another symptom, which I was unable to reproduce, is that the cluster.yaml run actually triggered a re-install of the docker engine, which I cannot quite explain.
Environment:

- Cloud provider or hardware configuration: baremetal single-node clusters.
- OS (`printf "$(uname -srm)\n$(cat /etc/os-release)\n"`):
- Version of Ansible (`ansible --version`):
- Version of Python (`python --version`):
- Kubespray version (commit) (`git rev-parse --short HEAD`):
- Network plugin used:
- Full inventory with variables (`ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"`):
- Command used to invoke ansible:
- Output of ansible run:
The Ansible run breaks by cordoning off the node, since the `container-engine/validate-container-engine` role calls the `remove-node/pre-remove` role each time. There seems to be an issue in the container_manager detection logic which causes this role to trigger the action even though the container_manager has not been changed.

Anything else do we need to know:
This container_manager replacement logic should be tested in CI to ensure it works properly and does not break existing deployments before we tag 2.19.
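As a sketch of the kind of guard that could avoid the spurious trigger (file and variable names here are hypothetical, not the actual role code), the pre-remove cleanup would only run when the detected engine actually differs from the configured one:

```yaml
# Hypothetical guard -- only clean up the old engine when it differs
# from the configured container_manager.
- name: Run pre-remove cleanup for the old container engine
  ansible.builtin.include_tasks: pre-remove.yml
  when:
    - detected_container_manager is defined
    - detected_container_manager != container_manager
```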
/cc @cyril-corbon