kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

On Redhat Systems, Kubespray Ansible Configures Yum HTTPS Proxy Incorrectly On Second Run #8687

Closed: charles-horel closed this issue 2 years ago

charles-horel commented 2 years ago

Environment:

Kubespray version (commit) (git rev-parse --short HEAD): HEAD

Network plugin used: kaniko

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"): see attached kubespray-debug_inventory.txt

Command used to invoke ansible: ansible-playbook -i inventory/mycluster/dev-inventory.ini --become --user="" cluster.yml

Output of ansible run: The ansible run stops working on the second run on any attempt to contact the yum repo; the repolist becomes broken on the second run because of how ansible inserts the proxy.

Anything else we need to know:

It looks like the root of the issue is in the playbook to bootstrap RHEL hosts: https://github.com/kubernetes-sigs/kubespray/blob/be9de6b9d9f1036656d59bcaeb8e6776d0e03240/roles/bootstrap-os/tasks/bootstrap-redhat.yml

On line 19 it invokes a task that uses a regex replace to insert the proxy into the repo configuration. It works fine on the first run, but on the second run it adds a second prefix and suffix. Ideally the ansible bootstrapping should be safe to run multiple times.

After the second run on a RHEL environment, the repo proxy ends up looking like: 'http://http://wx1.no.cg.nms.mlb.inet:3128/:3128/'
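For comparison, here is a minimal sketch of how a yum proxy setting could be written idempotently, so a second run leaves the value unchanged instead of re-wrapping it. The repo file path, section name, and use of community.general.ini_file are illustrative assumptions, not the actual kubespray task:

```yaml
# Illustrative only: an idempotent way to set a yum repo proxy.
# The file path and section are hypothetical, not kubespray's real targets.
- name: Set yum proxy idempotently (sketch)
  community.general.ini_file:
    path: /etc/yum.repos.d/example.repo   # hypothetical repo file
    section: example-repo                 # hypothetical repo id
    option: proxy
    value: "{{ http_proxy }}"             # written as-is; rerunning yields the same line
```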

cristicalin commented 2 years ago

The http_proxy value is taken from the ansible inventory; in your sample it doesn't have a prefix, so I think the issue is not with kubespray but with the way the Red Hat subscription manager works. As long as kubespray calls it the same way between two runs, I don't see a bug we can fix. Can you share the output of an ansible run with -vvv where this breaks?

charles-horel commented 2 years ago

It is very possible that something in our setup or our OS is causing the issue. We've opted instead to remove the proxy definition from the file before that command runs, by adding a couple of tasks to the ansible playbook on our local 'fork' (roughly like the sketch below), which bypasses whatever is happening in our specific instance. I took a closer look at the regex filters and agree it's likely not the ansible step I originally suspected that causes the change.
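A hedged sketch of what such a pre-clean task could look like, assuming the proxy lives on a proxy= line of a yum repo file; the file path and task placement are assumptions, not the poster's actual change:

```yaml
# Illustrative only: strip any existing proxy lines before the bootstrap task runs,
# so the subsequent insertion always starts from a clean value.
- name: Remove existing proxy definition from the repo file (sketch)
  ansible.builtin.lineinfile:
    path: /etc/yum.repos.d/example.repo   # hypothetical repo file
    regexp: '^proxy='
    state: absent
```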