Open sanjeevgorai opened 3 years ago
Hi @sanjeevgorai,
Is your setup behind a proxy? Please let us know if so; if not we will see what else may be going on.
Yes, this setup is behind a proxy server. In production we are not allowed a direct internet connection.
We think you may have set up the apt proxy and then rebooted the VM. Is this the case? If so, your saved configuration will have been lost, and the apt proxy must be set up again for the cluster upgrade to work.
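One way to check whether the apt proxy settings survived the reboot is to look for proxy directives in the apt configuration on the node. The following is only a minimal sketch: it assumes the proxy was configured with the flat `Acquire::http::Proxy "...";` form in `/etc/apt/apt.conf` or `/etc/apt/apt.conf.d/`, and it does not understand the nested block form. The function names are made up for illustration and are not part of CSE.

```python
import re
from pathlib import Path

# Matches flat apt directives such as:
#   Acquire::http::Proxy "http://proxy.example:3128";
# Lines commented out with // or # do not match because the directive
# must be the first non-blank text on its line.
PROXY_DIRECTIVE = re.compile(
    r'^\s*Acquire::(https?)::Proxy\s+"([^"]*)"\s*;',
    re.IGNORECASE | re.MULTILINE,
)


def find_apt_proxies(config_text):
    """Return {scheme: proxy_url} for each proxy directive in the given text."""
    return {scheme.lower(): url for scheme, url in PROXY_DIRECTIVE.findall(config_text)}


def scan_apt_config(root="/etc/apt"):
    """Collect proxy directives from apt.conf and apt.conf.d/* under root."""
    base = Path(root)
    conf_d = base / "apt.conf.d"
    files = [base / "apt.conf"]
    if conf_d.is_dir():
        files.extend(sorted(conf_d.iterdir()))
    found = {}
    for path in files:
        if path.is_file():
            # Later files override earlier ones for the same scheme.
            found.update(find_apt_proxies(path.read_text(errors="ignore")))
    return found
```

Calling `scan_apt_config()` on the master node returns an empty dict when no such directive is present, which would be consistent with the configuration having been lost.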
====errors===
root@cse2p2h11 [ ~ ]# vcd cse cluster upgrade ESA10 ubuntu-16.04_k8-1.17_weave-2.6.0 2
cluster operation: Upgrading cluster 'ESA10' software to match template ubuntu-16.04_k8-1.17_weave-2.6.0 (revision 2): Kubernetes: 1.16.13 -> 1.17.9, Docker-CE: 18.09.7 -> 19.03.5, CNI: weave 2.6.0 -> 2.6.0
cluster operation: Draining master node ['mstr-6rvc']
cluster operation: Upgrading Kubernetes (1.16.13 -> 1.17.9) in master node ['mstr-6rvc']
task: 22e47bff-d502-421d-a7c1-cbc8cb176cb9, result: error, message: Unexpected error while upgrading cluster 'ESA10': Script execution failed on node ['mstr-6rvc']
Errors: ["W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial/InRelease Temporary failure resolving 'archive.ubuntu.com'\nW: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial-updates/InRelease Temporary failure resolving 'archive.ubuntu.com'\nW: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial-backports/InRelease Temporary failure resolving 'archive.ubuntu.com'\nW: Failed to fetch http://security.ubuntu.com/ubuntu/dists/xenial-security/InRelease Temporary failure resolving 'security.ubuntu.com'\nW: Failed to fetch http://apt.kubernetes.io/dists/kubernetes-xenial/InRelease Temporary failure resolving 'apt.kubernetes.io'\nW: Some index files failed to download. They have been ignored, or old ones used instead.\nE: Failed to fetch http://apt.kubernetes.io/pool/kubeadm_1.17.9-00_amd64_572d520d47a06fee419b34c35cebf1f98307daae3a76c79da241245cc686d036.deb Temporary failure resolving 'apt.kubernetes.io'\n\nE: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?\n"]
The CSE cluster is upgraded successfully if we upgrade kubeadm, kubelet and kubectl manually on the master and worker nodes, but when we try to do the same with the CSE client it fails with the above errors.
Hi Sanjeev,
The errors you are seeing are generally caused by network/internet connectivity issues. In the past I have seen that there is sometimes a mismatch between the state of the VM NIC as reported by vCD and by VC (especially right after a reboot), and that can lead to this sort of error. CSE is not doing anything special in these scripts. It might just be a race condition between the guest tools being ready and the NIC becoming functional.
May I suggest that after the proxy details are set up in the VM and the VM is rebooted, you wait a few minutes, test internet connectivity, and only then start the upgrade process. If the proxy details are being injected via bashrc or something similar, try adding a poll loop at the end to make sure the internet is reachable via the proxy.
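The poll loop suggested here could look roughly like the sketch below. It is illustrative only: the function names, the retry count and the delay are assumptions, not anything CSE ships, and the probe simply tries to fetch a URL through the proxy using the standard library.

```python
import http.client
import time
import urllib.request


def proxy_probe(url, proxy_url, timeout=5.0):
    """Return True if url can be fetched through proxy_url, False otherwise."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )
    try:
        with opener.open(url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # any of them is treated here as "not reachable yet".
        return False


def wait_until_reachable(probe, attempts=30, delay=10.0, sleep=time.sleep):
    """Call probe() until it returns True.

    Returns the 1-based attempt number that succeeded, or 0 if every
    attempt failed. No sleep is performed after the final attempt.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return 0
```

For example, with a hypothetical proxy at `http://proxy.example:3128`, one could call `wait_until_reachable(lambda: proxy_probe("http://archive.ubuntu.com/ubuntu/dists/xenial/InRelease", "http://proxy.example:3128"))` and start the upgrade only if the result is non-zero.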
Let me know the outcome of the experiments.
Regards Aritra Sen
Hello Aritra, thanks for your comments.
If there were an issue with internet or network connectivity, the kubeadm, kubelet and kubectl upgrade should also fail when we perform it by logging directly into the master and worker nodes. Instead, the manual upgrade succeeds, so we conclude that a proxy problem would have broken the manual upgrade as well. The upgrade fails only when we run it through the CSE upgrade command on the CSE client. We need to understand how the CSE upgrade command is triggered on the master and worker node VMs, and how to confirm that the CSE client is able to run the upgrade scripts on the master and worker nodes of the cluster. We are not able to find any logs for this execution on the master and worker nodes.
Hello Aritra,
There was no NIC misconfiguration, as confirmed by Vivek from the Orange Engineering team.
Also, if you check the attached vcd logs, you will see that the API calls to the task initially complete with 200 OK, but after many iterations (around 20+) the call returns error 500. See the snippet below.
Request uri (GET): https://vcloud.lab.local/api/task/4e521934-9612-4f8c-96f8-b53a026bbab7 Request headers: {'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/*+xml;version=32.0', 'Connection': 'keep-alive', 'x-vcloud-authorization': '[REDACTED]'} Response status code: 200 Response headers: {'Date': 'Thu, 04 Feb 2021 11:00:58 GMT', 'X-VMWARE-VCLOUD-REQUEST-ID': '8f61efd0-c920-4fdd-a111-f5efd8b1d479', 'X-VMWARE-VCLOUD-ACCESS-TOKEN': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJvcmcwNSIsImlzcyI6ImE4Mzk4YWRkLWE0NGItNDZkOS04NTllLWEyYmE1ZTFkYWFmNUA4YWQ5MWIwZC03NjFjLTQ1MzctOTQzMi1iMmZlOGVjYWU1ODEiLCJleHAiOjE2MTI1MjA2NjgsInZlcnNpb24iOiJ2Y2xvdWRfMS4wIiwianRpIjoiOWY0ZDk3YmI1NjI4NDc5YjhjYzM0YjcxNzcwY2QwOTgifQ.PjIqIM_aPuHPMpBavhj2r3cGzhHKnWkbs94pOxnwm8v36A-R_KCli4cs0eAHgS3I_JQqvHSYw_NJW_fQ1oVso-cy_4ZQsQaPCrze5Uc84KfIjRgI0sR4Clh_AyoUS0LQnpcffIj253Lj7xebgA-WfqbSQvYEg5H_ttqpPjkkRIlNbyxQw-OVaFY2tGyA7vPTnPvI9KJbV_F6lQiFw7ZHf8njCjyMHtp7YVYN0PsWY0abf820XnsSasfuYopTweyQ8Q09AwUspddNWd965sGAO5q8aynjUh9rCvujEEazOgjAw08jhhC4mwhcFdTQ5Qd3MfYJkkyjru1JC0uppcQuhQ', 'X-VMWARE-VCLOUD-TOKEN-TYPE': 'Bearer', 'x-vcloud-authorization': '[REDACTED]', 'Content-Type': 'application/vnd.vmware.vcloud.task+xml;version=32.0', 'X-VMWARE-VCLOUD-REQUEST-EXECUTION-TIME': '53', 'Cache-Control': 'no-store, must-revalidate', 'Vary': 'Accept-Encoding, User-Agent', 'Content-Length': '1866'} Response body: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Request uri (GET): https://vcloud.lab.local/api/task/4e521934-9612-4f8c-96f8-b53a026bbab7 Request headers: {'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': 'application/*+xml;version=32.0', 'Connection': 'keep-alive', 'x-vcloud-authorization': '[REDACTED]'} Response status code: 200 Response headers: {'Date': 'Thu, 04 Feb 2021 11:01:03 GMT', 'X-VMWARE-VCLOUD-REQUEST-ID': '3733b0e9-fc5d-412d-8391-218aa9768477', 'X-VMWARE-VCLOUD-ACCESS-TOKEN': 'eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJvcmcwNSIsImlzcyI6ImE4Mzk4YWRkLWE0NGItNDZkOS04NTllLWEyYmE1ZTFkYWFmNUA4YWQ5MWIwZC03NjFjLTQ1MzctOTQzMi1iMmZlOGVjYWU1ODEiLCJleHAiOjE2MTI1MjA2NjgsInZlcnNpb24iOiJ2Y2xvdWRfMS4wIiwianRpIjoiOWY0ZDk3YmI1NjI4NDc5YjhjYzM0YjcxNzcwY2QwOTgifQ.PjIqIM_aPuHPMpBavhj2r3cGzhHKnWkbs94pOxnwm8v36A-R_KCli4cs0eAHgS3I_JQqvHSYw_NJW_fQ1oVso-cy_4ZQsQaPCrze5Uc84KfIjRgI0sR4Clh_AyoUS0LQnpcffIj253Lj7xebgA-WfqbSQvYEg5H_ttqpPjkkRIlNbyxQw-OVaFY2tGyA7vPTnPvI9KJbV_F6lQiFw7ZHf8njCjyMHtp7YVYN0PsWY0abf820XnsSasfuYopTweyQ8Q09AwUspddNWd965sGAO5q8aynjUh9rCvujEEazOgjAw08jhhC4mwhcFdTQ5Qd3MfYJkkyjru1JC0uppcQuhQ', 'X-VMWARE-VCLOUD-TOKEN-TYPE': 'Bearer', 'x-vcloud-authorization': '[REDACTED]', 'Content-Type': 'application/vnd.vmware.vcloud.task+xml;version=32.0', 'X-VMWARE-VCLOUD-REQUEST-EXECUTION-TIME': '57', 'Cache-Control': 'no-store, must-revalidate', 'Vary': 'Accept-Encoding, User-Agent', 'Content-Length': '3178'} Response body: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
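To find where in a long debug log the task polling switches from 200 to an error, the request/response pairs can be extracted mechanically. The sketch below assumes the log keeps the `Request uri (...)` and `Response status code:` wording shown in the snippet above and that every request is followed by its own response; the function names are hypothetical.

```python
import re

# One entry = a "Request uri (METHOD): URL" followed, some text later,
# by the nearest "Response status code: NNN".
ENTRY = re.compile(
    r"Request uri \((?P<method>[A-Z]+)\): (?P<uri>\S+)"
    r".*?Response status code: (?P<status>\d{3})",
    re.DOTALL,
)


def status_timeline(log_text):
    """Return [(method, uri, status_code), ...] in the order they appear."""
    return [(m["method"], m["uri"], int(m["status"])) for m in ENTRY.finditer(log_text)]


def first_failure(timeline):
    """Return the index of the first entry with status >= 400, or None."""
    for index, (_, _, status) in enumerate(timeline):
        if status >= 400:
            return index
    return None
```

Running `first_failure(status_timeline(text))` over the full vcd log text would give the position of the first failing poll, which makes it easier to quote the request and response around the 500.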
Hello All,
I have upgraded CSE from 2.5.1 to 2.6.1. Now, if I try to upgrade one of the existing clusters (created from Ubuntu templates) from one revision to another, I get the errors below. From the errors it looks like a DNS issue, but it is not: when I manually wget the URLs from the error log on the master server, they download successfully. So I assume this is not a DNS error.
Please suggest if anybody has any idea about this.
=================================================
root@cse2p2h11 [ ~ ]# vcd cse cluster upgrade ESA20 ubuntu-16.04_k8-1.18_weave-2.6.5 1
cluster operation: Upgrading cluster 'ESA20' software to match template ubuntu-16.04_k8-1.18_weave-2.6.5 (revision 1): Kubernetes: 1.17.9 -> 1.18.6, Docker-CE: 19.03.5 -> 19.03.12, CNI: weave 2.6.0 -> 2.6.5
cluster operation: Draining master node ['mstr-5vm3']
cluster operation: Upgrading Kubernetes (1.17.9 -> 1.18.6) in master node ['mstr-5vm3']
task: 64daf4c3-de32-4c36-9e3a-56cc6b5317c1, result: error, message: Unexpected error while upgrading cluster 'ESA20': Script execution failed on node ['mstr-5vm3']
Errors: ["W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial/InRelease Temporary failure resolving 'archive.ubuntu.com'\nW: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial-updates/InRelease Temporary failure resolving 'archive.ubuntu.com'\nW: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial-backports/InRelease Temporary failure resolving 'archive.ubuntu.com'\nW: Failed to fetch http://security.ubuntu.com/ubuntu/dists/xenial-security/InRelease Temporary failure resolving 'security.ubuntu.com'\nW: Failed to fetch https://download.docker.com/linux/ubuntu/dists/xenial/InRelease Resolving timed out after 30535 milliseconds\nW: Failed to fetch http://apt.kubernetes.io/dists/kubernetes-xenial/InRelease Temporary failure resolving 'apt.kubernetes.io'\nW: Some index files failed to download. They have been ignored, or old ones used instead.\nE: Failed to fetch http://apt.kubernetes.io/pool/kubeadm_1.18.6-00_amd64_d4a4d123be4a196da5e34d7f8d95a224c431298ad18ab38edecbee6548d6236c.deb Temporary failure resolving 'apt.kubernetes.io'\n\nE: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?\n"]
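To compare what apt saw with what the node can do directly, the host names from the `Temporary failure resolving` messages can be pulled out of the error text and tested one by one. This is a rough sketch with made-up function names. One hedged observation: a client that goes through an HTTP proxy normally leaves name resolution of the target host to the proxy, so these messages may indicate that apt attempted direct connections when the script ran, whereas a manual wget in a login shell may have picked up proxy variables from the shell environment.

```python
import re
import socket

UNRESOLVED = re.compile(r"Temporary failure resolving '([^']+)'")


def unresolved_hosts(error_text):
    """Return the distinct host names apt failed to resolve, in first-seen order."""
    seen = []
    for host in UNRESOLVED.findall(error_text):
        if host not in seen:
            seen.append(host)
    return seen


def can_resolve(host):
    """Return True if this machine can resolve host via its own DNS settings."""
    try:
        socket.getaddrinfo(host, 80)
        return True
    except socket.gaierror:
        return False
```

If `can_resolve` returns False for these hosts on the master node while a proxied wget still works, that would point toward the upgrade script running without the proxy settings rather than toward a transient DNS outage.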