Closed sd185406 closed 1 year ago
Can you successfully access the GCP bucket from the node itself?
Yes, I am able to access the GCP buckets from the node and list the files inside them.
I see that you have a bunch of other pods that are also stuck in Init or are crashing. Are the pods for the packaged components running correctly? Can you attach the output of:
kubectl get pods -A -o wide
cat /var/lib/rancher/k3s/agent/containerd/containerd.log
journalctl --no-pager -u k3s
I have uploaded the files as a zip archive; PFA.
Although they appear to be running now, the logs show that many of the pods for packaged components (such as metrics-server) were stuck crashlooping from the beginning of the log at Sep 02 15:35:06 until Sep 02 19:48:45, just after a restart of the k3s service. It looks like you made some changes to the system configuration that allowed the pods to work. Can you provide more information on how you configured K3s (any CLI flags or configuration files you added), as well as what you changed around that time?
It also looks like the containerd log does not go back further than the restart at Sep 02 19:48, so I can't tell what was going on before that, but I suspect it is related to the problems with your workload pods.
I configured the k3s service using the steps below:
$ export CLUSTER_CIDR=192.168.10.0/24
$ export SERVICE_CIDR=192.168.20.0/24
$ export EXTERNAL_IP=Your_VM_External_IP
$ export K3S_KUBECONFIG_MODE="644"
$ export INSTALL_K3S_EXEC="--cluster-cidr $CLUSTER_CIDR --service-cidr $SERVICE_CIDR --node-external-ip $EXTERNAL_IP"
$ curl -sfL https://get.k3s.io | sh -
$ k3s -v
$ systemctl status k3s.service
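(For reference, the same settings can be kept in k3s's declarative config file instead of `INSTALL_K3S_EXEC`; this is a sketch, and the external IP is a placeholder exactly as in the commands above:)

```yaml
# /etc/rancher/k3s/config.yaml -- read by k3s on startup
cluster-cidr: "192.168.10.0/24"
service-cidr: "192.168.20.0/24"
node-external-ip: "Your_VM_External_IP"
write-kubeconfig-mode: "644"
```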
As a quick workaround, I performed some troubleshooting to find the root cause. The failing pods report:
ERROR: gcloud crashed (TransportError): HTTPSConnectionPool(host='oauth2.googleapis.com', port=443): Max retries exceeded with url: /token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fa61e401ee0>: Failed to establish a new connection: [Errno -3] Try again'))
It seems something is blocking the pod network, so the pods are unable to authenticate to the gcloud services.
The CoreDNS logs in the kube-system namespace show errors like the following:
kubectl logs -f coredns-b96499967-rbqrg -n kube-system
[ERROR] plugin/errors: 2 oauth2.googleapis.com. AAAA: read udp 192.168.10.19:42945->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 oauth2.googleapis.com. A: read udp 192.168.10.19:55572->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 oauth2.googleapis.com. AAAA: read udp 192.168.10.19:58091->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 oauth2.googleapis.com. A: read udp 192.168.10.19:54176->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 oauth2.googleapis.com. A: read udp 192.168.10.19:35297->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 oauth2.googleapis.com. AAAA: read udp 192.168.10.19:40570->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 oauth2.googleapis.com. A: read udp 192.168.10.19:46817->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 oauth2.googleapis.com. AAAA: read udp 192.168.10.19:40210->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 oauth2.googleapis.com. A: read udp 192.168.10.19:35235->8.8.8.8:53: i/o timeout
[ERROR] plugin/errors: 2 oauth2.googleapis.com. AAAA: read udp 192.168.10.19:45841->8.8.8.8:53: i/o timeout
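To see at a glance which upstream resolvers are timing out, the errors above can be summarized with a small helper (a sketch; it only does text processing on the CoreDNS log, and the sample lines piped in below are taken from this thread):

```shell
# Count CoreDNS "i/o timeout" errors per upstream resolver (address after "->").
summarize_dns_timeouts() {
  grep 'i/o timeout' | grep -o -e '->[0-9.]*:53' | sed 's/^->//' | sort | uniq -c
}

# Demo on two of the log lines above; both failures go to the 8.8.8.8 upstream.
printf '%s\n' \
  '[ERROR] plugin/errors: 2 oauth2.googleapis.com. AAAA: read udp 192.168.10.19:42945->8.8.8.8:53: i/o timeout' \
  '[ERROR] plugin/errors: 2 oauth2.googleapis.com. A: read udp 192.168.10.19:55572->8.8.8.8:53: i/o timeout' \
  | summarize_dns_timeouts
```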
Yeah, it looks like container networking is not working for some reason. Can I ask why you've changed the container and service CIDR ranges? Do your custom ranges overlap with any of the subnets that the node is on? Can you confirm that there are no GCP-level security group rules that are interfering with outbound connections?
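For reference, a quick way to check whether two CIDR blocks overlap, using only POSIX shell arithmetic (a sketch; the first range in the demo is the custom cluster CIDR from this thread, and the second is a hypothetical node subnet):

```shell
# Dotted quad -> 32-bit integer.
ip2int() {
  set -- $(echo "$1" | tr '.' ' ')
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds (exit 0) when the two CIDR blocks share at least one address.
cidr_overlap() {
  net1=${1%/*} len1=${1#*/} net2=${2%/*} len2=${2#*/}
  start1=$(( $(ip2int "$net1") & (0xFFFFFFFF << (32 - len1)) ))
  start2=$(( $(ip2int "$net2") & (0xFFFFFFFF << (32 - len2)) ))
  end1=$(( start1 + (1 << (32 - len1)) - 1 ))
  end2=$(( start2 + (1 << (32 - len2)) - 1 ))
  [ "$start1" -le "$end2" ] && [ "$start2" -le "$end1" ]
}

# 192.168.0.0/20 covers 192.168.0.0-192.168.15.255, so the custom
# cluster CIDR 192.168.10.0/24 would collide with such a node subnet.
cidr_overlap 192.168.10.0/24 192.168.0.0/20 && echo "overlap" || echo "disjoint"
```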
No David,
I have checked all the egress and ingress rules in GCP; connectivity is allowed to all my subnets.
As for the CIDR ranges: yes, we use the above configuration for all our dev environments. We set up the same environment on a CentOS VM and didn't face any challenges there; all pods communicated with the GCP services seamlessly.
I think there might be some additional network configuration required on Ubuntu machines for a Rancher k3s setup, or some bridge has to be established between the network interfaces.
There isn't generally any special setup necessary on Ubuntu. Is there anything else installed on this node that might be interfering with this traffic? Docker, additional software managing firewall configuration or iptables, etc?
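A short script like the following can survey the usual suspects in one pass (a sketch; every check degrades gracefully when the tool is absent, so it is safe to run on any host):

```shell
# One-pass survey of software that commonly interferes with pod traffic.
net_interference_report() {
  echo "ufw:        $( (ufw status 2>/dev/null || echo 'not installed / not runnable') | head -n1 )"
  echo "firewalld:  $(systemctl is-active firewalld 2>/dev/null || echo 'not running')"
  echo "docker:     $(command -v docker >/dev/null 2>&1 && echo present || echo absent)"
  echo "bridge-nf:  $(sysctl -n net.bridge.bridge-nf-call-iptables 2>/dev/null || echo 'br_netfilter not loaded')"
  echo "ip_forward: $(sysctl -n net.ipv4.ip_forward 2>/dev/null || echo 'unknown')"
}

net_interference_report
```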
Sorry, I closed it by mistake.
No, we haven't installed anything other than k3s.
I have this issue too. Fresh install of k3s v1.24.4+k3s1 on Ubuntu 18.04, with ufw disabled by default and iptables version 1.6.1. It looks like the network has an issue: from a pod, I can ping the host IP but cannot ping the host network's default gateway. After changing the host OS to Ubuntu 20.04, everything works fine.
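Old iptables releases are a known source of k3s networking problems, which would fit the Ubuntu 18.04 (iptables 1.6.1) vs. 20.04 difference. A quick check like the following can flag them (a sketch; the 1.8.0 cutoff is illustrative, based on k3s's published guidance that old releases such as 1.6.x are buggy):

```shell
# Succeeds when version $1 sorts strictly before version $2.
ver_lt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Pull the first dotted version number out of `iptables --version`.
ipt_ver=$(iptables --version 2>/dev/null | grep -o '[0-9][0-9.]*' | head -n1)
if [ -z "$ipt_ver" ]; then
  echo "iptables not found on PATH"
elif ver_lt "$ipt_ver" "1.8.0"; then
  echo "iptables $ipt_ver predates 1.8.0 and may break pod networking; consider upgrading"
else
  echo "iptables $ipt_ver looks recent enough"
fi
```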
Environmental Info:
K3s Version:
k3s version v1.24.4+k3s1 (c3f830e9)
go version go1.18.1
Node(s) CPU architecture, OS, and Version:
Linux ctm-ubantu-vm 5.4.0-1087-gcp #95~18.04.1-Ubuntu SMP Mon Aug 22 03:26:39 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
It's a single-node cluster where I have installed the k3s service and run the application on top of it.
Describe the bug:
I created an Ubuntu VM in GCP, installed k3s, and deployed my application Helm chart on top of it. However, some of the pods need to connect to the GCP bucket and pull the images, and this is not happening; instead they throw the error below:
ERROR: gcloud crashed (ConnectionError): HTTPSConnectionPool(host='oauth2.googleapis.com', port=443)
Steps To Reproduce:
Expected behavior:
All pods should be up and running.
Actual behavior:
Pods are not running; they are stuck in Init or CrashLoopBackOff.
Additional context / logs:
root@ctm-ubantu-vm:~# kubectl get pods -n store
NAME                                          READY   STATUS                  RESTARTS         AGE
jarvis-scoxcashdelegate-6c4bf57d76-p85g2      0/1     Init:0/2                0                56m
jarvis-scoxcashservice-ddbbdcd46-d9245        0/1     Init:0/1                0                56m
jarvis-scoxprinter-b4577cf87-l9htt            0/1     Init:0/1                0                56m
jarvis-scoxdoc-bddcfd-5kms9                   0/1     Init:0/1                0                56m
jarvis-rediscache-69d48468dc-sgxcd            1/1     Running                 0                56m
jarvis-hivemqce-5888d55bf6-xwlhc              1/1     Running                 0                56m
jarvis-mongodb-5f6bf5d859-48jc4               1/1     Running                 0                56m
jarvis-jarvisconfigservice-784889995b-28klj   0/1     Init:CrashLoopBackOff   15 (3m4s ago)    56m
jarvis-scoxresources-7b5977bdd8-kqcvk         0/1     Init:CrashLoopBackOff   15 (3m1s ago)    56m
jarvis-scoxerrorlookup-5b5499d866-4h9m5       0/1     Init:CrashLoopBackOff   15 (2m45s ago)   56m
jarvis-scoxauthentication-846dcc684c-m968j    0/1     Init:CrashLoopBackOff   15 (2m33s ago)   56m
root@ctm-ubantu-vm:~# kubectl logs -p jarvis-jarvisconfigservice-784889995b-28klj -n store --all-containers
Initializing from GCS...
Version: gc://scox-configservice-assets/assets/assets-1.6.0.zip
Checking if file, assets-1.6.0.zip, is already in the destination path, /var/lib/ncr_scot/jarvisconfigservice/.
OVERWRITE_LOCAL_COPY is set to false.
Downloading from GCS
ERROR: gcloud crashed (TransportError): HTTPSConnectionPool(host='oauth2.googleapis.com', port=443): Max retries exceeded with url: /token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fa61e401ee0>: Failed to establish a new connection: [Errno -3] Try again'))