Closed giacomobartoli closed 6 months ago
Note that there has been a report that calicoctl v3.16.1 isn't working on macOS: https://github.com/projectcalico/calicoctl/issues/2182
This might be related. I'd suggest trying an earlier version to see whether you get the same results. That will help us tell if this is the same issue or something different.
Hi @caseydavenport and thanks for your support. Issue projectcalico/calicoctl#2182 is different: when that user runs 'calicoctl', it returns no output at all (I guess the user downloaded the wrong binary file). In my case, the command line is stuck, waiting for something.
Moreover, previous versions show the same behaviour.
After (6) there is no answer and the command is pending..
Ah, right, the command is pending. I'd double-check that:
You might also want to try running with debug logging on to see if there are any clues:
calicoctl -l debug get nodes
Hi @caseydavenport. Yes, I confirm that I have access to the cluster and that my calicoctl is pointed at it. I ran the command you suggested and this is the output:
INFO[0000] Log level set to debug
INFO[0000] Executing config command
DEBU[0000] Resource: projectcalico.org/v3, Kind=Node
DEBU[0000] Data: - apiVersion: projectcalico.org/v3
kind: Node
metadata:
creationTimestamp: null
spec: {}
DEBU[0000] Loading config from JSON or YAML data
DEBU[0000] Datastore type: etcdv3
INFO[0000] Loaded client config: apiconfig.CalicoAPIConfigSpec{DatastoreType:"etcdv3", EtcdConfig:apiconfig.EtcdConfig{EtcdEndpoints:"https://c6.mil01.containers.cloud.ibm.com:20131", EtcdDiscoverySrv:"", EtcdUsername:"", EtcdPassword:"", EtcdKeyFile:"/Users/it001366/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/admin-key.pem", EtcdCertFile:"/Users/it001366/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/admin.pem", EtcdCACertFile:"/Users/it001366/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/ca.pem", EtcdKey:"", EtcdCert:"", EtcdCACert:""}, KubeConfig:apiconfig.KubeConfig{Kubeconfig:"", K8sAPIEndpoint:"", K8sKeyFile:"", K8sCertFile:"", K8sCAFile:"", K8sAPIToken:"", K8sInsecureSkipTLSVerify:false, K8sDisableNodePoll:false, K8sUsePodCIDR:false}}
DEBU[0000] Using datastore type 'etcdv3'
INFO[0000] Client: {{{CalicoAPIConfig projectcalico.org/v3} { 0 {{0 0 <nil>}} <nil> <nil> map[] map[] [] [] []} {etcdv3 {https://c6.mil01.containers.cloud.ibm.com:20131 /Users/it001366/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/admin-key.pem /Users/it001366/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/admin.pem /Users/it001366/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/ca.pem } { false false false}}} 0xc000452090 0xc000450190}
DEBU[0000] Processing List request list-interface=Node rev=
DEBU[0000] Get Global Resource key from /calico/resources/v3/projectcalico.org/nodes
DEBU[0000] Didn't match regex
DEBU[0000] List options is a parent prefix, ensure path ends in / list-interface=Node rev=
DEBU[0000] Adding / to path list-interface=Node rev=
DEBU[0000] Calling Get on etcdv3 client etcdv3-etcdKey=/calico/resources/v3/projectcalico.org/nodes/ list-interface=Node rev=
DEBU[0000] Calling Get on etcdv3 client etcdv3-etcdKey=/calico/resources/v3/projectcalico.org/nodes/ list-interface=Node rev=
Yeah, it looks like it's performing a request against the etcd cluster to get the node information and isn't receiving a response back.
Yeah, it looks like it's performing a request against the etcd cluster to get the node information and isn't receiving a response back.
Even though it might be, I'd say it could be any kind of basic TLS problem, such as an invalid CA or invalid or expired certs. I found out (the hard way) that calicoctl gives no insight into problems of this kind: it stays in a 'trying-to-establish-connection' loop instead of exiting with a meaningful error. This was on both 3.15 and 3.16.
Wanna know the rationale? Well, long story downstairs.
I've met the exact same situation in a 100% Linux production environment running Calico + etcd with TLS validation. calicoctl was stuck and debug logging didn't produce any meaningful messages. I came here to post a full report, but since I found this issue, I decided to comment instead.
I already solved my problem:
My problem was my certificates were expired.
Check this diagnostic 'thread':
Connectivity OK:
$ for i in 192.168.0.11 192.168.0.12 192.168.0.13 ; do { timeout 1 curl -svz1 telnet://$i:2379 2>&1 ; } | grep Connected; done
Connected to 192.168.0.11 (192.168.0.11) port 2379 (#0)
Connected to 192.168.0.12 (192.168.0.12) port 2379 (#0)
Connected to 192.168.0.13 (192.168.0.13) port 2379 (#0)
calicoctl stuck, no relevant information given:
$ calicoctl -l debug get globalnetworkpolicies
INFO[0000] Log level set to debug
INFO[0000] Executing config command
DEBU[0000] Resource: projectcalico.org/v3, Kind=Node
DEBU[0000] Data: - apiVersion: projectcalico.org/v3
kind: Node
metadata:
creationTimestamp: null
spec: {}
status: {}
DEBU[0000] Loading config from JSON or YAML data
DEBU[0000] Datastore type: etcdv3
INFO[0000] Loaded client config: apiconfig.CalicoAPIConfigSpec{DatastoreType:"etcdv3", EtcdConfig:apiconfig.EtcdConfig{EtcdEndpoints:"https://192.168.0.11:2379,https://192.168.0.12:2379,https://192.168.0.13:2379", EtcdDiscoverySrv:"", EtcdUsername:"", EtcdPassword:"", EtcdKeyFile:"/etc/calico/tls/acof/tls.key", EtcdCertFile:"/etc/calico/tls/acof/tls.crt", EtcdCACertFile:"/etc/calico/tls/acof/tls.ca", EtcdKey:"", EtcdCert:"", EtcdCACert:""}, KubeConfig:apiconfig.KubeConfig{Kubeconfig:"", K8sAPIEndpoint:"", K8sKeyFile:"", K8sCertFile:"", K8sCAFile:"", K8sAPIToken:"", K8sInsecureSkipTLSVerify:false, K8sDisableNodePoll:false, K8sUsePodCIDR:false, KubeconfigInline:"", K8sClientQPS:0}}
DEBU[0000] Using datastore type 'etcdv3'
INFO[0000] Client: {{{CalicoAPIConfig projectcalico.org/v3} { 0 {{0 0
DEBU[0000] List options is a parent prefix, ensure path ends in / list-interface=Node rev=
DEBU[0000] Adding / to path list-interface=Node rev=
DEBU[0000] Calling Get on etcdv3 client etcdv3-etcdKey=/calico/resources/v3/projectcalico.org/nodes/ list-interface=Node rev=
On etcd logs:
Oct 15 16:54:08 etcd1.intranet rkt[1085]: 2020-10-15 19:54:08.313941 I | embed: rejected connection from "10.10.10.42:40514" (error "tls: failed to verify client's certificate: x509: certificate has expired or is not yet valid", ServerName "")
Oct 15 16:54:10 etcd1.intranet rkt[1085]: 2020-10-15 19:54:10.975503 I | embed: rejected connection from "10.10.10.42:40520" (error "tls: failed to verify client's certificate: x509: certificate has expired or is not yet valid", ServerName "")
Oct 15 16:54:14 etcd1.intranet rkt[1085]: 2020-10-15 19:54:14.509551 I | embed: rejected connection from "10.10.10.42:40528" (error "tls: failed to verify client's certificate: x509: certificate has expired or is not yet valid", ServerName "")
Oct 15 16:54:20 etcd1.intranet rkt[1085]: 2020-10-15 19:54:20.949211 I | embed: rejected connection from "10.10.10.42:40538" (error "tls: failed to verify client's certificate: x509: certificate has expired or is not yet valid", ServerName "")
So, it seems calicoctl tries to establish a connection, fails at the TLS stage, retries every 2 seconds or so, and keeps retrying forever.
In my case, a single calicoctl run produced 61 connection attempts:
61
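Until calicoctl reports these failures itself, one possible stopgap is to run it under a hard deadline so the silent retry loop turns into a visible error. The sketch below is only an illustration: `run_with_deadline` is a hypothetical helper name, and it assumes GNU coreutils `timeout` is installed (on macOS it is usually available as `gtimeout` from Homebrew's coreutils).

```shell
# Hypothetical helper: run a command with a deadline instead of letting it hang.
# Assumes GNU coreutils `timeout` (or `gtimeout` on macOS) is on the PATH.
run_with_deadline() {
  local seconds="$1" status=0 timeout_bin
  shift
  timeout_bin="$(command -v timeout || command -v gtimeout)" || {
    echo "no timeout/gtimeout binary found" >&2
    return 127
  }
  # `timeout` exits with 124 when it had to kill the command.
  "$timeout_bin" "$seconds" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "gave up after ${seconds}s: '$*' never returned (check TLS certs and the etcd logs)" >&2
  fi
  return "$status"
}
```

For example, `run_with_deadline 15 calicoctl get nodes` would return 124 after 15 seconds in the situation described above instead of waiting indefinitely.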
---
So, for curiosity's sake, I also tried other TLS failures to see what happens. **ALL of them** ended with calicoctl stuck in a connection loop without any meaningful messages.
I tried these with curl:
* Invalid CA:
curl: (60) Peer's Certificate issuer is not recognized.
* Name mismatch
curl: (51) Unable to communicate securely with peer: requested domain name does not match the server's certificate.
* CA ok, name ok, but no client certs:
curl: (58) NSS: client certificate not found (nickname not specified)
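The exact curl invocations are not shown above, so the following is only a sketch of how such a probe could look. It assumes the etcd client port serves the `/version` HTTP endpoint; `probe_etcd_tls` and `explain_curl_exit` are hypothetical helper names. The exit-code meanings follow curl's documented exit codes, with the caveat that newer curl builds may report a host name mismatch as 60 rather than 51.

```shell
# Hypothetical helper: translate a curl exit code into a short diagnosis.
explain_curl_exit() {
  case "$1" in
    0)  echo "request succeeded: TLS handshake completed" ;;
    7)  echo "could not connect (network or port problem, not TLS)" ;;
    28) echo "operation timed out" ;;
    35) echo "TLS handshake failed" ;;
    51) echo "server certificate rejected (host name mismatch on older curl builds)" ;;
    58) echo "problem with the local client certificate" ;;
    60) echo "server certificate could not be verified against the given CA" ;;
    *)  echo "curl exit code $1 (see EXIT CODES in 'man curl')" ;;
  esac
}

# Hypothetical helper: probe one etcd endpoint with explicit CA, cert and key.
# Assumption: the endpoint answers GET /version on its client port.
probe_etcd_tls() {
  local endpoint="$1" ca="$2" cert="$3" key="$4" status=0
  curl --silent --show-error --output /dev/null --max-time 5 \
       --cacert "$ca" --cert "$cert" --key "$key" \
       "$endpoint/version" || status=$?
  echo "$endpoint: $(explain_curl_exit "$status")"
  return "$status"
}
```

With the paths from the calicoctl config shown earlier, a call might look like `probe_etcd_tls https://192.168.0.11:2379 /etc/calico/tls/acof/tls.ca /etc/calico/tls/acof/tls.crt /etc/calico/tls/acof/tls.key`. Unlike calicoctl, curl exits with a code that points at the failing TLS stage.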
These were calicoctl tests:
* Invalid CA:
... INFO[0000] Loaded client config: apiconfig.CalicoAPIConfigSpec{DatastoreType:"etcdv3", EtcdConfig:apiconfig.EtcdConfig{EtcdEndpoints:"https://etcd1.local:2379", EtcdDiscoverySrv:"", EtcdUsername:"", EtcdPassword:"", EtcdKeyFile:"", EtcdCertFile:"", EtcdCACertFile:"", EtcdKey:"", EtcdCert:"", EtcdCACert:""}, KubeConfig:apiconfig.KubeConfig{Kubeconfig:"", K8sAPIEndpoint:"", K8sKeyFile:"", K8sCertFile:"", K8sCAFile:"", K8sAPIToken:"", K8sInsecureSkipTLSVerify:false, K8sDisableNodePoll:false, K8sUsePodCIDR:false, KubeconfigInline:"", K8sClientQPS:0}} ... ^C
* Name mismatch (it was on /etc/hosts)
... INFO[0000] Loaded client config: apiconfig.CalicoAPIConfigSpec{DatastoreType:"etcdv3", EtcdConfig:apiconfig.EtcdConfig{EtcdEndpoints:"https://etcd.devsres.com:2379", EtcdDiscoverySrv:"", EtcdUsername:"", EtcdPassword:"", EtcdKeyFile:"", EtcdCertFile:"", EtcdCACertFile:"tls/tls.ca", EtcdKey:"", EtcdCert:"", EtcdCACert:"tls/tls.ca"}, KubeConfig:apiconfig.KubeConfig{Kubeconfig:"", K8sAPIEndpoint:"", K8sKeyFile:"", K8sCertFile:"", K8sCAFile:"", K8sAPIToken:"", K8sInsecureSkipTLSVerify:false, K8sDisableNodePoll:false, K8sUsePodCIDR:false, KubeconfigInline:"", K8sClientQPS:0}} ... ^C
* CA ok, name ok, but no client certs:
... INFO[0000] Loaded client config: apiconfig.CalicoAPIConfigSpec{DatastoreType:"etcdv3", EtcdConfig:apiconfig.EtcdConfig{EtcdEndpoints:"https://sp2srvvpkv00001:2379", EtcdDiscoverySrv:"", EtcdUsername:"", EtcdPassword:"", EtcdKeyFile:"", EtcdCertFile:"", EtcdCACertFile:"tls/tls.ca", EtcdKey:"", EtcdCert:"", EtcdCACert:""}, KubeConfig:apiconfig.KubeConfig{Kubeconfig:"", K8sAPIEndpoint:"", K8sKeyFile:"", K8sCertFile:"", K8sCAFile:"", K8sAPIToken:"", K8sInsecureSkipTLSVerify:false, K8sDisableNodePoll:false, K8sUsePodCIDR:false, KubeconfigInline:"", K8sClientQPS:0}} ... ^C
I believe this should be easily reproducible, and it should give @giacomobartoli a hint: if he has full connectivity with the server, he might want to debug all TLS stages and look for misconfigurations like the one I ran into.
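Since the root cause in this case was an expired client certificate, a quick local check with `openssl` can rule that out before touching calicoctl. This is a minimal sketch, not a complete TLS audit: `check_client_cert` is a hypothetical helper name, and the second argument (a CA bundle) is optional.

```shell
# Hypothetical helper: check_client_cert CERT [CA_BUNDLE]
# Returns 0 if the cert is readable, not expired, and (optionally) chains to the CA.
check_client_cert() {
  local cert="$1" ca="${2:-}"
  [ -r "$cert" ] || { echo "cannot read certificate: $cert" >&2; return 2; }
  # Show subject and validity window for a human to eyeball.
  openssl x509 -noout -subject -dates -in "$cert" || return 2
  # -checkend 0 exits non-zero if the certificate has already expired.
  if ! openssl x509 -noout -checkend 0 -in "$cert" >/dev/null; then
    echo "certificate has expired: $cert" >&2
    return 1
  fi
  # Optional: verify the chain (this also catches not-yet-valid certificates).
  if [ -n "$ca" ]; then
    openssl verify -CAfile "$ca" "$cert" >/dev/null || {
      echo "certificate does not verify against $ca" >&2
      return 1
    }
  fi
  return 0
}
```

For the setup in the log above this would be `check_client_cert /etc/calico/tls/acof/tls.crt /etc/calico/tls/acof/tls.ca`. The etcd server log remains the authoritative place to see why a connection was rejected.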
Thanks for the details @marcelo-devsres that's really helpful. I would expect calicoctl to at the very least log the etcd failures and ideally report the hard failure back to the user instead of retrying forever.
@marcelo-devsres does calico version 3.16.4 fix this bug?
This is the log I get from calicoctl -l debug get globalnetworkpolicies:
MacBook-Pro-di-Giacomo:Downloads Giacomo$ calicoctl -l debug get globalnetworkpolicies
INFO[0000] Log level set to debug
INFO[0000] Executing config command
DEBU[0000] Resource: projectcalico.org/v3, Kind=GlobalNetworkPolicy
DEBU[0000] Data: - apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
creationTimestamp: null
spec: {}
DEBU[0000] Loading config from JSON or YAML data
DEBU[0000] Datastore type: etcdv3
INFO[0000] Loaded client config: apiconfig.CalicoAPIConfigSpec{DatastoreType:"etcdv3", EtcdConfig:apiconfig.EtcdConfig{EtcdEndpoints:"https://c6.mil01.containers.cloud.ibm.com:20131", EtcdDiscoverySrv:"", EtcdUsername:"", EtcdPassword:"", EtcdKeyFile:"/Users/Giacomo/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/admin-key.pem", EtcdCertFile:"/Users/Giacomo/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/admin.pem", EtcdCACertFile:"/Users/Giacomo/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/ca.pem", EtcdKey:"", EtcdCert:"", EtcdCACert:""}, KubeConfig:apiconfig.KubeConfig{Kubeconfig:"", K8sAPIEndpoint:"", K8sKeyFile:"", K8sCertFile:"", K8sCAFile:"", K8sAPIToken:"", K8sInsecureSkipTLSVerify:false, K8sDisableNodePoll:false, K8sUsePodCIDR:false, KubeconfigInline:"", K8sClientQPS:0}}
DEBU[0000] Using datastore type 'etcdv3'
INFO[0000] Client: {{{CalicoAPIConfig projectcalico.org/v3} { 0 {{0 0 <nil>}} <nil> <nil> map[] map[] [] [] []} {etcdv3 {https://c6.mil01.containers.cloud.ibm.com:20131 /Users/Giacomo/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/admin-key.pem /Users/Giacomo/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/admin.pem /Users/Giacomo/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/ca.pem } { false false false 0}}} 0xc00000e058 0xc0001fa090}
DEBU[0000] Processing List request list-interface=GlobalNetworkPolicy rev=
DEBU[0000] Get Global Resource key from /calico/resources/v3/projectcalico.org/globalnetworkpolicies
DEBU[0000] Didn't match regex
DEBU[0000] List options is a parent prefix, ensure path ends in / list-interface=GlobalNetworkPolicy rev=
DEBU[0000] Adding / to path list-interface=GlobalNetworkPolicy rev=
DEBU[0000] Calling Get on etcdv3 client etcdv3-etcdKey=/calico/resources/v3/projectcalico.org/globalnetworkpolicies/ list-interface=GlobalNetworkPolicy rev=
@marcelo-devsres does calico version 3.16.4 fix this bug?
Still stuck.
The etcd servers require TLS certs for connection. I didn't specify certs, and calicoctl is stuck instead of exiting with some TLS error message.
# docker run --rm -v it -e ETCD_ENDPOINTS=http//etcd1.intra:2379 calico/ctl:v3.16.4 -l debug get nodes
...
time="2020-11-06T18:38:55Z" level=info msg="Loaded client config: apiconfig.CalicoAPIConfigSpec{DatastoreType:\"etcdv3\", EtcdConfig:apiconfig.EtcdConfig{EtcdEndpoints:\"http//etcd1.intra:2379\", EtcdDiscoverySrv:\"\", EtcdUsername:\"\", EtcdPassword:\"\", EtcdKeyFile:\"\", EtcdCertFile:\"\", EtcdCACertFile:\"\", EtcdKey:\"\", EtcdCert:\"\", EtcdCACert:\"\"}, KubeConfig:apiconfig.KubeConfig{Kubeconfig:\"\", K8sAPIEndpoint:\"\", K8sKeyFile:\"\", K8sCertFile:\"\", K8sCAFile:\"\", K8sAPIToken:\"\", K8sInsecureSkipTLSVerify:false, K8sDisableNodePoll:false, K8sUsePodCIDR:false, KubeconfigInline:\"\", K8sClientQPS:0}}"
time="2020-11-06T18:36:44Z" level=debug msg="Calling Get on etcdv3 client" etcdv3-etcdKey=/calico/resources/v3/projectcalico.org/nodes/ list-interface=Node rev=
^C
@caseydavenport so, how can I run the calicoctl command without running into this issue?
@giacomobartoli @marcelo-devsres How you should set the ETCD_ENDPOINTS variable depends on your cluster setup. On our system, the internal certificates for the etcd connection were issued to IP addresses, which means we also have to use the IP address in ETCD_ENDPOINTS instead of the hostname. So maybe replacing your ETCD_ENDPOINTS=http://etcd1.intra:2379 with ETCD_ENDPOINTS=http://YOUR-ETCD-IP:2379 or ETCD_ENDPOINTS=https://YOUR-ETCD-IP:2379 does the trick. Just my 2¢.
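As a side note, the endpoint in the docker run command above is written `http//etcd1.intra:2379`, without the colon after the scheme, and the logged client config shows the same value. Whether that plays any part in the hang is not established in this thread, but a small sanity check on ETCD_ENDPOINTS before calling calicoctl can catch this kind of typo. The sketch below uses a hypothetical helper name, `check_endpoints`, and only validates the scheme prefix of each comma-separated entry.

```shell
# Hypothetical helper: validate a comma-separated ETCD_ENDPOINTS value.
# Only checks that each entry starts with http:// or https://.
check_endpoints() {
  local endpoints="$1" ep rest status=0
  [ -n "$endpoints" ] || { echo "ETCD_ENDPOINTS is empty" >&2; return 1; }
  rest="$endpoints"
  while [ -n "$rest" ]; do
    ep="${rest%%,*}"                      # first entry
    if [ "$ep" = "$rest" ]; then
      rest=""                             # that was the last entry
    else
      rest="${rest#*,}"                   # drop the first entry and its comma
    fi
    case "$ep" in
      http://?*|https://?*) ;;
      *) echo "bad endpoint (expected http:// or https://): $ep" >&2; status=1 ;;
    esac
  done
  return "$status"
}
```

Usage could be as simple as `check_endpoints "$ETCD_ENDPOINTS" && calicoctl get nodes`. To see which names or IP addresses a certificate was actually issued for, `openssl x509 -noout -text -in <cert>` prints the Subject Alternative Name section.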
Expected Behavior
Running the command
calicoctl get nodes
I am expecting to see two Kubernetes nodes.
Current Behavior
The command gets stuck after calicoctl for OSX has been installed correctly.
Steps to Reproduce (for bugs)
1. ibmcloud ks cluster config --cluster btct7mhf06d8fo6mmi6g --admin --network
   Output: The configuration for btct7mhf06d8fo6mmi6g was downloaded successfully. Network Config: /Users/it001366/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/calicoctl.cfg
2. mv /Users/it001366/Downloads/calicoctl-darwin-amd64 /usr/local/bin/calicoctl
3. chmod +x /usr/local/bin/calicoctl
4. sudo mkdir /etc/calico
5. sudo mv /Users/it001366/.bluemix/plugins/container-service/clusters/mycluster-free-btct7mhf06d8fo6mmi6g-admin/calicoctl.cfg /etc/calico
6. calicoctl get nodes
After (6) there is no answer and the command is pending.
Context
I am trying to follow this tutorial to install Calico CLI: https://cloud.ibm.com/docs/containers?topic=containers-network_policies#cli_install
Your Environment