chrislovecnm opened this issue 7 years ago
I've had success running a cluster on GCE with Calico with some workarounds. Pod-to-pod networking works, load balancers work!
Here's what I did:
kops create cluster \
--cloud gce \
--name c1.alpha.verse.lol \
--zones europe-west1-d \
--project verse-alpha \
--image "ubuntu-os-cloud/ubuntu-1604-xenial-v20170202" \
--networking calico
# For each node: fetch its status, flip the NetworkUnavailable condition to
# "False" (reason RouteCreated), and PUT the status back through the API
# (this assumes the API server is reachable on localhost:8080, e.g. on a master):
for i in `kubectl get nodes -o jsonpath='{.items[*].metadata.name}'`; do
  curl http://localhost:8080/api/v1/nodes/$i/status > a.json
  cat a.json | tr -d '\n' | sed 's/{[^}]\+NetworkUnavailable[^}]\+}/{"type": "NetworkUnavailable","status": "False","reason": "RouteCreated","message": "Manually set through k8s api"}/g' > b.json
  curl -X PUT http://localhost:8080/api/v1/nodes/$i/status -H "Content-Type: application/json" -d @b.json
done
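To verify the patch took, something like this prints the NetworkUnavailable condition for each node:
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="NetworkUnavailable")].status}{"\n"}{end}'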
Also, for the {master,node}-to-{master,node} firewall rules, add protocol 4 (IP-in-IP) so Calico's encapsulated traffic is allowed; a rough gcloud example is below.
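Something like this with gcloud (rule names are placeholders for whatever kops created, and the name filter is a guess based on the cluster name; note that --allow on update replaces the rule's whole protocol list, so include the protocols it already allowed):
gcloud compute firewall-rules list --filter="name~c1-alpha"
gcloud compute firewall-rules update <rule-name> --allow tcp,udp,icmp,esp,ah,sctp,ipip   # ipip is protocol 4; repeat for each master/node rule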
Thanks, @Dirbaio for the workaround! You saved me a bunch of time.
Hmmm ... I can't seem to get the workaround working for me.
The symptoms I'm seeing are:
a) kubectl get nodes only returns the master ... so when you say 'wait for the kubelet to be up and detect all nodes', that never seems to happen for me
b) some of the pods (including calico) under the kube-system namespace don't come up:
17-05-18[18:16:58]:pachyderm:0$kubectl --namespace=kube-system get all
NAME READY STATUS RESTARTS AGE
po/calico-node-crp1q 1/2 Error 3 3m
po/calico-policy-controller-811246363-b70w7 0/1 Pending 0 3m
po/dns-controller-3881114374-glrrs 0/1 Pending 0 3m
po/etcd-server-events-master-us-west1-a-dr4r 1/1 Running 0 3m
po/etcd-server-master-us-west1-a-dr4r 1/1 Running 0 3m
po/kube-apiserver-master-us-west1-a-dr4r 1/1 Running 0 2m
po/kube-controller-manager-master-us-west1-a-dr4r 1/1 Running 0 3m
po/kube-dns-1321724180-r90gh 0/3 Pending 0 3m
po/kube-dns-autoscaler-265231812-74rmf 0/1 Pending 0 3m
po/kube-proxy-master-us-west1-a-dr4r 1/1 Running 0 3m
po/kube-scheduler-master-us-west1-a-dr4r 1/1 Running 0 3m
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kube-dns 100.64.0.10 <none> 53/UDP,53/TCP 3m
NAME DESIRED SUCCESSFUL AGE
jobs/configure-calico 1 0 3m
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/calico-policy-controller 1 1 1 0 3m
deploy/dns-controller 1 1 1 0 3m
deploy/kube-dns 1 1 1 0 3m
deploy/kube-dns-autoscaler 1 1 1 0 3m
NAME DESIRED CURRENT READY AGE
rs/calico-policy-controller-811246363 1 1 0 3m
rs/dns-controller-3881114374 1 1 0 3m
rs/kube-dns-1321724180 1 1 0 3m
rs/kube-dns-autoscaler-265231812 1 1 0 3m
In particular ... the calico pod reports an error connecting to etcd:
WARNING: $CALICO_NETWORKING will be deprecated: use $CALICO_NETWORKING_BACKEND instead
time="2017-05-18T23:05:17Z" level=info msg="NODENAME environment not specified - check HOSTNAME"
time="2017-05-18T23:05:17Z" level=info msg="Loading config from environment"
Skipping datastore connection test
time="2017-05-18T23:05:47Z" level=info msg="Unhandled error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://etcd-a.internal.pach-af13f1b4.k8s.com:4001 exceeded header timeout
"
time="2017-05-18T23:05:47Z" level=info msg="Unable to query node configuration" Name=master-us-west1-a-pl8c error="client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://etcd-a.internal.pach-af13f1b4.k8s.com:4001 exceeded header timeout
"
ERROR: Unable to access datastore to query node configuration
Terminating
Calico node failed to start
In the syslog on the k8s master node, I see errors preventing the pods from coming up:
May 18 21:30:41 nodes-8lfc kubelet[6176]: I0518 21:30:41.914002 6176 kubelet_node_status.go:77] Attempting to register node nodes-8lfc
May 18 21:30:41 nodes-8lfc kubelet[6176]: E0518 21:30:41.953717 6176 eviction_manager.go:214] eviction manager: unexpected err: failed GetNode: node 'nodes-8lfc' not found
May 18 21:31:11 nodes-8lfc kubelet[6176]: E0518 21:31:11.914799 6176 kubelet_node_status.go:101] Unable to register node "nodes-8lfc" with API server: Post https://api.internal.pach-bf8b2e74.k8s.com/api/v1/nodes: dial tcp 208.73.210.202:443: i/o timeout
May 18 21:31:11 nodes-8lfc kubelet[6176]: E0518 21:31:11.936020 6176 kubelet.go:2067] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR
Which makes sense I think ... if calico isn't coming up.
Looking at the etcd pods ... they're healthy in every way that I can tell (describe / logs / manual attempt at healthcheck).
The other pods in the namespace that are stuck in pending ... report that there are 'no nodes' to schedule them on.
This doesn't quite make sense: describing the master node shows it's only at ~20% CPU utilization. So unless those pods have anti-affinity rules, or are blocked waiting for the other nodes to register with the cluster, I don't see why they can't be scheduled.
What remains unclear is why calico can't connect to etcd. That seems to be the underlying issue here, but I don't have much in the way of clues as to why that might be.
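A couple of quick checks from the affected node (endpoint copied from the log lines above) would at least show whether that etcd URL resolves and answers:
dig +short etcd-a.internal.pach-af13f1b4.k8s.com
curl -m 5 http://etcd-a.internal.pach-af13f1b4.k8s.com:4001/health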
I'd love it if kops worked for GCE out of the box. We could really use multi cloud support under a single tool (and I love kops ... I'd love it to be this tool). But right now ... it seems like kops is only well supported for AWS. That's a great start ... I hope support for other cloud providers (incl Digital Ocean!) comes soon.
@sjezewski I got the same error as you:
Nov 17 15:39:51 uy08-08 kubelet[24455]: 2017-11-17 15:39:51.799 [INFO][3937] client.go 202: Loading config from environment
Nov 17 15:39:59 uy08-08 kubelet[24455]: I1117 15:39:59.398876 24455 kuberuntime_manager.go:499] Container {Name:calico-node Image:quay.io/calico/node:v2.6.2 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:ETCD_ENDPOINTS Value: ValueFrom:&EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:&ConfigMapKeySelector{LocalObjectReference:LocalObjectReference{Name:calico-config,},Key:etcd_endpoints,Optional:nil,},SecretKeyRef:nil,}} {Name:CALICO_NETWORKING_BACKEND Value: ValueFrom:&EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:&ConfigMapKeySelector{LocalObjectReference:LocalObjectReference{Name:calico-config,},Key:calico_backend,Optional:nil,},SecretKeyRef:nil,}} {Name:CLUSTER_TYPE Value:kubeadm,bgp ValueFrom:nil} {Name:CALICO_DISABLE_FILE_LOGGING Value:true ValueFrom:nil} {Name:FELIX_DEFAULTENDPOINTTOHOSTACTION Value:ACCEPT ValueFrom:nil} {Name:CALICO_IPV4POOL_CIDR Value:192.168.122.0/24 ValueFrom:nil} {Name:CALICO_IPV4POOL_IPIP Value:always ValueFrom:nil} {Name:FELIX_IPV6SUPPORT Value:false ValueFrom:nil} {Name:FELIX_IPINIPMTU Value:1440 ValueFrom:nil} {Name:FELIX_LOGSEVERITYSCREEN Value:info ValueFrom:nil} {Name:IP Value: ValueFrom:nil} {Name:FELIX_HEALTHENABLED Value:true ValueFrom:nil}] Resources:{Limits:map[] Requests:map[cpu:{i:{value:250 scale:-3} d:{Dec:<nil>} s:250m Format:DecimalSI}]} VolumeMounts:[{Name:lib-modules ReadOnly:true MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:var-run-calico ReadOnly:false MountPath:/var/run/calico SubPath: MountPropagation:<nil>} {Name:calico-cni-plugin-token-5qtk2 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/liveness,Port:9099,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:10,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:6,} ReadinessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/readiness,Port:9099,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDe
Nov 17 15:39:59 uy08-08 kubelet[24455]: laySeconds:0,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Nov 17 15:39:59 uy08-08 kubelet[24455]: I1117 15:39:59.399155 24455 kuberuntime_manager.go:738] checking backoff for container "calico-node" in pod "calico-node-2sb8q_kube-system(5351b070-cbc4-11e7-9fbc-34e6d7899e5d)"
Nov 17 15:39:59 uy08-08 kubelet[24455]: I1117 15:39:59.399373 24455 kuberuntime_manager.go:748] Back-off 5m0s restarting failed container=calico-node pod=calico-node-2sb8q_kube-system(5351b070-cbc4-11e7-9fbc-34e6d7899e5d)
Nov 17 15:39:59 uy08-08 kubelet[24455]: E1117 15:39:59.399406 24455 pod_workers.go:182] Error syncing pod 5351b070-cbc4-11e7-9fbc-34e6d7899e5d ("calico-node-2sb8q_kube-system(5351b070-cbc4-11e7-9fbc-34e6d7899e5d)"), skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=calico-node pod=calico-node-2sb8q_kube-system(5351b070-cbc4-11e7-9fbc-34e6d7899e5d)"
Nov 17 15:39:59 uy08-08 kubelet[24455]: 2017-11-17 15:39:59.862 [INFO][3899] etcd.go 373: Unhandled error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://10.96.232.136:6666 exceeded header timeout
Nov 17 15:39:59 uy08-08 kubelet[24455]: E1117 15:39:59.864160 24455 cni.go:319] Error deleting network: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://10.96.232.136:6666 exceeded header timeout
Nov 17 15:39:59 uy08-08 kubelet[24455]: E1117 15:39:59.864927 24455 remote_runtime.go:115] StopPodSandbox "319ff7d36d67170d9c9f088c825d87096572ac83cdc8d3054da5c3163e358d3a" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "kube-dns-545bc4bfd4-xb5z9_kube-system" network: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://10.96.232.136:6666 exceeded header timeout
Nov 17 15:39:59 uy08-08 kubelet[24455]: E1117 15:39:59.864967 24455 kuberuntime_manager.go:780] Failed to stop sandbox {"docker" "319ff7d36d67170d9c9f088c825d87096572ac83cdc8d3054da5c3163e358d3a"}
Nov 17 15:39:59 uy08-08 kubelet[24455]: E1117 15:39:59.865041 24455 kuberuntime_manager.go:580] killPodWithSyncResult failed: failed to "KillPodSandbox" for "bfdfe510-cbcd-11e7-9258-f8db8846245c" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"kube-dns-545bc4bfd4-xb5z9_kube-system\" network: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://10.96.232.136:6666 exceeded header timeout\n"
Nov 17 15:39:59 uy08-08 kubelet[24455]: E1117 15:39:59.865079 24455 pod_workers.go:182] Error syncing pod bfdfe510-cbcd-11e7-9258-f8db8846245c ("kube-dns-545bc4bfd4-xb5z9_kube-system(bfdfe510-cbcd-11e7-9258-f8db8846245c)"), skipping: failed to "KillPodSandbox" for "bfdfe510-cbcd-11e7-9258-f8db8846245c" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"kube-dns-545bc4bfd4-xb5z9_kube-system\" network: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://10.96.232.136:6666 exceeded header timeout\n"
Nov 17 15:40:00 uy08-08 kubelet[24455]: W1117 15:40:00.922455 24455 docker_sandbox.go:343] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "kube-dns-545bc4bfd4-xb5z9_kube-system": CNI failed to retrieve network namespace path: Cannot find network namespace for the terminated container "319ff7d36d67170d9c9f088c825d87096572ac83cdc8d3054da5c3163e358d3a"
Nov 17 15:40:00 uy08-08 kubelet[24455]: W1117 15:40:00.923423 24455 cni.go:265] CNI failed to retrieve network namespace path: Cannot find network namespace for the terminated container "319ff7d36d67170d9c9f088c825d87096572ac83cdc8d3054da5c3163e358d3a"
Nov 17 15:40:01 uy08-08 kubelet[24455]: 2017-11-17 15:40:01.041 [INFO][3969] calico.go 315: Extracted identifiers ContainerID="319ff7d36d67170d9c9f088c825d87096572ac83cdc8d3054da5c3163e358d3a" Node="uy08-08" Orchestrator="k8s" Workload="kube-system.kube-dns-545bc4bfd4-xb5z9"
Nov 17 15:40:01 uy08-08 kubelet[24455]: 2017-11-17 15:40:01.041 [INFO][3969] utils.go 250: Configured environment: [LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin INVOCATION_ID=01c46102bc694a43a623f768273b5a1c JOURNAL_STREAM=8:91880 KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 --cluster-domain=cluster.local KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt KUBELET_CADVISOR_ARGS=--cadvisor-port=0 KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true --cert-dir=/var/lib/kubelet/pki CNI_COMMAND=DEL CNI_CONTAINERID=319ff7d36d67170d9c9f088c825d87096572ac83cdc8d3054da5c3163e358d3a CNI_NETNS= CNI_ARGS=IgnoreUnknown=1;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-545bc4bfd4-xb5z9;K8S_POD_INFRA_CONTAINER_ID=319ff7d36d67170d9c9f088c825d87096572ac83cdc8d3054da5c3163e358d3a CNI_IFNAME=eth0 CNI_PATH=/opt/calico/bin:/opt/cni/bin ETCD_ENDPOINTS=http://10.96.232.136:6666 KUBECONFIG=/etc/cni/net.d/calico-kubeconfig K8S_API_TOKEN=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJjYWxpY28tY25pLXBsdWdpbi10b2tlbi01cXRrMiIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJjYWxpY28tY25pLXBsdWdpbiIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjE4NmZlYmI5LWNhM2ItMTFlNy05ZmJjLTM0ZTZkNzg5OWU1ZCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTpjYWxpY28tY25pLXBsdWdpbiJ9.TzNAVry1HBgJa3SH93mLlSxv5927MywAp_02dR1zf_Pht_F4AWXPMASUSF1rmMVuzEprp3drgIeYg1G-oN3oPIy3tnryAtmZbsmjBCylpLntjgZaJcXneCYgk8G8I0WfWO6H6jcG46cVoRB-3FQjKQKzedbgnURUA2EOE4sN2oLOSp5R0LMyh4GZQIEm1zW
Nov 17 15:40:01 uy08-08 kubelet[24455]: Xn8OSXQ3qh9iehXm9xpep3krkf5uoBcPfe-XrHjfPETyVSTS6oADdcO3RsIQDlQOtEGKy0WnJbIRcHQiIcIVVDf1MT5Yo6gWzD7dlEynLsvo4tvEuAi0IgCsO7k34PwuMjys-FQWcPUwF1uOOgD-XAA]
Also, my etcd cluster is healthy:
# export ETCDCTL_API=2
# etcdctl cluster-health
member 93ac7045b7c80fe2 is healthy: got healthy result from http://192.168.5.105:2379
member cceea3802386922f is healthy: got healthy result from http://192.168.5.104:2379
member e1f394bfa58b2a7f is healthy: got healthy result from http://192.168.5.42:2379
cluster is healthy
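One more thing worth comparing: in the kubelet log above, calico is pointed at http://10.96.232.136:6666 via ETCD_ENDPOINTS, not at the 192.168.5.x:2379 members etcdctl reports as healthy. That value comes from the calico-config ConfigMap, so it's easy to confirm what calico is actually using:
kubectl -n kube-system get configmap calico-config -o jsonpath='{.data.etcd_endpoints}'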
/cc @bboreham @caseydavenport
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
/lifecycle frozen /remove-lifecycle stale
Flannel doesn't work either. Only kubenet works out of the box on GCE.
Adding my +1 as I just ran into this today testing on GCE. (CNI read only). Going to try the workaround later.
I got the same issue: master nodes are up but worker nodes never come up.
Here's what I see in the kubelet logs on one of the worker nodes:
root@nodes-3lbm:~# systemctl status kubelet.service -l --no-pager
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/lib/systemd/system/kubelet.service; static; vendor preset: enabled)
Active: active (running) since Mon 2019-02-18 17:17:18 UTC; 12min ago
Docs: https://github.com/kubernetes/kubernetes
Main PID: 6144 (kubelet)
Tasks: 15
Memory: 42.8M
CPU: 11.959s
CGroup: /system.slice/kubelet.service
└─6144 /usr/local/bin/kubelet --allow-privileged=true --anonymous-auth=false --cgroup-root=/ --client-ca-file=/srv/kubernetes/ca.crt --cloud-provider=gce --cluster-dns=100.64.0.10 --cluster-domain=cluster.local --enable-debugging-handlers=true --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5% --feature-gates=ExperimentalCriticalPodAnnotation=true --hairpin-mode=promiscuous-bridge --kubeconfig=/var/lib/kubelet/kubeconfig --network-plugin=cni --node-labels=kops.k8s.io/instancegroup=nodes,kubernetes.io/role=node,node-role.kubernetes.io/node= --non-masquerade-cidr=100.64.0.0/10 --pod-infra-container-image=k8s.gcr.io/pause-amd64:3.0 --pod-manifest-path=/etc/kubernetes/manifests --register-schedulable=true --v=2 --cloud-config=/etc/kubernetes/cloud.config --cni-bin-dir=/opt/cni/bin/ --cni-conf-dir=/etc/cni/net.d/
Feb 18 17:29:27 nodes-3lbm kubelet[6144]: W0218 17:29:27.251404 6144 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d/
Feb 18 17:29:27 nodes-3lbm kubelet[6144]: E0218 17:29:27.251544 6144 kubelet.go:2106] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 18 17:29:28 nodes-3lbm kubelet[6144]: E0218 17:29:28.692316 6144 event.go:212] Unable to write event: 'Post https://api.internal.k8s.chx.cloud/api/v1/namespaces/default/events: dial tcp 203.0.113.123:443: i/o timeout' (may retry after sleeping)
Feb 18 17:29:32 nodes-3lbm kubelet[6144]: E0218 17:29:32.052379 6144 eviction_manager.go:243] eviction manager: failed to get get summary stats: failed to get node info: node "nodes-3lbm" not found
Feb 18 17:29:32 nodes-3lbm kubelet[6144]: I0218 17:29:32.139680 6144 cloud_request_manager.go:89] Requesting node addresses from cloud provider for node "nodes-3lbm"
Feb 18 17:29:32 nodes-3lbm kubelet[6144]: I0218 17:29:32.144031 6144 cloud_request_manager.go:108] Node addresses from cloud provider for node "nodes-3lbm" collected
Feb 18 17:29:32 nodes-3lbm kubelet[6144]: W0218 17:29:32.253006 6144 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d/
Feb 18 17:29:32 nodes-3lbm kubelet[6144]: E0218 17:29:32.253173 6144 kubelet.go:2106] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 18 17:29:37 nodes-3lbm kubelet[6144]: W0218 17:29:37.254504 6144 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d/
Feb 18 17:29:37 nodes-3lbm kubelet[6144]: E0218 17:29:37.255133 6144 kubelet.go:2106] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
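The "No networks found in /etc/cni/net.d/" lines mean the CNI config was never written on that node, and the earlier "dial tcp ... i/o timeout" suggests the node can't reach the API endpoint either. Two quick checks on the node (a sketch, using the API hostname from the log above):
ls -l /etc/cni/net.d/ /opt/cni/bin/
curl -k -m 5 https://api.internal.k8s.chx.cloud/healthz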
This always happens when using GCE; only the masters ever come up:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-us-east1-b-m2v0 Ready master 14m v1.11.6
master-us-east1-c-35mz Ready master 14m v1.11.6
master-us-east1-d-27w5 Ready master 14m v1.11.6
Which CNI implementation did you select?
I used Calico.
Using flannel also doesn't work
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.075611 7822 server.go:986] Started kubelet
Feb 18 20:42:15 nodes-g86f kubelet[7822]: W0218 20:42:15.076407 7822 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d/
Feb 18 20:42:15 nodes-g86f kubelet[7822]: E0218 20:42:15.076568 7822 kubelet.go:2106] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.077666 7822 cloud_request_manager.go:89] Requesting node addresses from cloud provider for node "nodes-g86f"
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.077827 7822 desired_state_of_world_populator.go:130] Desired state populator starts to run
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.079473 7822 cloud_request_manager.go:108] Node addresses from cloud provider for node "nodes-g86f" collected
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.111010 7822 factory.go:356] Registering Docker factory
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.113141 7822 factory.go:54] Registering systemd factory
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.113507 7822 factory.go:86] Registering Raw factory
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.113885 7822 manager.go:1205] Started watching for new ooms in manager
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.115803 7822 manager.go:356] Starting recovery of all containers
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.179219 7822 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.179385 7822 kubelet.go:1771] skipping pod synchronization - [container runtime is down]
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.180767 7822 manager.go:361] Recovery completed
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.209020 7822 kubelet_node_status.go:317] Adding node label from cloud provider: beta.kubernetes.io/instance-type=n1-standard-2
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.209558 7822 kubelet_node_status.go:328] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=us-east1-b
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.209874 7822 kubelet_node_status.go:332] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=us-east1
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.212009 7822 kubelet_node_status.go:441] Recording NodeHasSufficientDisk event message for node nodes-g86f
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.212463 7822 kubelet_node_status.go:441] Recording NodeHasSufficientMemory event message for node nodes-g86f
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.212807 7822 kubelet_node_status.go:441] Recording NodeHasNoDiskPressure event message for node nodes-g86f
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.213151 7822 kubelet_node_status.go:441] Recording NodeHasSufficientPID event message for node nodes-g86f
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.213491 7822 kubelet_node_status.go:79] Attempting to register node nodes-g86f
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.248387 7822 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.252353 7822 kubelet_node_status.go:317] Adding node label from cloud provider: beta.kubernetes.io/instance-type=n1-standard-2
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.252381 7822 kubelet_node_status.go:328] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=us-east1-b
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.252388 7822 kubelet_node_status.go:332] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=us-east1
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.254741 7822 kubelet_node_status.go:441] Recording NodeHasSufficientDisk event message for node nodes-g86f
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.254775 7822 kubelet_node_status.go:441] Recording NodeHasSufficientMemory event message for node nodes-g86f
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.254789 7822 kubelet_node_status.go:441] Recording NodeHasNoDiskPressure event message for node nodes-g86f
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.254799 7822 kubelet_node_status.go:441] Recording NodeHasSufficientPID event message for node nodes-g86f
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.254828 7822 cpu_manager.go:155] [cpumanager] starting with none policy
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.254836 7822 cpu_manager.go:156] [cpumanager] reconciling every 10s
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.254845 7822 policy_none.go:42] [cpumanager] none policy: Start
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.255596 7822 manager.go:201] Starting Device Plugin manager
Feb 18 20:42:15 nodes-g86f kubelet[7822]: Starting Device Plugin manager
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.255841 7822 manager.go:237] Serving device plugin registration server on "/var/lib/kubelet/device-plugins/kubelet.sock"
Feb 18 20:42:15 nodes-g86f kubelet[7822]: E0218 20:42:15.255931 7822 eviction_manager.go:243] eviction manager: failed to get get summary stats: failed to get node info: node "nodes-g86f" not found
Feb 18 20:42:15 nodes-g86f kubelet[7822]: I0218 20:42:15.256317 7822 container_manager_linux.go:428] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Feb 18 20:42:20 nodes-g86f kubelet[7822]: W0218 20:42:20.256959 7822 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d/
Feb 18 20:42:20 nodes-g86f kubelet[7822]: E0218 20:42:20.257260 7822 kubelet.go:2106] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 18 20:42:25 nodes-g86f kubelet[7822]: I0218 20:42:25.079657 7822 cloud_request_manager.go:89] Requesting node addresses from cloud provider for node "nodes-g86f"
Feb 18 20:42:25 nodes-g86f kubelet[7822]: I0218 20:42:25.082336 7822 cloud_request_manager.go:108] Node addresses from cloud provider for node "nodes-g86f" collected
Feb 18 20:42:25 nodes-g86f kubelet[7822]: E0218 20:42:25.256077 7822 eviction_manager.go:243] eviction manager: failed to get get summary stats: failed to get node info: node "nodes-g86f" not found
Feb 18 20:42:25 nodes-g86f kubelet[7822]: W0218 20:42:25.258214 7822 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d/
Feb 18 20:42:25 nodes-g86f kubelet[7822]: E0218 20:42:25.258673 7822 kubelet.go:2106] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 18 20:42:30 nodes-g86f kubelet[7822]: W0218 20:42:30.259980 7822 cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d/
Feb 18 20:42:30 nodes-g86f kubelet[7822]: E0218 20:42:30.260564 7822 kubelet.go:2106] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Masters are up though
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-us-east1-b-bph2 Ready master 22m v1.11.6
master-us-east1-c-2ww4 Ready master 22m v1.11.6
master-us-east1-d-hw6r Ready master 22m v1.11.6
Flannel "seems" to be up
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
dns-controller-85ffbb4fb-tm4ff 0/1 Pending 0 23m
etcd-server-events-master-us-east1-b-bph2 1/1 Running 1 22m
etcd-server-events-master-us-east1-c-2ww4 1/1 Running 0 22m
etcd-server-events-master-us-east1-d-hw6r 1/1 Running 0 22m
etcd-server-master-us-east1-b-bph2 1/1 Running 1 23m
etcd-server-master-us-east1-c-2ww4 1/1 Running 0 22m
etcd-server-master-us-east1-d-hw6r 1/1 Running 0 23m
kube-apiserver-master-us-east1-b-bph2 1/1 Running 2 22m
kube-apiserver-master-us-east1-c-2ww4 1/1 Running 0 22m
kube-apiserver-master-us-east1-d-hw6r 1/1 Running 0 22m
kube-controller-manager-master-us-east1-b-bph2 1/1 Running 0 22m
kube-controller-manager-master-us-east1-c-2ww4 1/1 Running 0 22m
kube-controller-manager-master-us-east1-d-hw6r 1/1 Running 0 23m
kube-dns-6b4f4b544c-4sztd 0/3 Pending 0 23m
kube-dns-autoscaler-6b658bd4d5-gtfcp 0/1 Pending 0 23m
kube-flannel-ds-g6hqp 1/1 Running 0 23m
kube-flannel-ds-glfhq 1/1 Running 0 23m
kube-flannel-ds-hlzxd 1/1 Running 0 23m
kube-proxy-master-us-east1-b-bph2 1/1 Running 0 22m
kube-proxy-master-us-east1-c-2ww4 1/1 Running 0 22m
kube-proxy-master-us-east1-d-hw6r 1/1 Running 0 23m
kube-scheduler-master-us-east1-b-bph2 1/1 Running 0 23m
kube-scheduler-master-us-east1-c-2ww4 1/1 Running 0 22m
kube-scheduler-master-us-east1-d-hw6r 1/1 Running 0 23m
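The Pending pods should say why in their events; for example (pod name copied from the listing above):
kubectl -n kube-system describe pod dns-controller-85ffbb4fb-tm4ff | grep -A5 Events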
I ask because I fixed the reported issue in Weave Net in https://github.com/weaveworks/weave/pull/3307; it would be interesting to know whether this lets kops work.
@bboreham I'll test it with the weave CNI plugin tomorrow.
So using weave works. For reference, this is what I ran:
kops create cluster \
--node-count 3 \
--zones us-east1-b,us-east1-c,us-east1-d \
--master-zones us-east1-b,us-east1-c,us-east1-d \
--dns-zone k8s.example.com \
--node-size n1-standard-2 \
--master-size n1-standard-2 \
--networking weave \
--project $(gcloud config get-value project) \
--ssh-public-key ~/.ssh/id_rsa.pub \
--state gs://example-obj-store/ \
--api-loadbalancer-type public \
--image "ubuntu-os-cloud/ubuntu-1604-xenial-v20170202" \
k8s.example.com
Everything came up fine.
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system dns-controller-85ffbb4fb-4b5wg 1/1 Running 0 13m
kube-system etcd-server-events-master-us-east1-b-7d52 1/1 Running 0 12m
kube-system etcd-server-events-master-us-east1-c-k5pw 1/1 Running 0 12m
kube-system etcd-server-events-master-us-east1-d-0crt 1/1 Running 0 13m
kube-system etcd-server-master-us-east1-b-7d52 1/1 Running 0 12m
kube-system etcd-server-master-us-east1-c-k5pw 1/1 Running 0 12m
kube-system etcd-server-master-us-east1-d-0crt 1/1 Running 0 13m
kube-system kube-apiserver-master-us-east1-b-7d52 1/1 Running 0 12m
kube-system kube-apiserver-master-us-east1-c-k5pw 1/1 Running 0 12m
kube-system kube-apiserver-master-us-east1-d-0crt 1/1 Running 0 12m
kube-system kube-controller-manager-master-us-east1-b-7d52 1/1 Running 0 12m
kube-system kube-controller-manager-master-us-east1-c-k5pw 1/1 Running 0 12m
kube-system kube-controller-manager-master-us-east1-d-0crt 1/1 Running 0 13m
kube-system kube-dns-6b4f4b544c-sk8sf 3/3 Running 0 10m
kube-system kube-dns-6b4f4b544c-xdflk 3/3 Running 0 13m
kube-system kube-dns-autoscaler-6b658bd4d5-rpsl2 1/1 Running 0 13m
kube-system kube-proxy-master-us-east1-b-7d52 1/1 Running 0 12m
kube-system kube-proxy-master-us-east1-c-k5pw 1/1 Running 0 12m
kube-system kube-proxy-master-us-east1-d-0crt 1/1 Running 0 12m
kube-system kube-proxy-nodes-3qjf 1/1 Running 0 9m
kube-system kube-proxy-nodes-fj6r 1/1 Running 0 10m
kube-system kube-proxy-nodes-l3f6 1/1 Running 0 10m
kube-system kube-scheduler-master-us-east1-b-7d52 1/1 Running 0 12m
kube-system kube-scheduler-master-us-east1-c-k5pw 1/1 Running 0 12m
kube-system kube-scheduler-master-us-east1-d-0crt 1/1 Running 0 13m
kube-system weave-net-6pbbl 2/2 Running 0 10m
kube-system weave-net-c6ncv 2/2 Running 0 13m
kube-system weave-net-pw4x7 2/2 Running 0 13m
kube-system weave-net-qgbv4 2/2 Running 0 11m
kube-system weave-net-rf7pm 2/2 Running 0 13m
kube-system weave-net-s6bln 2/2 Running 1 11m
I didn't have to "hack" the node status, as it was all fine:
$ curl http://localhost:8080/api/v1/nodes/nodes-fj6r/status
{
"kind": "Node",
"apiVersion": "v1",
"metadata": {
"name": "nodes-fj6r",
"selfLink": "/api/v1/nodes/nodes-fj6r/status",
"uid": "7708602d-344f-11e9-81e4-42010a8e001d",
"resourceVersion": "2094",
"creationTimestamp": "2019-02-19T14:05:51Z",
"labels": {
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/instance-type": "n1-standard-2",
"beta.kubernetes.io/os": "linux",
"failure-domain.beta.kubernetes.io/region": "us-east1",
"failure-domain.beta.kubernetes.io/zone": "us-east1-b",
"kops.k8s.io/instancegroup": "nodes",
"kubernetes.io/hostname": "nodes-fj6r",
"kubernetes.io/role": "node",
"node-role.kubernetes.io/node": ""
},
"annotations": {
"node.alpha.kubernetes.io/ttl": "0",
"volumes.kubernetes.io/controller-managed-attach-detach": "true"
}
},
"spec": {
"podCIDR": "100.96.4.0/24",
"providerID": "gce://kops-chx/us-east1-b/nodes-fj6r"
},
"status": {
"capacity": {
"cpu": "2",
"ephemeral-storage": "130046416Ki",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "7659276Ki",
"pods": "110"
},
"allocatable": {
"cpu": "2",
"ephemeral-storage": "119850776788",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "7556876Ki",
"pods": "110"
},
"conditions": [
{
"type": "NetworkUnavailable",
"status": "False",
"lastHeartbeatTime": "2019-02-19T14:05:57Z",
"lastTransitionTime": "2019-02-19T14:05:57Z",
"reason": "WeaveIsUp",
"message": "Weave pod has set this"
},
{
"type": "OutOfDisk",
"status": "False",
"lastHeartbeatTime": "2019-02-19T14:17:33Z",
"lastTransitionTime": "2019-02-19T14:05:51Z",
"reason": "KubeletHasSufficientDisk",
"message": "kubelet has sufficient disk space available"
},
{
"type": "MemoryPressure",
"status": "False",
"lastHeartbeatTime": "2019-02-19T14:17:33Z",
"lastTransitionTime": "2019-02-19T14:05:51Z",
"reason": "KubeletHasSufficientMemory",
"message": "kubelet has sufficient memory available"
},
{
"type": "DiskPressure",
"status": "False",
"lastHeartbeatTime": "2019-02-19T14:17:33Z",
"lastTransitionTime": "2019-02-19T14:05:51Z",
"reason": "KubeletHasNoDiskPressure",
"message": "kubelet has no disk pressure"
},
{
"type": "PIDPressure",
"status": "False",
"lastHeartbeatTime": "2019-02-19T14:17:33Z",
"lastTransitionTime": "2019-02-19T14:05:51Z",
"reason": "KubeletHasSufficientPID",
"message": "kubelet has sufficient PID available"
},
{
"type": "Ready",
"status": "True",
"lastHeartbeatTime": "2019-02-19T14:17:33Z",
"lastTransitionTime": "2019-02-19T14:06:11Z",
"reason": "KubeletReady",
"message": "kubelet is posting ready status. AppArmor enabled"
}
],
"addresses": [
{
"type": "InternalIP",
"address": "10.142.0.30"
},
{
"type": "ExternalIP",
"address": "34.73.1.224"
},
{
"type": "Hostname",
"address": "nodes-fj6r"
}
],
"daemonEndpoints": {
"kubeletEndpoint": {
"Port": 10250
}
},
"nodeInfo": {
"machineID": "29fcd7edc451f2f25a62066bc395b5e8",
"systemUUID": "29FCD7ED-C451-F2F2-5A62-066BC395B5E8",
"bootID": "2dd7efc4-9d95-46a5-a02f-7762afdafe9c",
"kernelVersion": "4.4.0-62-generic",
"osImage": "Ubuntu 16.04.1 LTS",
"containerRuntimeVersion": "docker://17.3.2",
"kubeletVersion": "v1.11.6",
"kubeProxyVersion": "v1.11.6",
"operatingSystem": "linux",
"architecture": "amd64"
},
"images": [
{
"names": [
"protokube:1.11.0"
],
"sizeBytes": 282689309
},
{
"names": [
"weaveworks/weave-kube@sha256:f1b6edd296cf0b7e806b1a1a1f121c1e8095852a4129edd08401fe2e7aab652d",
"weaveworks/weave-kube:2.5.0"
],
"sizeBytes": 148083959
},
{
"names": [
"k8s.gcr.io/kube-proxy@sha256:de320f2611b72465371292c87d892e64b01bf5e27b211b9e8969a239d0f2523a",
"k8s.gcr.io/kube-proxy:v1.11.6"
],
"sizeBytes": 98120519
},
{
"names": [
"weaveworks/weave-npc@sha256:5bc9e4241eb0e972d3766864b2aca085660638b9d596d4fe761096db46a8c60b",
"weaveworks/weave-npc:2.5.0"
],
"sizeBytes": 49506380
},
{
"names": [
"k8s.gcr.io/pause-amd64@sha256:163ac025575b775d1c0f9bf0bdd0f086883171eb475b5068e7defa4ca9e76516",
"k8s.gcr.io/pause-amd64:3.0"
],
"sizeBytes": 746888
}
]
}
}
The only issue is that the public DNS record never gets updated from the temporary IP; I had to go and change it manually. However, that's a separate issue that looks like it's being tracked here.
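One way to compare the record against what it should point at (the managed zone name is a placeholder; since the cluster was created with --api-loadbalancer-type public, the record presumably should point at the load balancer's forwarding-rule IP):
gcloud dns record-sets list --zone=<managed-zone> --name="api.k8s.example.com."
gcloud compute forwarding-rules list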
Thanks @bboreham! Looks like the weave plugin works!
I did have to edit the firewall rules as @Dirbaio described as well.
This is an upstream issue; I am opening it here in order to track it.
cc: @thockin @justinsb