projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Calico doesn't survive a minikube restart #2359

Closed: aodj closed this issue 5 years ago

aodj commented 5 years ago

Expected Behavior

Restarting minikube shouldn't break Calico

Current Behavior

After getting the stars demo working and then stopping and starting the minikube cluster, Calico starts up but no longer prevents pod-to-pod communication

Possible Solution

Steps to Reproduce (for bugs)

  1. Set up the minikube cluster like so:
    # create k8s cluster
    minikube start --cpus 4 --memory 4096  --extra-config=kubelet.network-plugin=cni --network-plugin=cni
    # install calico-etcd
    kubectl apply -f https://docs.projectcalico.org/v3.4/getting-started/kubernetes/installation/hosted/etcd.yaml
    # install calico
    curl https://docs.projectcalico.org/v3.4/getting-started/kubernetes/installation/hosted/calico.yaml -O
    sed -i '.bak' "s/10\.96\.232\.136/$(kubectl get service -o json --namespace=kube-system calico-etcd | jq  -r .spec.clusterIP)/" calico.yaml
    kubectl apply -f calico.yaml
    # install stars demo components
    kubectl create -f https://docs.projectcalico.org/v3.4/getting-started/kubernetes/tutorials/stars-policy/manifests/00-namespace.yaml
    kubectl create -f https://docs.projectcalico.org/v3.4/getting-started/kubernetes/tutorials/stars-policy/manifests/01-management-ui.yaml
    kubectl create -f https://docs.projectcalico.org/v3.4/getting-started/kubernetes/tutorials/stars-policy/manifests/02-backend.yaml
    kubectl create -f https://docs.projectcalico.org/v3.4/getting-started/kubernetes/tutorials/stars-policy/manifests/03-frontend.yaml
    kubectl create -f https://docs.projectcalico.org/v3.4/getting-started/kubernetes/tutorials/stars-policy/manifests/04-client.yaml
    # install networkpolicy rules
    kubectl create -n stars -f https://docs.projectcalico.org/v3.4/getting-started/kubernetes/tutorials/stars-policy/policies/default-deny.yaml
    kubectl create -n client -f https://docs.projectcalico.org/v3.4/getting-started/kubernetes/tutorials/stars-policy/policies/default-deny.yaml
    kubectl create -f https://docs.projectcalico.org/v3.4/getting-started/kubernetes/tutorials/stars-policy/policies/allow-ui.yaml
    kubectl create -f https://docs.projectcalico.org/v3.4/getting-started/kubernetes/tutorials/stars-policy/policies/allow-ui-client.yaml
    kubectl create -f https://docs.projectcalico.org/v3.4/getting-started/kubernetes/tutorials/stars-policy/policies/backend-policy.yaml
    kubectl create -f https://docs.projectcalico.org/v3.4/getting-started/kubernetes/tutorials/stars-policy/policies/frontend-policy.yaml
    # open management-ui
    open http://$(minikube ip):$(kubectl get svc --namespace=management-ui -o json | jq -r .items[0].spec.ports[0].nodePort)
  2. Once the cluster has settled down, the management-ui should show the restricted communication between pods, as the stars demo explains
  3. Stop minikube: minikube stop
  4. Restart minikube: minikube start
    4a. Note that running the same minikube start --cpus 4 --memory 4096 --extra-config=kubelet.network-plugin=cni --network-plugin=cni doesn't make any difference
  5. Wait for the pods to start up, and open the management-ui again: open http://$(minikube ip):$(kubectl get svc --namespace=management-ui -o json | jq -r .items[0].spec.ports[0].nodePort)
  6. Notice that all the pods can communicate with each other

Context

This shouldn't happen, and I can't work out how to even begin debugging Calico properly to provide more information here. If anyone can suggest where to start, that would be great; after setting up calicoctl (as a k8s pod) I wasn't sure what to do next.
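For reference, this is roughly how I'm invoking calicoctl through the pod (a sketch; it assumes the calicoctl pod from the v3.4 hosted install docs is running in kube-system):

$ kubectl exec -ti -n kube-system calicoctl -- /calicoctl version
$ kubectl exec -ti -n kube-system calicoctl -- /calicoctl get nodes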

Your Environment

aodj commented 5 years ago

I should elaborate and say that my definition of "calico not working" is that the management-ui shows all ports open between the three pods (client, frontend and backend).

When I stream the logs from the management-ui pod and run them through jq, I can see that the connection state bounces between reachable and unreachable:

$ kubectl logs management-ui-mgfbp --namespace=management-ui --follow=true | grep stdout | awk '{print $6}' | jq -R --stream '. as $line | try fromjson catch $line' | jq -c '.status.targets|sort_by(.url)|reverse'
[{"reachable":false,"url":"http://frontend.stars:80/status"},{"reachable":true,"url":"http://client.client:9000/status"},{"reachable":true,"url":"http://backend.stars:6379/status"}]
[{"reachable":true,"url":"http://frontend.stars:80/status"},{"reachable":true,"url":"http://client.client:9000/status"},{"reachable":false,"url":"http://backend.stars:6379/status"}]
[{"reachable":true,"url":"http://frontend.stars:80/status"},{"reachable":true,"url":"http://client.client:9000/status"},{"reachable":false,"url":"http://backend.stars:6379/status"}]
[{"reachable":true,"url":"http://frontend.stars:80/status"},{"reachable":true,"url":"http://backend.stars:6379/status"}]
[{"reachable":false,"url":"http://frontend.stars:80/status"},{"reachable":true,"url":"http://client.client:9000/status"},{"reachable":true,"url":"http://backend.stars:6379/status"}]
[{"reachable":false,"url":"http://frontend.stars:80/status"},{"reachable":true,"url":"http://client.client:9000/status"},{"reachable":true,"url":"http://backend.stars:6379/status"}]
[{"reachable":true,"url":"http://frontend.stars:80/status"},{"reachable":true,"url":"http://backend.stars:6379/status"}]
[{"reachable":true,"url":"http://frontend.stars:80/status"},{"reachable":true,"url":"http://client.client:9000/status"},{"reachable":false,"url":"http://backend.stars:6379/status"}]
[{"reachable":true,"url":"http://frontend.stars:80/status"},{"reachable":true,"url":"http://backend.stars:6379/status"}]
[{"reachable":true,"url":"http://frontend.stars:80/status"},{"reachable":true,"url":"http://client.client:9000/status"},{"reachable":false,"url":"http://backend.stars:6379/status"}]
[{"reachable":false,"url":"http://frontend.stars:80/status"},{"reachable":true,"url":"http://client.client:9000/status"},{"reachable":true,"url":"http://backend.stars:6379/status"}]
[{"reachable":true,"url":"http://frontend.stars:80/status"},{"reachable":true,"url":"http://client.client:9000/status"},{"reachable":false,"url":"http://backend.stars:6379/status"}]
^C

If you look at the reachable status of the frontend.stars entry, you can see it bounces between true and false (backend.stars flaps similarly).
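To quantify the flapping, something like this should work (a sketch, reusing the same log stream but extracting only the frontend entry's reachable field and counting consecutive runs with uniq -c):

$ kubectl logs management-ui-mgfbp --namespace=management-ui --follow=true | grep stdout | awk '{print $6}' | jq -R 'try fromjson | .status.targets[]? | select(.url | contains("frontend")) | .reachable' | uniq -c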

caseydavenport commented 5 years ago

I'm not familiar with what minikube stop / start will actually do, but I'd first check that the policies you configured are still present in etcd and haven't somehow been removed.

e.g. calicoctl get networkpolicy --namespace stars

aodj commented 5 years ago

$ kubectl get netpol --all-namespaces
NAMESPACE   NAME              POD-SELECTOR    AGE
client      allow-ui          <none>          10h
client      default-deny      <none>          10h
stars       allow-ui          <none>          10h
stars       backend-policy    role=backend    10h
stars       default-deny      <none>          10h
stars       frontend-policy   role=frontend   10h
$ kubectl exec -ti -n kube-system calicoctl -- /calicoctl get networkpolicy --all-namespaces
NAMESPACE   NAME
client      knp.default.allow-ui
client      knp.default.default-deny
stars       knp.default.allow-ui
stars       knp.default.backend-policy
stars       knp.default.default-deny
stars       knp.default.frontend-policy

caseydavenport commented 5 years ago

@aodj are all the Calico pods running successfully? Do you see any errors in the Calico logs?
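Something like the following should show that (a sketch; it assumes the calico-node daemonset from the hosted manifest, which labels its pods k8s-app=calico-node):

$ kubectl get pods -n kube-system -l k8s-app=calico-node
$ kubectl logs -n kube-system -l k8s-app=calico-node -c calico-node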

If there are no errors, could you check the output of iptables-save -c on one of the nodes and include it here?
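On minikube that can be run through minikube ssh (a sketch; grepping for cali narrows the output to Calico's chains and rules):

$ minikube ssh "sudo iptables-save -c | grep -i cali"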

caseydavenport commented 5 years ago

The other thing to check would be whether the Calico CNI plugin is still being used after the restart - this sounds a lot like minikube started using something else (e.g., kubenet).
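One way to check is to look at the CNI config on the node (a sketch; it assumes the default CNI config directory /etc/cni/net.d inside the minikube VM, where the Calico install typically writes a file such as 10-calico.conflist):

$ minikube ssh "ls /etc/cni/net.d/ && sudo cat /etc/cni/net.d/*"

If no Calico config is present, or another plugin's config sorts first, the kubelet will pick a different network plugin after the restart.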