projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

K8s pod/container gets deleted. CNI interface torn down but DHCPDISCOVER on interface still executing on host #2309

Closed tpstaples closed 5 years ago

tpstaples commented 5 years ago

On the minion hosts I'm seeing the following errors in /var/log/messages:

Nov 4 17:22:22 ip-10-233-39-15 dhclient[19495]: DHCPDISCOVER on cali1ce558bd779 to 255.255.255.255 port 67 interval 9 (xid=0x542579b7)
Nov 4 17:22:22 ip-10-233-39-15 dhclient[19495]: send_packet: No such device or address
Nov 4 17:22:22 ip-10-233-39-15 dhclient[19495]: dhclient.c:2416: Failed to send 300 byte long packet over cali1ce558bd779 interface.

Looking further up in the logs I found that the cali1ce558bd779 interface was torn down by kubelet because the kube-state-metrics pod was deleted.

Nov 4 17:22:07 ip-10-233-39-15 kubelet: 2018-11-04 12:22:07.567 [INFO][22944] k8s.go 361: Endpoint deletion will be handled by Kubernetes deletion of the Pod. ContainerID="a94488714a0928227d7e3b942ab5d930b84ab1fd50322536c465b44c7c39f359" endpoint=&v3.WorkloadEndpoint{TypeMeta:v1.TypeMeta{Kind:"WorkloadEndpoint", APIVersion:"projectcalico.org/v3"}, ObjectMeta:v1.ObjectMeta{Name:"ip--10--233--39--15.ec2.internal-k8s-kube--state--metrics--698867bf74--hnmq7-eth0", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"2f9fac3c-e052-11e8-b416-0eb7d1644f80", ResourceVersion:"22813953", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63676947222, loc:(*time.Location)(0x1ee6780)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"k8s-app":"kube-state-metrics", "pod-template-hash":"2544236930", "projectcalico.org/namespace":"kube-system", "projectcalico.org/orchestrator":"k8s"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v3.WorkloadEndpointSpec{Orchestrator:"k8s", Workload:"", Node:"ip-10-233-39-15.ec2.internal", ContainerID:"", Pod:"kube-state-metrics-698867bf74-hnmq7", Endpoint:"eth0", IPNetworks:[]string{"10.2.1.3/32"}, IPNATs:[]v3.IPNAT(nil), IPv4Gateway:"", IPv6Gateway:"", Profiles:[]string{"kns.kube-system"}, InterfaceName:"cali1ce558bd779", MAC:"", Ports:[]v3.EndpointPort{v3.EndpointPort{Name:"http-metrics", Protocol:numorstring.Protocol{Type:1, NumVal:0x0, StrVal:"TCP"}, Port:0x1f90}}}}
Nov 4 17:22:07 ip-10-233-39-15 kubelet: Calico CNI releasing IP address
Nov 4 17:22:07 ip-10-233-39-15 kubelet: 2018-11-04 12:22:07.568 [INFO][22944] utils.go 123: Using a dummy podCidr to release the IP ContainerID="a94488714a0928227d7e3b942ab5d930b84ab1fd50322536c465b44c7c39f359" podCidr="0.0.0.0/0"
Nov 4 17:22:07 ip-10-233-39-15 kubelet: Calico CNI deleting device in netns /proc/16239/ns/net
Nov 4 17:22:07 ip-10-233-39-15 dhclient[19495]: receive_packet failed on cali1ce558bd779: Network is down
Nov 4 17:22:07 ip-10-233-39-15 kubelet: 2018-11-04 12:22:07.588 [INFO][22944] k8s.go 382: Teardown processing complete. ContainerID="a94488714a0928227d7e3b942ab5d930b84ab1fd50322536c465b44c7c39f359"

So for some reason the interface was torn down and deleted by kubelet and Calico, but dhclient wasn't informed that it should stop trying to get DHCP addresses on that interface?

Not sure whether this has been fixed in a newer release of Calico, but I couldn't find an existing bug for it. Anyone have any ideas?
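To get a feel for how much of the log noise comes from this, the dhclient lines that mention a cali* interface can be counted per process and interface. The following is only a rough sketch: the regular expression assumes the syslog layout shown in the excerpts above and interface names of the form `cali` followed by hex digits, and `count_dhclient_cali_lines` is a hypothetical helper name, not part of Calico or dhclient.

```python
import re
from collections import Counter

# Matches syslog lines written by dhclient that mention a Calico-style
# interface name (assumption: "cali" followed by hex digits, as in the logs above).
DHCLIENT_CALI = re.compile(
    r"dhclient\[(?P<pid>\d+)\]: .*?\b(?P<iface>cali[0-9a-f]+)\b"
)


def count_dhclient_cali_lines(lines):
    """Return a Counter mapping (pid, interface) to the number of matching lines."""
    counts = Counter()
    for line in lines:
        match = DHCLIENT_CALI.search(line)
        if match:
            counts[(int(match.group("pid")), match.group("iface"))] += 1
    return counts


# Sample taken from the log excerpt in this report.
sample = [
    "Nov 4 17:22:22 ip-10-233-39-15 dhclient[19495]: DHCPDISCOVER on cali1ce558bd779 "
    "to 255.255.255.255 port 67 interval 9 (xid=0x542579b7)",
    "Nov 4 17:22:22 ip-10-233-39-15 dhclient[19495]: send_packet: No such device or address",
    "Nov 4 17:22:22 ip-10-233-39-15 dhclient[19495]: dhclient.c:2416: Failed to send "
    "300 byte long packet over cali1ce558bd779 interface.",
]

for (pid, iface), n in count_dhclient_cali_lines(sample).items():
    print(f"dhclient pid {pid} logged {n} line(s) for {iface}")
```

On a real node the same function could be fed an open file handle for /var/log/messages instead of the sample list. Lines such as `send_packet: No such device or address` do not name the interface, so they are not counted here.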

Expected Behavior

DHCP shouldn't try to get an ip address on a calico interface that was torn down in K8s

Current Behavior

See description above

Possible Solution

Steps to Reproduce (for bugs)

Context

Excessive log messages filling up our logging infrastructure as nodes become older and CNI interfaces are torn down.

Your Environment

Running Kubernetes 1.9.7 in AWS.
Running quay.io/calico/cni:v2.0.3 with quay.io/coreos/flannel:v0.10.0

Linux acxauva020147 3.10.0-862.3.2.el7.x86_64 #1 SMP Mon May 21 23:36:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

CentOS Linux release 7.5.1804 (Core)
Derived from Red Hat Enterprise Linux 7.5 (Source)
cat: /etc/lsb-release.d: Is a directory
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.5.1804 (Core)
CentOS Linux release 7.5.1804 (Core)
cpe:/o:centos:centos:7

caseydavenport commented 5 years ago

DHCP shouldn't try to get an ip address on a calico interface that was torn down in K8s

The expected behavior is actually that DHCP should never get an IP address for Calico interfaces, even when the pod is running.

Calico on k8s doesn't use DHCP for IP address management, so this feels outside the scope of Calico and more down to how DHCP is configured on the node.

Perhaps we need documentation which states DHCP should be disabled for Calico-networked pod interfaces.
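As a starting point for checking how DHCP is set up on a node, one could look for dhclient processes that were launched against a cali* interface. This is a minimal sketch under an assumption that may not hold everywhere: that whatever starts dhclient passes the interface name on the command line. The function names and the sample arguments below are hypothetical, chosen only for illustration.

```python
import os


def dhclient_calico_targets(argv_by_pid, prefix="cali"):
    """Return {pid: [interfaces]} for dhclient processes whose arguments
    name an interface starting with `prefix`."""
    hits = {}
    for pid, argv in argv_by_pid.items():
        if not argv or os.path.basename(argv[0]) != "dhclient":
            continue
        ifaces = [arg for arg in argv[1:] if arg.startswith(prefix)]
        if ifaces:
            hits[pid] = ifaces
    return hits


def read_proc_cmdlines(proc_root="/proc"):
    """Read argv for every process from <proc_root>/<pid>/cmdline (Linux only)."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited or is not readable
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        result[int(entry)] = [
            part.decode(errors="replace") for part in raw.split(b"\0") if part
        ]
    return result


# Hypothetical process table; the dhclient arguments are illustrative only.
example = {
    1: ["/usr/lib/systemd/systemd"],
    19495: ["/sbin/dhclient", "-d", "cali1ce558bd779"],
}
print(dhclient_calico_targets(example))
```

On a node, `dhclient_calico_targets(read_proc_cmdlines())` would list any such processes. An empty result does not rule out the problem, since dhclient may have been started without naming the interface explicitly.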

tpstaples commented 5 years ago

Interesting... sorry for my naivete on that.
Will investigate deeper on my side.