projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

Delay between Calico pick IP address and assign address to pod #5612

Closed · hmlkao closed this 2 years ago

hmlkao commented 2 years ago

Expected Behavior

When I scale up a deployment from 1 to 100 replicas, all of which are scheduled to one node, it takes about 2-3 minutes before all of them obtain an IP address. I expect the IPs to be assigned much faster.

Current Behavior

The time between a pod being scheduled to a node and an IP address being assigned to that pod increases with the number of pods scheduled to the node. In other words, when I set 10 replicas the addresses are assigned much faster than when I set 100 replicas.

Possible Solution

I'm not sure where exactly the delay occurs; see the example below.

I've tried:

Steps to Reproduce (for bugs)

  1. Cordon all workers except one
  2. Scale up a deployment of any app from 1 to 100 replicas
  3. Check the time when the last scheduled pod obtains an IP address (the measurement sketch below can help here)
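
A minimal sketch of one way to timestamp step 3, using the Python kubernetes client; the default namespace and the app=test label selector are placeholders for the real deployment, not part of the original setup:

```python
import time
from kubernetes import client, config, watch

# Watch pods and record when each one's IP first appears in the API.
# The namespace and the app=test label selector are assumptions for
# illustration; adjust them to match the scaled deployment.
config.load_kube_config()
v1 = client.CoreV1Api()
seen = {}

for event in watch.Watch().stream(v1.list_namespaced_pod,
                                  namespace="default",
                                  label_selector="app=test"):
    pod = event["object"]
    name = pod.metadata.name
    if pod.status.pod_ip and name not in seen:
        seen[name] = time.time()
        print(time.strftime("%H:%M:%S"), name, pod.status.pod_ip)
```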

Context

When the dev team releases a new app version and many of the pods are scheduled to one of the workers, there is a significant delay before a pod is really Running. We run K8s on powerful bare-metal workers (about 100 vCPUs and 512 GB RAM) using Calico as the CNI.

Your Environment

Example timeline of issue

The test was done on one "clean" worker in our lab environment without any other workload; logs were taken from the console, from the consul-connect-inject-init container within an app pod, and from the calico-node pod.

Full logs are in files consul-init.log and calico-node.log.

If you need any other details, let me know.

It would be great to know where the delay occurs and what the bottleneck of this behavior is.

caseydavenport commented 2 years ago

@hmlkao a few things to point out:

What makes you think that Calico is at fault here? It very well might be, but from what I can tell the pod goes into the Init state at 11:12:13, and then calico/node learns about the IP at 11:12:18, only 5 seconds later. That suggests that the process of launching a container, allocating an IP, configuring routes and sysctls, and communicating the IP to calico/node takes 5 seconds, which is pretty reasonable.

12:13:22 - IP address assigned to pod

How are you measuring this? From the log, it appears the IP is assigned well before that (at 11:12:18).

If I had to guess, what you are seeing is that Calico is allocating the IP addresses immediately, but that kubelet is not reporting the IP back to the API until some time later due to batching / rate-limiting mentioned above. This is the original issue about that, which wasn't deemed serious enough to fix: https://github.com/kubernetes/kubernetes/issues/39113

hmlkao commented 2 years ago

@caseydavenport , thanks for your response and notes.

What I thought was that the delay arises between Calico allocating an IP address from the range and writing it to "somewhere" (I don't know how it is processed) on the node. But as you mentioned, the delay can be caused by kubelet before it reports the IP back to the API. However, the pod is only able to create network connections after the IP address is reported, which should IMO be independent of reporting the IP address to the API (I suppose the connection should be available right after the interface is created). It's a little bit confusing for me.

Maybe I could check the interfaces created on the node during a test (which I didn't do). Or can you advise me on what I should look for?
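
For example, I could watch for new Calico veth interfaces appearing on the node with something like this rough sketch (assuming Calico's default "cali" interface-name prefix):

```python
import os
import time

# Poll /sys/class/net and print newly created Calico veth interfaces
# (Calico names them with a "cali" prefix by default).
seen = set(os.listdir("/sys/class/net"))
while True:
    current = set(os.listdir("/sys/class/net"))
    for iface in sorted(current - seen):
        if iface.startswith("cali"):
            print(time.strftime("%H:%M:%S"), "new interface:", iface)
    seen = current
    time.sleep(0.5)
```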

How are you measuring this?

"Measured" as simply as I could, by refreshing the kubectl output in the terminal.

Thanks for the link to the issue; I'll try some tests with modified kubelet parameters.
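
For reference, those flags correspond to fields in the kubelet's KubeletConfiguration file; this is roughly what I plan to experiment with (the values are guesses to test with, not recommendations):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# API-client rate limits (guessed values, not recommendations)
kubeAPIQPS: 50
kubeAPIBurst: 100
# Event-recording rate limits
eventRecordQPS: 50
eventBurst: 100
```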

hmlkao commented 2 years ago

Unfortunately, changing the kubelet parameters (kube-api-qps, kube-api-burst, event-qps, and event-burst) doesn't help. The delay is still the same.

caseydavenport commented 2 years ago

@hmlkao as discovered on the original issue, those parameters do help somewhat in extreme circumstances, but they are not the only problem here. Those are API client rate-limiting parameters, whereas this is a kubelet-specific piece of batching code that further throttles status reporting, which I do not believe to be configurable.

"Measured" as simply as I could, by refreshing the kubectl output in the terminal.

The vast majority of the time before this shows up in kubectl is likely to be outside of Calico's control. Based on the logs you provided, Calico appears to be assigning an address and returning it to Kubernetes quickly, so I don't think this is a Calico issue, but rather a problem elsewhere in the system.

pod is able to create connections to network after the IP address is reported which should be IMO independent from reporting IP address to API

I would expect the pod to have network access very shortly after the IP is assigned to the interface, which will happen before it is reported in the API. However, note that some network functions (like services) require the IP to appear in the API in order to function because the IP needs to be known on remote nodes as well, so not all network access will succeed until the IP does appear in the API.
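
To illustrate the distinction: a workload that depends on its IP being known cluster-wide can poll its own Pod object rather than assuming the interface coming up is enough. A minimal sketch, assuming the Python kubernetes client and POD_NAME/POD_NAMESPACE injected via the downward API:

```python
import os
import time
from kubernetes import client, config

# Wait until our own pod IP is visible in the API. POD_NAME and
# POD_NAMESPACE are assumed to be injected via the downward API.
config.load_incluster_config()
v1 = client.CoreV1Api()
name = os.environ["POD_NAME"]
namespace = os.environ["POD_NAMESPACE"]

while True:
    pod = v1.read_namespaced_pod(name, namespace)
    if pod.status.pod_ip:
        print("IP visible in the API:", pod.status.pod_ip)
        break
    time.sleep(1)
```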

Or can you advise me what would I should look for?

12:12:32 - connect-inject-init container start looking for service

Could you explain what this init container does? Is this an init container in the pod that is launching?

hmlkao commented 2 years ago

Sorry for the delay, I wanted to give you a comprehensive answer, so I had to read some more information first.

Could you explain what this init container does? Is this an init container in the pod that is launching?

This init container is part of Consul Connect (the service mesh implementation by Consul). The container is injected into the pod by the Consul mutating webhook; it tries to open a connection to the Consul Agent daemonset on the node (via HOST_IP and a nodePort) and looks for the associated service registered in the Consul catalog in order to prepare the config for the Envoy sidecar running beside the app container.
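
Conceptually it does something like the loop below. This is a rough sketch only: the default agent HTTP port 8500 and the SERVICE_NAME variable are my illustration; the real init container uses consul-k8s internals and the nodePort from our setup.

```python
import os
import time
import requests

# Simplified version of what the init container does: poll the
# node-local Consul agent until our service shows up in its catalog.
# Assumes the agent's HTTP API on the default port 8500.
host_ip = os.environ["HOST_IP"]
service = os.environ.get("SERVICE_NAME", "my-app")  # hypothetical name

while True:
    resp = requests.get(f"http://{host_ip}:8500/v1/agent/services")
    if any(s.get("Service") == service for s in resp.json().values()):
        print("service registered in Consul, starting Envoy bootstrap")
        break
    time.sleep(1)
```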

Here are the K8s deployment and pod manifests of the app: deployment.yaml.txt, pod.yaml.txt

I thought it could be a problem with an overloaded Consul Agent, so I did a test with the app without this init container, and the behavior was similar. So it isn't caused by Consul at all.

I've done more tests checking when the container is created on the node and looked at tons of logs, and I'm more and more convinced that the problem is outside of Calico's scope. So thank you very much for the cooperation; I'll try to investigate elsewhere.

caseydavenport commented 2 years ago

Ok cool, I'm going to close this for now but if you do find more evidence that this is related to Calico just shout and we can reopen.