lojies opened 1 year ago
/sig api-machinery
/assign /triage accepted
@jiahuif: The label(s) triage/accepged cannot be applied, because the repository doesn't have them.
/cc @wojtek-t @MikeSpreitzer
/triage accepted
then send a lot of requests with a tool

Could you elaborate on what tools you were using? Were you using a Kubernetes client, kubectl, an HTTP benchmarking tool, or a (D)DoS tool?
We used an HTTP benchmarking tool: vegeta.
I don't fully understand this issue
You can set --max-mutating-requests-inflight=10 and --max-requests-inflight=10, then send a lot of requests with a tool. The kube-apiserver's CPU usage gets very high and some requests may not be responded to or may time out.
In order to prevent HTTP DDoS attacks you'll have to rate limit, which means that requests are going to be dropped.
Can you expand a bit more on what experiments you did and, based on your results, what improvements you are suggesting?
In order to prevent HTTP DDoS attacks you'll have to rate limit, which means that requests are going to be dropped.
Yes, indeed, many requests are being denied. However, I have limited in-flight requests to 10, so in theory the API server's CPU usage should not be excessively high. But from what I can see, the API server is still consuming a significant amount of CPU processing these denied requests before they are rejected.
Can you expand a bit more on what experiments you did and, based on your results, what improvements you are suggesting?
I have a cluster with 3 API servers, about 2,000 nodes, and 100,000 pods. Many of these pods need to list some resources, including pods, nodes, CRs, and so on, during their startup. When I tested restarting all 3 API servers, I observed that the CPU usage of the API servers spiked significantly, almost reaching the upper limit, even though I had set the in-flight request limits to 10. I expected the CPU usage of the API servers not to spike so high, but in reality it remains quite high.
Subsequently, we conducted performance testing on the API server using both tools and custom programs. We observed that the CPU utilization of the API server remained consistently high in both cases. However, when we increased the retryAfter value, we noticed a significant reduction in CPU usage.
You can write a simple program that continuously lists resources from many concurrent workers while setting --max-mutating-requests-inflight=10 and --max-requests-inflight=10, and then observe the CPU usage of the API server.
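As a rough illustration of such a program, here is a minimal client-go sketch; the kubeconfig path, worker count, and client-side QPS/Burst values are arbitrary assumptions for the sketch, not something taken from this issue:

```go
// Hypothetical reproducer: many goroutines issuing full LIST requests in a tight
// loop, to drive load while the apiserver runs with low in-flight limits.
package main

import (
	"context"
	"flag"
	"fmt"
	"sync"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	kubeconfig := flag.String("kubeconfig", "", "path to kubeconfig")
	workers := flag.Int("workers", 500, "number of concurrent list loops")
	flag.Parse()

	cfg, err := clientcmd.BuildConfigFromFlags("", *kubeconfig)
	if err != nil {
		panic(err)
	}
	// Raise client-side throttling so the load actually reaches the server.
	cfg.QPS = 1000
	cfg.Burst = 2000

	client := kubernetes.NewForConfigOrDie(cfg)

	var wg sync.WaitGroup
	for i := 0; i < *workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				// Each iteration is a full, unpaginated LIST of pods in all namespaces.
				_, err := client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
				if err != nil {
					fmt.Println("list error:", err) // expect 429s once APF queues fill up
				}
			}
		}()
	}
	wg.Wait()
}
```

Running something along these lines against a test cluster with --max-requests-inflight=10 should let you watch the apiserver CPU while most requests are being rejected.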
What version are you using?
Based on your comments it seems you are hitting a known scalability problem that will be addressed by https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/3157-watch-list. @wojtek-t and @p0lyn0mial are the best people to judge whether this is related.
I bet that the problem is opening TCP connections; with HTTPS that may consume a lot of resources, and it happens before any kube-apiserver logic actually fires.
Assuming this is the case, there is not much we can do in kube-apiserver itself - it would have to be solved either in the Go HTTP server or in some layer in front of it (like a load balancer).
Yes, I think you got what I want to show. We can limit at the load balancer, but most of the connections come from pods through the Kubernetes service, which cannot be limited by the load balancer.
I'm not sure if we can limit this with kube-proxy.
A flame graph may tell the truth.
Yes, I think that CPU and memory profiles from the test could reveal the potential issue.
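For reference, one way to grab such a CPU profile is the apiserver's pprof endpoint. Below is a minimal client-go sketch; it assumes profiling is enabled (the default) and that the caller has access to the /debug/pprof/* non-resource URLs, and the kubeconfig handling and output file name are arbitrary choices for the sketch:

```go
// Hypothetical sketch: fetch a 30s CPU profile from the apiserver's pprof endpoint,
// roughly equivalent to: kubectl get --raw '/debug/pprof/profile?seconds=30'
package main

import (
	"context"
	"os"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Uses $KUBECONFIG if set; otherwise falls back to in-cluster config.
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	data, err := client.CoreV1().RESTClient().Get().
		AbsPath("/debug/pprof/profile").
		Param("seconds", "30").
		DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}

	// Inspect with: go tool pprof -http=:8080 cpu.pprof
	if err := os.WriteFile("cpu.pprof", data, 0o644); err != nil {
		panic(err)
	}
}
```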
--max-mutating-requests-inflight=10 and --max-requests-inflight=10 (otherwise default rate limiting), after restarting the 3 API servers:
If I am interpreting the first graph correctly, it appears that the CPU spends most of its time in the authentication filter (an HTTP handler) specifically on verifying a certificate signature. This makes sense since this is a cryptographic operation which is CPU-bound.
It seems that the authentication filter is placed before the APF filter. In this case it means that we did some processing just to place the request into a queue. I don't know why the APF filter is placed after the authentication filter. It might be because we require some authentication information, or we don't want to have unauthenticated requests sitting in the queue. Does anyone know why?
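To make the ordering concrete, here is a generic Go middleware sketch, illustrative only and not the actual kube-apiserver filter code: the authentication wrapper is applied outermost, so every request pays its cost before the flow-control wrapper can queue or reject it.

```go
// Illustrative handler chain: the outermost wrapper runs first, so putting
// authentication outside flow control means crypto work happens before queuing.
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

func withAuthentication(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Println("authentication (certificate verification happens here)")
		next.ServeHTTP(w, r)
	})
}

func withFlowControl(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Println("flow control (APF queuing / max-in-flight happens here)")
		next.ServeHTTP(w, r)
	})
}

func main() {
	api := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Println("actual request handling")
	})

	// Flow control wraps the API handler; authentication wraps both,
	// so authentication runs first for every incoming request.
	handler := withAuthentication(withFlowControl(api))

	req := httptest.NewRequest(http.MethodGet, "/api/v1/pods", nil)
	handler.ServeHTTP(httptest.NewRecorder(), req)
}
```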
Now I can also see that the api server could have some sort of protection even on the L4 layer - before a TLS handshake or even earlier. The question here is whether the community would support integrating such a mechanism into the api server.
I think a serious server setup requires multiple layers of control. If your earliest control is after crypto then your attacker has an easy time. OTOH, control on un-authenticated request attributes is something that is only safe to be done one-sided: you can disfavor based on that but not favor.
Agree with Mike.
Following up on my previous comment above - I think the mechanism that we need to protect against this problem is different:
To squarely answer the question about why the APF filter is after authentication: it is a deliberate choice based on not trusting unauthenticated stuff.
I think that it is legitimate to be concerned about controlling the load on the crypto in authentication. To me, that sounds like a distinct feature from APF, as it would be designed with somewhat different concerns in mind. It inherently is dealing with untrusted stuff. It slides into DOS protection, which is something you want to push as far "out" in front of the server as possible. Got a load balancer in front of your servers? You probably want to do this there. Running on a cloud with DOS protection? You probably want to use that.
What happened?
Does kube-apiserver have internal rate limiting measures, apart from API Priority and Fairness (APF), which seems to control only the number of simultaneous requests being processed? Suppose a large number of requests are sent to the kube-apiserver by pods accessing the Kubernetes service; this can result in very high CPU load on the apiserver, subsequently causing timeouts and problems with processing regular requests as well as health probes.
What did you expect to happen?
The apiserver should be able to limit the quantity of requests, not only those that need processing but all external access, in order to prevent a surge in CPU usage under high traffic volumes, which could impact functionality.
How can we reproduce it (as minimally and precisely as possible)?
You can set --max-mutating-requests-inflight=10 and --max-requests-inflight=10, then send a lot of requests with a tool. The kube-apiserver's CPU usage gets very high and some requests may not be responded to or may time out.
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)