mborsz opened this issue 1 year ago
Hey, thanks for the info! The first suspect is the test itself, which runs on a worker node. The easiest fix would be to increase the number of CPUs for the test, especially since nothing else is running on that machine. The second approach would be to monitor CPU and RAM usage from within the test itself; that would be more time-consuming and would likely require correlating the test's CPU usage with the machine's.
I think I will start with the first approach.
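For the record, a minimal sketch of what the second approach could look like (this is an illustration, not existing clusterloader2 code; it samples the test process's own CPU time and peak RSS via getrusage(2) on Linux, so the samples can be lined up with the latency measurements by timestamp):

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

// sample returns the total CPU time (user+sys) consumed by this process
// so far, plus its peak resident set size. On Linux, Maxrss is in KiB.
func sample() (cpu time.Duration, maxRSSKiB int64) {
	var ru syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_SELF, &ru); err != nil {
		return 0, 0
	}
	user := time.Duration(ru.Utime.Sec)*time.Second + time.Duration(ru.Utime.Usec)*time.Microsecond
	sys := time.Duration(ru.Stime.Sec)*time.Second + time.Duration(ru.Stime.Usec)*time.Microsecond
	return user + sys, ru.Maxrss
}

func main() {
	prevCPU, _ := sample()
	// Emit a timestamped sample every 10s so CPU spikes in the test
	// process can be correlated with the observed request latency.
	for range time.Tick(10 * time.Second) {
		cpu, rss := sample()
		fmt.Printf("%s cpu=%v (last 10s) maxrss=%dKiB\n",
			time.Now().Format(time.RFC3339), cpu-prevCPU, rss)
		prevCPU = cpu
	}
}
```

This only covers the test process itself; if the test turns out not to be the bottleneck, we would still need node-level monitoring to see the rest of the machine.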
What is interesting is that at some point we increased the number of test replicas to 2
(https://github.com/kubernetes/perf-tests/pull/2281).
The change was clearly reflected in CPU and RAM usage, but much less so in the latency (see screenshots).
/cc
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
I was reviewing the performance of kube-apiserver in the watchlist-off tests (https://k8s-testgrid.appspot.com/sig-scalability-experiments#watchlist-off) and found that the LIST latency is around 40s.
This is quite high for ~290 MiB of data. From experience we usually observe 20-30 MiB/s of write throughput due to compression, which should translate to more like ~10-15s of latency (290 MiB / 30 MiB/s ≈ 9.7s; 290 MiB / 20 MiB/s ≈ 14.5s).
I suspect we are short on CPU or egress bandwidth on either the master or the node side.
I'm not sure how important this is, but I'm afraid it may affect some of the benchmarks we are running.
/assign @p0lyn0mial
/cc @serathius