Open · serathius opened this issue 3 years ago
Sounds good. We really need to consider and test this properly before committing to it. So when do we plan to start? After https://github.com/kubernetes-sigs/metrics-server/pull/777 is merged?
I don't think this is necessarily blocked by #777, as they are pretty independent changes. I think we can start doing performance testing.
OK, let me take a look and try to do it.
How the storage would consume partial data, decide to remove old data, and the performance overhead caused by that
I don't think this is possible with the current storage implementation. To throw out a suggestion to work around it: maybe we could have one store per node (see the sketch below). I don't really see any drawbacks to that, except that getting PodMetrics would become more complex.
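To make the per-node idea concrete, here is a minimal sketch. shardedStore, nodeStore, and MetricsBatch are hypothetical names for illustration, not the actual metrics-server types:

```go
// Sketch: shard the storage by node so that one scrape result only
// touches its own node's shard, and partial data from a failed cycle
// never invalidates the metrics of other nodes.
package storage

import "sync"

// MetricsBatch stands in for the node and pod metric points collected
// from a single node scrape.
type MetricsBatch struct{}

type nodeStore struct {
	mu    sync.RWMutex
	batch *MetricsBatch // last successful scrape for this node
}

type shardedStore struct {
	mu     sync.RWMutex
	stores map[string]*nodeStore // keyed by node name
}

// Store records one node's scrape result without locking other shards.
func (s *shardedStore) Store(node string, batch *MetricsBatch) {
	s.mu.Lock()
	ns, ok := s.stores[node]
	if !ok {
		ns = &nodeStore{}
		s.stores[node] = ns
	}
	s.mu.Unlock()

	ns.mu.Lock()
	ns.batch = batch
	ns.mu.Unlock()
}

// Delete drops a node's shard when the node leaves the cluster.
func (s *shardedStore) Delete(node string) {
	s.mu.Lock()
	delete(s.stores, node)
	s.mu.Unlock()
}
```

The complexity mentioned above lands on the read path: serving PodMetrics would first have to find the shard of the node each pod runs on before reading from it.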
How to test code and prevent leaking goroutines
We could check for leaks with: https://github.com/uber-go/goleak
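For testing, a goleak check can be as small as this sketch (the scraper package name is an assumption):

```go
// Sketch: fail the whole test binary if any non-runtime goroutines are
// still alive after the package's tests finish.
package scraper

import (
	"testing"

	"go.uber.org/goleak"
)

func TestMain(m *testing.M) {
	goleak.VerifyTestMain(m)
}
```

Alternatively, goleak.VerifyNone(t) deferred inside a single test pinpoints exactly which test leaks a per-node goroutine.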
Also, I don't know if metrics-server registers the client_golang Go collector, but it could give us insight into the number of goroutines currently running via the go_goroutines metric.
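If it is not registered yet, wiring up the Go collector takes a few lines with client_golang. This is a generic sketch, not metrics-server's actual setup, and the listen address is arbitrary:

```go
// Sketch: expose client_golang's Go runtime collector, which includes
// the go_goroutines gauge, on a /metrics endpoint.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(collectors.NewGoCollector()) // provides go_goroutines

	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Watching go_goroutines over a few scrape cycles would show whether the count stays flat (one goroutine per node) or oscillates (goroutine churn).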
/assign
I have been working on this issue recently, and I have a few questions.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
/lifecycle frozen
This should implement the "Redesign goroutine management to reduce scheduling cost (will require redesigning storage)" point from https://github.com/kubernetes-sigs/metrics-server/issues/857.
/remove-lifecycle frozen
@yangjunmyfm192085 Hi, thanks for your contribution. Any update on this issue?
/lifecycle frozen
/cc @shuaich
Sorry, I have delayed this issue due to lack of time and other things. I made the modifications at https://github.com/yangjunmyfm192085/metrics-server/tree/rewriting-scraper but have not submitted a PR yet. I will submit the PR as soon as possible, and I need help reviewing whether the modification is reasonable. If there is no problem, I will do a comparison test. @serathius
Of course, @shuaich is also welcome to participate.
What would you like to be added:
Instead of listing nodes every cycle and churning goroutines, we should maintain one goroutine per node and use an informer event handler to add and remove goroutines as nodes come and go, as sketched below. A similar approach is used by Prometheus Server.
Things to consider:
- How the storage would consume partial data, decide to remove old data, and the performance overhead caused by that
- How to test code and prevent leaking goroutines
Why is this needed:
The Metrics Server scraper is responsible for parallelizing metric collection. Every cycle it lists all the nodes in the cluster and creates a goroutine to scrape each node. Each goroutine waits a random amount of time (to avoid scraping every node at the same moment), collects metrics, and then exits. The problems with this approach are the relisting of all nodes every cycle and the scheduling cost of constantly creating and destroying goroutines.
I have listed some problems, but we should test the alternative properly before committing to it. We should create benchmarks and validate scraper and storage performance before merging any code, as in the worst case we could end up with more complicated code without addressing any of the problems.
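To make the proposal concrete, here is a minimal sketch of informer-driven goroutine management. The scrapeNode function, the 15-second interval, and all type names are assumptions for illustration, not metrics-server's actual code:

```go
// Sketch: one long-lived scrape goroutine per node, started and stopped
// by node informer events instead of being recreated every cycle.
package main

import (
	"context"
	"sync"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

type scraper struct {
	mu      sync.Mutex
	cancels map[string]context.CancelFunc // one cancel func per node name
}

func (s *scraper) startNode(node *corev1.Node) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.cancels[node.Name]; ok {
		return // goroutine for this node is already running
	}
	ctx, cancel := context.WithCancel(context.Background())
	s.cancels[node.Name] = cancel
	go func() {
		ticker := time.NewTicker(15 * time.Second) // assumed scrape interval
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return // node was deleted: exit cleanly, no leak
			case <-ticker.C:
				scrapeNode(ctx, node.Name)
			}
		}
	}()
}

func (s *scraper) stopNode(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cancel, ok := s.cancels[name]; ok {
		cancel()
		delete(s.cancels, name)
	}
}

func run(client kubernetes.Interface, s *scraper) {
	factory := informers.NewSharedInformerFactory(client, 0)
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { s.startNode(obj.(*corev1.Node)) },
		DeleteFunc: func(obj interface{}) {
			// Deletes can arrive as tombstones; handle the plain case here.
			if node, ok := obj.(*corev1.Node); ok {
				s.stopNode(node.Name)
			}
		},
	})
	stop := make(chan struct{})
	factory.Start(stop)
	<-stop // block forever in this sketch
}

// scrapeNode would collect metrics from one node; elided here.
func scrapeNode(ctx context.Context, name string) {}
```

Goroutines are created and destroyed only on node add and delete events, not every cycle, and the context cancellation gives each goroutine a clean exit path that a goleak check can verify.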
/cc @yangjunmyfm192085 @dgrisonnet
/kind feature