Open · xing-yang opened this issue 3 years ago
@NickrenREN I wonder if you've seen a similar issue in production.
Node Watcher is a single-instance controller; what is the scalability issue?
@NickrenREN It affects the e2e tests. Details are in this issue: https://github.com/kubernetes/kubernetes/issues/102452
By disabling the external-health-monitor, the failure went away.
IIUC, the root cause of the scalability issue you mention is that Node Watcher watches PVCs, Nodes, and Pods? I just don't understand the reason; the k8s default scheduler does the same thing.
Watch is a persistent connection, and Node Watcher is a single-instance controller. Is this really the root cause?
I saw many API throttlings, so maybe we can decrease the API call frequency?
> Watch is a persistent connection, and Node Watcher is a single-instance controller. Is this really the root cause?

This needs more investigation. The observation is that the failure went away when external-health-monitor was disabled, came back when it was enabled, and went away again when it was disabled.
> I saw many API throttlings, so maybe we can decrease the API call frequency?

We could try that.
> This needs more investigation. The observation is that the failure went away when external-health-monitor was disabled, came back when it was enabled, and went away again when it was disabled.

This indicates the controller causes the failure (API throttling?), but I still don't think Watch is the root cause.
> This indicates the controller causes the failure (API throttling?), but I still don't think Watch is the root cause.

The external-health-monitor controller added more load to the API server, which might have triggered those failures.
> The external-health-monitor controller added more load to the API server, which might have triggered those failures.

I agree, so we can try decreasing the API call frequency first.
I would like to work on this issue. I'll start looking into it to understand the problem.
/assign
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Mark this issue as rotten with /lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
/lifecycle rotten
/close
@k8s-triage-robot: Closing this issue.
/reopen
@pohly: Reopened this issue.
/lifecycle frozen
/assign
We have an issue https://github.com/kubernetes-csi/external-health-monitor/issues/75 to change the code so that it watches Pods and Nodes only when the Node Watcher component is enabled. We still need to address the scalability issue when Node Watcher is enabled:
kubernetes/kubernetes#102452 (comment)
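The change described in issue #75 can be sketched as gating watch setup behind the Node Watcher flag. This is an illustrative stdlib-only sketch: names like `setupWatches` and `enableNodeWatcher` are hypothetical stand-ins, and `startWatch` stands in for creating a client-go informer in the real controller:

```go
package main

import "fmt"

// startWatch stands in for setting up a client-go informer for a resource.
// In the real controller this would create a shared informer and register
// event handlers; here it just records which watches were started.
func startWatch(resource string, started *[]string) {
	*started = append(*started, resource)
}

// setupWatches starts only the watches the enabled features need:
// PVCs are always watched, while Pods and Nodes are watched only when
// the Node Watcher component is enabled (the fix proposed in issue #75).
func setupWatches(enableNodeWatcher bool) []string {
	var started []string
	startWatch("persistentvolumeclaims", &started)
	if enableNodeWatcher {
		startWatch("pods", &started)
		startWatch("nodes", &started)
	}
	return started
}

func main() {
	fmt.Println(setupWatches(false)) // [persistentvolumeclaims]
	fmt.Println(setupWatches(true))  // [persistentvolumeclaims pods nodes]
}
```

With the flag off, the controller never opens the Pod and Node watches at all, so deployments that don't use Node Watcher add no extra watch load on the API server.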