Open huangyutongs opened 9 months ago
Hi, in order to understand this issue better, can you answer a few questions first?
- Is the status of resourcebinding continuously displayed abnormally, or does it display normally after a period of time?
- You showed the log of the karmada-descheduler component. Since there is no date, I am not sure whether it is related to this phenomenon. Can you give the specific date of this log?
- Can you show the logs of karmada-controller-manager near where the phenomenon occurs? The status is its responsibility. Note that it has two instances, and its leader instance can be found with the command kubectl get Lease -A.
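For reference, a minimal sketch of what such a leader-election Lease looks like, assuming the standard coordination.k8s.io/v1 API; the lease name and holderIdentity below are purely illustrative, but holderIdentity is the field that identifies the current leader pod:

```yaml
# Illustrative Lease object; inspect the real one with:
#   kubectl get lease -n karmada-system <name> -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: karmada-controller-manager        # assumed lease name
  namespace: karmada-system
spec:
  # The pod name of the current leader instance (illustrative value).
  holderIdentity: karmada-controller-manager-5d7c9bdd4f-abcde_a1b2c3
  leaseDurationSeconds: 15
```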
Thanks for your reply. Here are my answers:
I1227 03:41:43.103307 1 recorder.go:104] "events: Update resourceBinding(scmp-a/rps-yhportal-management-deployment) with AggregatedStatus successfully." type="Normal" object={Kind:ResourceBinding Namespace:scmp-a Name:rps-yhportal-management-deployment UID:4eb13939-59df-4b4d-988e-8c752d814172 APIVersion:work.karmada.io/v1alpha2 ResourceVersion:959002 FieldPath:} reason="AggregateStatusSucceed"
I1227 03:41:43.103335 1 recorder.go:104] "events: Update resourceBinding(scmp-a/rps-yhportal-management-deployment) with AggregatedStatus successfully." type="Normal" object={Kind:Deployment Namespace:scmp-a Name:rps-yhportal-management UID:aa48cdf6-098e-4a22-88a1-210ddc39dc55 APIVersion:apps/v1 ResourceVersion:958987 FieldPath:} reason="AggregateStatusSucceed"
I1227 10:05:49.109651 1 request.go:696] Waited for 1.022718291s due to client-side throttling, not priority and fairness, request: GET:https://192.168.120.50:6443/apis/snapshot.storage.k8s.io/v1
W1227 10:05:49.811792 1 cluster_status_controller.go:237] Maybe get partial(67) APIs installed in Cluster idc-hyper-sit-1. Error: unable to retrieve the complete list of server APIs: acme.yourcompany.com/v1alpha1: the server is currently unable to handle the request, metrics.k8s.io/v1beta1: the server is currently unable to handle the request.
I1227 10:05:51.230784 1 request.go:696] Waited for 1.022560257s due to client-side throttling, not priority and fairness, request: GET:https://192.168.120.50:6443/apis/application.kubesphere.io/v1alpha1
W1227 10:05:51.932665 1 cluster_status_controller.go:237] Maybe get partial(67) APIs installed in Cluster idc-hyper-sit-1. Error: unable to retrieve the complete list of server APIs: acme.yourcompany.com/v1alpha1: the server is currently unable to handle the request, metrics.k8s.io/v1beta1: the server is currently unable to handle the request.
I1227 10:06:01.230738 1 request.go:696] Waited for 1.020999528s due to client-side throttling, not priority and fairness, request: GET:https://192.168.120.50:6443/apis/fluentd.fluent.io/v1alpha1
W1227 10:06:01.935454 1 cluster_status_controller.go:237] Maybe get partial(67) APIs installed in Cluster idc-hyper-sit-1. Error: unable to retrieve the complete list of server APIs: acme.yourcompany.com/v1alpha1: the server is currently unable to handle the request, metrics.k8s.io/v1beta1: the server is currently unable to handle the request.
I1227 10:06:07.390602 1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I1227 10:06:07.390636 1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I1227 10:06:07.390781 1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I1227 10:06:07.390953 1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I1227 10:06:07.390973 1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I1227 10:06:07.390954 1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I1227 10:06:07.391154 1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I1227 10:06:07.390580 1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
E1227 10:06:11.938697 1 cluster_status_controller.go:406] Failed to do cluster health check for cluster idc-hyper-sit-1, err is: Get "https://192.168.120.50:6443/readyz": dial tcp 192.168.120.50:6443: connect: connection refused
E1227 10:06:21.940779 1 cluster_status_controller.go:406] Failed to do cluster health check for cluster idc-hyper-sit-1, err is: Get "https://192.168.120.50:6443/readyz": dial tcp 192.168.120.50:6443: connect: connection refused
More log information is in the attachment karmada-controller-manager.log
Please provide an in-depth description of the question you have: When I release a new version, for unexpected reasons the new version's Pods in the member1 cluster cannot become ready within PropagationPolicy.spec.failover.application.decisionConditions.tolerationSeconds (120 seconds), which triggers a failover that moves the workload to the member2 cluster. The new Pods in the member2 cluster also fail to become ready within 120 seconds. The ResourceBinding then keeps showing the message: '0/2 clusters are available: 2 cluster(s) is in the process of eviction.'. I am not sure whether the PropagationPolicy.spec.failover.application.decisionConditions.tolerationSeconds field works the way I understand it. I do see Pods being deleted and created repeatedly in both clusters, after which they stabilize, but the ResourceBinding status is still not right.
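For context, here is a minimal sketch of a PropagationPolicy with that failover setting, assuming the policy.karmada.io/v1alpha1 schema; the resource and policy names are illustrative, and only spec.failover.application.decisionConditions.tolerationSeconds: 120 and the member1/member2 clusters come from the description above:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: example-failover-policy          # illustrative name
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: example-deployment           # illustrative workload
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
  failover:
    application:
      decisionConditions:
        # How long the application may stay unhealthy before eviction/failover is triggered.
        tolerationSeconds: 120
```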
The resourcebinding SCHEDULED column is False
I deploy using the Helm chart v1.8.1; values.yaml has the following components enabled: components: [ "schedulerEstimator", "descheduler", "search" ]
Work resources are normal
I have two clusters
The Karmada pods are running normally
karmada-scheduler is started with --enable-scheduler-estimator=true
Note that karmada-descheduler has the following error message, while the logs of the other Karmada components show no obvious errors: kubectl logs -f --tail 30 -n karmada-system -l app=karmada-descheduler
What do you think about this question?: It feels like the controller is not retrying to check the status multiple times.
Environment: