Closed. Vasiliy-Basov closed this issue 8 months ago.
I'm a little confused. Does the following error occur all the time?
E0111 12:19:15.534824 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.101.3.75:10250/metrics/resource\": dial tcp 10.101.3.75:10250: connect: connection refused" node="kubws-vt01"
I0111 12:19:20.209395 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
If the error occurs all the time, run the kubectl top nodes command; in that case there should be no output like the following:
kubws-vt01 231m 5% 8388Mi 70%
If you do get metrics for node kubws-vt01 from kubectl top nodes, it means that metrics-server is working normally.
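For reference, a healthy metrics-server returns one row per node with the standard headers (the values below reuse the sample above and are illustrative):

kubectl top nodes
NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
kubws-vt01   231m         5%     8388Mi          70%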
The error does not occur all the time, only intermittently. I have multiple network adapters on the servers, and for some unknown reason metrics-server sometimes tries to fetch metrics through the wrong adapter, on another subnet. The question is: where does it get these addresses from? Could the issue be related to kube-vip? Is it possible to specify a particular adapter for metric scraping? Here is the log from the metrics-server pod:
2024-01-22T11:48:43+03:00 I0122 08:48:43.894769 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
2024-01-22T11:48:53+03:00 I0122 08:48:53.894926 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
2024-01-22T11:49:03+03:00 I0122 08:49:03.894927 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
2024-01-22T11:49:03+03:00 I0122 08:49:03.900175 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
2024-01-22T11:53:34+03:00 E0122 08:53:34.841034 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.101.3.74:10250/metrics/resource\": dial tcp 10.101.3.74:10250: connect: connection refused" node="kubms-vt01"
2024-01-22T11:53:34+03:00 E0122 08:53:34.853509 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.101.3.76:10250/metrics/resource\": dial tcp 10.101.3.76:10250: connect: connection refused" node="kubms-vt02"
2024-01-22T16:08:49+03:00 E0122 13:08:49.844596 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.101.3.74:10250/metrics/resource\": dial tcp 10.101.3.74:10250: connect: connection refused" node="kubms-vt01"
2024-01-22T16:13:49+03:00 E0122 13:13:49.850523 1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.101.3.75:10250/metrics/resource\": dial tcp 10.101.3.75:10250: connect: connection refused" node="kubws-vt01"
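As a general note, metrics-server resolves each node's kubelet address from the addresses that node advertises in its status (filtered by --kubelet-preferred-address-types), so one quick way to see everything a node is advertising is a jsonpath query like this sketch:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses}{"\n"}{end}'

If addresses from the unwanted 10.x subnet show up there, the kubelet on that node is registering them, which is a likely source of the wrong scrape target.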
/triage accepted
/assign @yangjunmyfm192085
"I suspect that the problem is related to Calico, probably with this daemonset setting:
- name: NODEIP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.hostIP
- name: IP_AUTODETECTION_METHOD
value: can-reach=$(NODEIP)
- name: IP
value: autodetect
calicoctl get nodes -o wide
NAME ASN IPV4 IPV6
kubms-vt01 (64512) 172.18.27.52/23
kubms-vt02 (64512) 10.101.3.76/24
kubws-vt01 (64512) 172.18.27.55/23
After reconfiguring the network on the nodes via netplan, disabling adapters from the 10th subnet, and restarting Calico pods, the errors are no longer present."
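A possible alternative to disabling the extra adapters is to pin Calico's IP autodetection to the intended interface or subnet instead of using can-reach; the interface name and CIDR below are illustrative assumptions, not values taken from this cluster:

- name: IP
  value: autodetect
- name: IP_AUTODETECTION_METHOD
  value: interface=ens192        # illustrative NIC name; newer Calico versions also accept cidr=172.18.26.0/23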
"I suspect that the problem is related to Calico, probably with this daemonset setting:
- name: NODEIP valueFrom: fieldRef: apiVersion: v1 fieldPath: status.hostIP - name: IP_AUTODETECTION_METHOD value: can-reach=$(NODEIP) - name: IP value: autodetect
calicoctl get nodes -o wide NAME ASN IPV4 IPV6 kubms-vt01 (64512) 172.18.27.52/23 kubms-vt02 (64512) 10.101.3.76/24 kubws-vt01 (64512) 172.18.27.55/23
After reconfiguring the network on the nodes via netplan, disabling adapters from the 10th subnet, and restarting Calico pods, the errors are no longer present."
Does the wrong address occasionally appear in the output when you run the kubectl get nodes -o wide command?
When running kubectl get nodes -o wide, I didn't see such output; the INTERNAL-IP addresses were in the correct subnet.
The metrics server is experiencing errors; the warnings appear periodically. It seems like metrics-server periodically attempts to fetch metrics through a different network adapter with a different IP. How can I change this behavior and make it use the desired network adapter?
Deployment config:
I would like to understand why the metrics server is obtaining the incorrect IP address (address from another network adapter) even though the INTERNAL-IP is configured with the correct address.
I tried changing the --kubelet-preferred-address-types option to different values, but without success. Calico is used as the CNI.
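For context, this flag lives in the metrics-server container args; a minimal sketch (image tag and values are illustrative, not the reporter's actual Deployment) looks roughly like this:

containers:
- name: metrics-server
  image: registry.k8s.io/metrics-server/metrics-server:v0.7.0   # illustrative tag
  args:
  - --kubelet-preferred-address-types=InternalIP   # which node address types to try, in order
  - --kubelet-use-node-status-port                 # scrape the port the kubelet reports in node status
  - --metric-resolution=15s

Even with InternalIP listed first, metrics-server only chooses among the addresses the node itself advertises, so if the kubelet on a multi-homed host registers an address from the wrong adapter (for example because --node-ip is unset), changing this flag alone will not help.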
/kind support