Closed gaozuo closed 9 months ago
Cluster networking is not namespaced. I suspect that your problem is related to inter-node CNI traffic - as you noted, the problem has more to do with which nodes the pods are running on. Confirm that the correct ports are open on the security groups for the nodes you just added, for whatever flannel backend you're using.
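As a reference point for the advice above, the inter-node port that must be open depends on which flannel backend the cluster runs; this small helper just maps backend name to port (the function name and output format are mine, the port numbers are the documented K3s/flannel defaults):

```shell
#!/bin/sh
# Print the inter-node tunnel port a given flannel backend requires
# to be open between all nodes (illustrative helper, not part of k3s).
flannel_ports() {
  case "$1" in
    vxlan)            echo "8472/udp" ;;   # k3s default backend
    wireguard-native) echo "51820/udp" ;;  # wireguard backend
    host-gw)          echo "none (direct routes, no tunnel port)" ;;
    *)                echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

flannel_ports vxlan   # -> 8472/udp
```

If the security group for the new nodes allows 10250/tcp and 6443/tcp but not the backend's tunnel port, HTTP traffic through the API or ingress can still look healthy while pod-to-pod traffic across nodes silently fails, which matches the symptom described here.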
The issue has been resolved. All of our tests pointed to abnormal traffic between nodes, and we later found that the network team had applied a blocking policy without our knowledge. After the network rules were readjusted, the cluster is functioning well again.
Environmental Info:
K3s Version:
Node(s) CPU architecture, OS, and Version:
Cluster Configuration:
Describe the bug:
When we initially set up the cluster, we used 1 master node and 5 agent nodes, and it ran smoothly and stably for a while. Later, we added two more agent nodes (mh-hr-app5, mh-hr-app6). Now, when new pods are scheduled onto these two nodes, they cannot reach the Kafka and Zookeeper services in the Confluent namespace, while access to HTTP-based services remains normal across all namespaces. This leads us to suspect that the network configuration of the newly added agent nodes is inconsistent with the network configuration applied during the initial cluster installation.
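One way to confirm the symptom described above is a throwaway pod pinned to one of the new nodes that probes the broker port directly; a minimal sketch, where the service DNS name and port 9092 are assumptions about the Confluent deployment and should be adjusted:

```yaml
# Debug pod pinned to mh-hr-app5; runs a single TCP probe against the
# Kafka service and exits. Name, image, and target are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: net-test
spec:
  nodeName: mh-hr-app5
  restartPolicy: Never
  containers:
  - name: probe
    image: busybox:1.36
    command: ["nc", "-zv", "-w", "3", "kafka.confluent.svc.cluster.local", "9092"]
```

Running the same manifest with nodeName set to one of the original agents gives a direct comparison: if the probe succeeds there but times out on mh-hr-app5/mh-hr-app6, the problem is node-level networking rather than the Kafka service itself.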
Steps To Reproduce:
Additional context / logs: