wojtek-t opened this issue 1 year ago
The only suspicious one that I see in our preset is this one:
- name: KUBE_GCE_PRIVATE_CLUSTER
  value: "true"
containerd logs from master:
May 12 08:43:21.379251 bootstrap-e2e-master containerd[650]: time="2023-05-12T08:43:21.379201176Z" level=error msg="failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
On nodes, we get the CNI config from the template: NetworkPluginConfTemplate:/home/kubernetes/cni.template
On the master it is empty. In the logs from the master I can see that setup-containerd is called from configure-helper, and it should set the template path. My guess is that https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh#L3181 is executed, but this should not be the case.
I have SSHed onto the master and it looks like all configuration files regarding CNI are in place. kubectl describe node on the master shows:
Events:
  Type    Reason                Age                 From             Message
  ----    ------                ----                ----             -------
  Normal  RegisteredNode        21m                 node-controller  Node bootstrap-e2e-master event: Registered Node bootstrap-e2e-master in Controller
  Normal  CIDRAssignmentFailed  26s (x56 over 21m)  cidrAllocator    Node bootstrap-e2e-master status is now: CIDRAssignmentFailed
Kube controller manager logs:
E0512 13:12:32.119653 11 cloud_cidr_allocator.go:315] "Failed to update the node PodCIDR after multiple attempts" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" node="bootstrap-e2e-master" cidrStrings=["10.64.0.0/24","10.40.0.2/32"]
E0512 13:12:32.119671 11 cloud_cidr_allocator.go:178] "Error updating CIDR" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" workItem="bootstrap-e2e-master"
E0512 13:12:32.119682 11 cloud_cidr_allocator.go:187] "Exceeded retry count, dropping from queue" workItem="bootstrap-e2e-master"
I0512 13:12:32.119755 11 event.go:307] "Event occurred" object="bootstrap-e2e-master" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="CIDRAssignmentFailed" message="Node bootstrap-e2e-master status is now: CIDRAssignmentFailed"
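For context, the patch above is rejected because both ranges are IPv4, and a node may carry at most one podCIDR per IP family. The following is a minimal, hypothetical Go sketch of that per-family check (not the actual apiserver validation code), reproducing the failure for the two values from the log:

```go
package main

import (
	"fmt"
	"net"
)

// atMostOnePerFamily mimics the "may specify no more than one CIDR for each
// IP family" rule seen in the kcm logs above. Illustrative sketch only.
func atMostOnePerFamily(cidrs []string) error {
	seen := map[bool]string{} // key: true for IPv4, false for IPv6
	for _, c := range cidrs {
		ip, _, err := net.ParseCIDR(c)
		if err != nil {
			return fmt.Errorf("invalid CIDR %q: %v", c, err)
		}
		isIPv4 := ip.To4() != nil
		if prev, ok := seen[isIPv4]; ok {
			return fmt.Errorf("%q and %q are in the same IP family", prev, c)
		}
		seen[isIPv4] = c
	}
	return nil
}

func main() {
	// The two CIDRs the allocator tried to patch onto bootstrap-e2e-master.
	fmt.Println(atMostOnePerFamily([]string{"10.64.0.0/24", "10.40.0.2/32"}))
}
```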
Wojtek's gut feeling was right.
@p0lyn0mial if you want, we can create a PR to add:
- --env=KUBE_GCE_PRIVATE_CLUSTER=false
to the tests, and they should work just fine. In the meantime, I will try to understand why KUBE_GCE_PRIVATE_CLUSTER makes the master node get two CIDRs.
Does it have Cloud NAT enabled?
If not, the private network may have trouble fetching, e.g., from registry.k8s.io, which, unlike GCR, isn't a first-party GCP service.
cc @aojea re: GCE cidr allocation :-)
E0512 13:12:32.119671 11 cloud_cidr_allocator.go:178] "Error updating CIDR" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" workItem="bootstrap-e2e-master"
Re https://github.com/kubernetes/test-infra/issues/29500#issuecomment-1545732863: @basantsa1989, we have a bug in the allocator (https://github.com/kubernetes/kubernetes/commit/a013c6a2db54c59b78de974b181586723e088246).
If we receive multiple CIDRs, we should validate that they are actually dual-stack before patching.
We have to fix this both in k/k and in cloud-provider-gcp: https://github.com/kubernetes/cloud-provider-gcp/blob/67d1fd9f7255629fac3adfc956d0c8b2ac5f50f0/pkg/controller/nodeipam/ipam/cloud_cidr_allocator.go#L341-L344
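A rough sketch of the guard described above, assuming the allocator receives a list of candidate ranges from the cloud (hypothetical helper and package, not the actual cloud_cidr_allocator.go change): keep multiple CIDRs only when they really form a dual-stack pair, otherwise fall back to the first one.

```go
package ipam // hypothetical package, for illustration only

import "net"

// selectPodCIDRs sketches the fix discussed above: accept multiple CIDRs
// from the cloud only when they form a dual-stack (one IPv4 + one IPv6)
// pair; otherwise keep just the first one instead of failing the patch.
func selectPodCIDRs(cidrs []*net.IPNet) []*net.IPNet {
	if len(cidrs) <= 1 {
		return cidrs
	}
	var v4, v6 int
	for _, c := range cidrs {
		if c.IP.To4() != nil {
			v4++
		} else {
			v6++
		}
	}
	if v4 == 1 && v6 == 1 {
		return cidrs // genuine dual-stack pair: keep both
	}
	return cidrs[:1] // same-family duplicates: drop the extra range
}
```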
FYI: https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/util.sh#L3008 is the place where we add the master's internal IP as a second alias range when KUBE_GCE_PRIVATE_CLUSTER is used.
This second IP is then picked up by kube-controller-manager (https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/legacy-cloud-providers/gce/gce_instances.go#L496); the allocator thinks we have dual-stack and tries to apply both ranges, which fails because a node can have at most one IPv4 CIDR.
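For illustration only, a hedged sketch of another possible mitigation on the allocator side: ignore host-sized alias ranges (such as the /32 internal-IP alias added by util.sh for private clusters) before they are ever treated as pod CIDRs. The helper name and package are hypothetical, not existing code.

```go
package ipam // hypothetical package, for illustration only

import "net"

// dropHostAliases sketches one way to filter out the master's internal-IP
// alias (a /32 host route) so that only genuine pod ranges reach the CIDR
// allocator. Hypothetical code, not part of cloud-provider-gcp.
func dropHostAliases(cidrs []*net.IPNet) []*net.IPNet {
	var out []*net.IPNet
	for _, c := range cidrs {
		ones, bits := c.Mask.Size()
		if ones == bits { // /32 (IPv4) or /128 (IPv6): a single host, not a pod range
			continue
		}
		out = append(out, c)
	}
	return out
}
```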
Kube controller manager logs:
E0512 13:12:32.119653 11 cloud_cidr_allocator.go:315] "Failed to update the node PodCIDR after multiple attempts" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" node="bootstrap-e2e-master" cidrStrings=["10.64.0.0/24","10.40.0.2/32"]
E0512 13:12:32.119671 11 cloud_cidr_allocator.go:178] "Error updating CIDR" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" workItem="bootstrap-e2e-master"
E0512 13:12:32.119682 11 cloud_cidr_allocator.go:187] "Exceeded retry count, dropping from queue" workItem="bootstrap-e2e-master"
I0512 13:12:32.119755 11 event.go:307] "Event occurred" object="bootstrap-e2e-master" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="CIDRAssignmentFailed" message="Node bootstrap-e2e-master status is now: CIDRAssignmentFailed"
@Argh4k do you have the entire logs?
@aojea https://gcsweb.k8s.io/gcs/sig-scalability-logs/ci-kubernetes-e2e-gci-gce-scalability-watch-list-off/1658029086385115136/bootstrap-e2e-master/ has all the logs from the master
/sig network
Based on @basantsa1989's comment (https://github.com/kubernetes/kubernetes/pull/118043#issuecomment-1553661135), the allocator is working as expected and the problem is that this configuration is not supported.
Can we configure the cluster differently so that we don't pass two CIDRs?
I hope we can. Unfortunately, I haven't had much time to look into this, and the other work was unblocked by running the tests in a small public cluster.
@Argh4k Hey, a friendly reminder to work on this issue :)
It looks like having a private cluster would increase the available egress bandwidth. Higher egress bandwidth would allow us to generate more test traffic. Currently, we have had to reduce the test traffic because latency seems to suffer from throttling caused by the limited egress bandwidth.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
I think that this issue still hasn't been resolved
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
I think that this issue still hasn't been resolved
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
@aojea thoughts on this?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
In scalability tests, the control-plane node never becomes ready. We usually don't suffer from this, as almost all of our tests run 100+ nodes and we tolerate 1% of nodes not being initialized correctly. But it is problematic for tests like https://testgrid.k8s.io/sig-scalability-experiments#watchlist-off.
Looking into the kubelet logs, the reason seems to be the containerd "cni plugin not initialized" error quoted above.
FWIW, it seems to be related to some of our preset settings, as, e.g., https://testgrid.k8s.io/sig-scalability-node#node-containerd-throughput doesn't suffer from it.
@kubernetes/sig-scalability @mborsz @Argh4k @p0lyn0mial - FYI