wojtek-t opened this issue 1 year ago
The only suspicious one that I see in our preset is this one:
- name: KUBE_GCE_PRIVATE_CLUSTER
  value: "true"
containerd logs from master:
May 12 08:43:21.379251 bootstrap-e2e-master containerd[650]: time="2023-05-12T08:43:21.379201176Z" level=error msg="failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
On nodes, we get the CNI config from the template: NetworkPluginConfTemplate:/home/kubernetes/cni.template
On the master it is empty. In the logs from the master I can see that setup-containerd is called from configure-helper, and it should set the template path. My guess is that https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh#L3181 is executed, but this should not be the case.
I have SSHed onto the master and it looks like all configuration files regarding CNI are in place. kubectl describe node on the master shows:
Events:
  Type    Reason                Age                 From             Message
  ----    ------                ----                ----             -------
  Normal  RegisteredNode        21m                 node-controller  Node bootstrap-e2e-master event: Registered Node bootstrap-e2e-master in Controller
  Normal  CIDRAssignmentFailed  26s (x56 over 21m)  cidrAllocator    Node bootstrap-e2e-master status is now: CIDRAssignmentFailed
Kube controller manager logs:
E0512 13:12:32.119653 11 cloud_cidr_allocator.go:315] "Failed to update the node PodCIDR after multiple attempts" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" node="bootstrap-e2e-master" cidrStrings=["10.64.0.0/24","10.40.0.2/32"]
E0512 13:12:32.119671 11 cloud_cidr_allocator.go:178] "Error updating CIDR" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" workItem="bootstrap-e2e-master"
E0512 13:12:32.119682 11 cloud_cidr_allocator.go:187] "Exceeded retry count, dropping from queue" workItem="bootstrap-e2e-master"
I0512 13:12:32.119755 11 event.go:307] "Event occurred" object="bootstrap-e2e-master" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="CIDRAssignmentFailed" message="Node bootstrap-e2e-master status is now: CIDRAssignmentFailed"
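For context, the patch above is rejected because both ranges are IPv4, and a node may carry at most one podCIDR per IP family. The following is a minimal, hypothetical Go sketch of that per-family check (not the actual apiserver validation code), reproducing the failure for the two values from the log:

```go
package main

import (
	"fmt"
	"net"
)

// atMostOnePerFamily mimics the "may specify no more than one CIDR for each
// IP family" rule seen in the kcm logs above. Illustrative sketch only.
func atMostOnePerFamily(cidrs []string) error {
	seen := map[bool]string{} // key: true for IPv4, false for IPv6
	for _, c := range cidrs {
		ip, _, err := net.ParseCIDR(c)
		if err != nil {
			return fmt.Errorf("invalid CIDR %q: %v", c, err)
		}
		isIPv4 := ip.To4() != nil
		if prev, ok := seen[isIPv4]; ok {
			return fmt.Errorf("%q and %q are in the same IP family", prev, c)
		}
		seen[isIPv4] = c
	}
	return nil
}

func main() {
	// The two CIDRs the allocator tried to patch onto bootstrap-e2e-master.
	fmt.Println(atMostOnePerFamily([]string{"10.64.0.0/24", "10.40.0.2/32"}))
}
```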
Wojtek's gut feeling was right.
@p0lyn0mial if you want, we can create a PR to add:
- --env=KUBE_GCE_PRIVATE_CLUSTER=false
to the tests, and they should work just fine. In the meantime, I will try to understand why KUBE_GCE_PRIVATE_CLUSTER makes the master node get two CIDRs.
Does it have Cloud NAT enabled?
If not, the private network may have trouble fetching, e.g., from registry.k8s.io, which, unlike GCR, isn't a first-party GCP service.
cc @aojea re: GCE cidr allocation :-)
E0512 13:12:32.119671 11 cloud_cidr_allocator.go:178] "Error updating CIDR" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" workItem="bootstrap-e2e-master"
Re https://github.com/kubernetes/test-infra/issues/29500#issuecomment-1545732863: @basantsa1989, we have a bug in the allocator (https://github.com/kubernetes/kubernetes/commit/a013c6a2db54c59b78de974b181586723e088246).
If we receive multiple CIDRs, we should validate that they are actually dual-stack before patching.
We have to fix this both in k/k and in cloud-provider-gcp: https://github.com/kubernetes/cloud-provider-gcp/blob/67d1fd9f7255629fac3adfc956d0c8b2ac5f50f0/pkg/controller/nodeipam/ipam/cloud_cidr_allocator.go#L341-L344
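A rough sketch of the guard described above, assuming the allocator receives a list of candidate ranges from the cloud (hypothetical helper and package, not the actual cloud_cidr_allocator.go change): keep multiple CIDRs only when they really form a dual-stack pair, otherwise fall back to the first one.

```go
package ipam // hypothetical package, for illustration only

import "net"

// selectPodCIDRs sketches the fix discussed above: accept multiple CIDRs
// from the cloud only when they form a dual-stack (one IPv4 + one IPv6)
// pair; otherwise keep just the first one instead of failing the patch.
func selectPodCIDRs(cidrs []*net.IPNet) []*net.IPNet {
	if len(cidrs) <= 1 {
		return cidrs
	}
	var v4, v6 int
	for _, c := range cidrs {
		if c.IP.To4() != nil {
			v4++
		} else {
			v6++
		}
	}
	if v4 == 1 && v6 == 1 {
		return cidrs // genuine dual-stack pair: keep both
	}
	return cidrs[:1] // same-family duplicates: drop the extra range
}
```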
FYI: https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/util.sh#L3008 is the place where we add the master's internal IP as a second alias range when KUBE_GCE_PRIVATE_CLUSTER is used.
This second IP is then picked up by kube-controller-manager (https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/legacy-cloud-providers/gce/gce_instances.go#L496); the allocator thinks we have dual-stack and tries to apply both ranges, which fails because a node can have at most one IPv4 CIDR.
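For illustration only, a hedged sketch of another possible mitigation on the allocator side: ignore host-sized alias ranges (such as the /32 internal-IP alias added by util.sh for private clusters) before they are ever treated as pod CIDRs. The helper name and package are hypothetical, not existing code.

```go
package ipam // hypothetical package, for illustration only

import "net"

// dropHostAliases sketches one way to filter out the master's internal-IP
// alias (a /32 host route) so that only genuine pod ranges reach the CIDR
// allocator. Hypothetical code, not part of cloud-provider-gcp.
func dropHostAliases(cidrs []*net.IPNet) []*net.IPNet {
	var out []*net.IPNet
	for _, c := range cidrs {
		ones, bits := c.Mask.Size()
		if ones == bits { // /32 (IPv4) or /128 (IPv6): a single host, not a pod range
			continue
		}
		out = append(out, c)
	}
	return out
}
```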
Kube controller manager logs:
E0512 13:12:32.119653 11 cloud_cidr_allocator.go:315] "Failed to update the node PodCIDR after multiple attempts" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" node="bootstrap-e2e-master" cidrStrings=["10.64.0.0/24","10.40.0.2/32"]
E0512 13:12:32.119671 11 cloud_cidr_allocator.go:178] "Error updating CIDR" err="failed to patch node CIDR: Node \"bootstrap-e2e-master\" is invalid: spec.podCIDRs: Invalid value: []string{\"10.64.0.0/24\", \"10.40.0.2/32\"}: may specify no more than one CIDR for each IP family" workItem="bootstrap-e2e-master"
E0512 13:12:32.119682 11 cloud_cidr_allocator.go:187] "Exceeded retry count, dropping from queue" workItem="bootstrap-e2e-master"
I0512 13:12:32.119755 11 event.go:307] "Event occurred" object="bootstrap-e2e-master" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="CIDRAssignmentFailed" message="Node bootstrap-e2e-master status is now: CIDRAssignmentFailed"
@Argh4k do you have the entire logs?
@aojea https://gcsweb.k8s.io/gcs/sig-scalability-logs/ci-kubernetes-e2e-gci-gce-scalability-watch-list-off/1658029086385115136/bootstrap-e2e-master/ has all the logs from the master
/sig network
Based on @basantsa1989's comment (https://github.com/kubernetes/kubernetes/pull/118043#issuecomment-1553661135), the allocator is working as expected and the problem is that this configuration is not supported.
Can we configure the cluster differently so that we don't pass two CIDRs?
I hope we can. Unfortunately, I haven't had much time to look into this, and the other work was unblocked by running the tests in a small public cluster.
@Argh4k Hey, a friendly reminder to work on this issue :)
It looks like having a private cluster would increase the available egress bandwidth. Higher egress bandwidth would allow us to generate more test traffic. Currently, we have had to reduce the test traffic because latency seems to suffer from throttling caused by the limited egress bandwidth.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
I think that this issue still hasn't been resolved
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
I think that this issue still hasn't been resolved
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
@aojea thoughts on this?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
In scalability tests, the control-plane node never becomes ready. We usually don't suffer from this, as almost all of our tests run 100+ nodes and we tolerate 1% of nodes not being initialized correctly. But it is problematic for tests like https://testgrid.k8s.io/sig-scalability-experiments#watchlist-off.
Looking into the kubelet logs, the reason seems to be the containerd "cni plugin not initialized" error quoted above.
FWIW, it seems to be related to some of our preset settings, as, e.g., https://testgrid.k8s.io/sig-scalability-node#node-containerd-throughput doesn't suffer from it.
@kubernetes/sig-scalability @mborsz @Argh4k @p0lyn0mial - FYI