kubernetes-retired / cluster-api-provider-nested

Cluster API Provider for Nested Clusters
Apache License 2.0

[VirtualCluster] Error creating: failed to list services from cluster xxxxx cache: service is not ready #325

LuBingtan closed this issue 1 year ago

LuBingtan commented 1 year ago

What steps did you take and what happened: I followed the vc demo doc here to create a virtual cluster.

However, when I created a pod in the vc, the pod stayed Pending indefinitely. I then described the pod in the vc:

kubectl --kubeconfig vc-1.kubeconfig describe po test-deploy-5fbd8f7c8-mnzhx

output:

Name:             test-deploy-5fbd8f7c8-mnzhx
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app=vc-test
                  pod-template-hash=5fbd8f7c8
Annotations:      <none>
Status:           Pending
IP:               
IPs:              <none>
Controlled By:    ReplicaSet/test-deploy-5fbd8f7c8
Containers:
  poc:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      top
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qqgp2 (ro)
Volumes:
  kube-api-access-qqgp2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason        Age                From       Message
  ----     ------        ----               ----       -------
  Warning  FailedCreate  6s (x12 over 16s)  vc-syncer  Error creating: failed to list services from cluster default-25ca04-vc-sample-1 cache: service is not ready

What did you expect to happen: The pod should be running.

Anything else you would like to add:

It seems that the error occurs here: https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/4cc1422fd340d19c0d21b4f336dc664e45a12954/virtualcluster/pkg/syncer/resources/pod/dws.go#L334

Why do we need to check services in pPod.Namespace? There is apparently no services in that namespace unless I manually create one, because the namespace has only just been created by the syncer.

Is this a bug or is there anything I missed?

Environment:

/kind bug

wondywang commented 1 year ago

Hi, @LuBingtan. Which versions of the virtual cluster and the root cluster are you running? Also, I did not follow this part: "There is apparently no services in that namespace". That doesn't make sense; it should have at least kubernetes.default.svc.

LuBingtan commented 1 year ago

Hi, I retried and found that the root cause might be that kubernetes.default.svc failed to sync. Error logs:

E1108 04:32:15.064617       1 dws.go:65] failed reconcile service default/kubernetes CREATE of cluster default-3a3ae6-vc-sample-1 Service "kubernetes" is invalid: spec.clusterIPs: Invalid value: []string{"10.32.0.1"}: must be empty when `clusterIP` is not specified
E1108 04:32:15.064652       1 mccontroller.go:476] default/kubernetes dws request reconcile failed: Service "kubernetes" is invalid: spec.clusterIPs: Invalid value: []string{"10.32.0.1"}: must be empty when `clusterIP` is not specified
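
The rejection in these logs can be reproduced in isolation. The sketch below uses a hypothetical `serviceSpec` type that keeps only the two relevant fields, and `validateClusterIPs` is an assumed simplification of the API server's check, not its real implementation. It shows the state the error message describes: `clusterIP` is empty while `clusterIPs` still carries the old address.

```go
package main

import (
	"errors"
	"fmt"
)

// serviceSpec is a hypothetical, trimmed-down stand-in for the two Service
// spec fields named in the error message.
type serviceSpec struct {
	ClusterIP  string
	ClusterIPs []string
}

// validateClusterIPs is an assumed simplification of the rule quoted in the
// log: clusterIPs must be empty when clusterIP is not specified.
func validateClusterIPs(spec serviceSpec) error {
	if spec.ClusterIP == "" && len(spec.ClusterIPs) > 0 {
		return errors.New("spec.clusterIPs: must be empty when `clusterIP` is not specified")
	}
	return nil
}

func main() {
	// clusterIP was cleared, but clusterIPs was left over from the source object.
	halfReset := serviceSpec{ClusterIP: "", ClusterIPs: []string{"10.32.0.1"}}
	fmt.Println(validateClusterIPs(halfReset)) // rejected

	// Both fields cleared: the check passes.
	fullReset := serviceSpec{}
	fmt.Println(validateClusterIPs(fullReset)) // <nil>
}
```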

It looks like ClusterIPs should also be reset before the service is created. https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/4cc1422fd340d19c0d21b4f336dc664e45a12954/virtualcluster/pkg/syncer/conversion/mutate.go#L422-L431

@wondywang What do you think? If this sounds OK, I can help fix it.

And FYI
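
A minimal sketch of the proposed reset, under stated assumptions: `serviceSpec` and `resetClusterIPs` are hypothetical names, not the code in mutate.go, and the headless guard is an addition of this sketch rather than something the linked lines are known to contain. The idea is to clear both fields together so the receiving cluster can allocate its own address.

```go
package main

import "fmt"

// serviceSpec is a hypothetical stand-in for the relevant Service spec fields.
type serviceSpec struct {
	ClusterIP  string
	ClusterIPs []string
}

// resetClusterIPs clears ClusterIP and ClusterIPs together, so the two fields
// never disagree when the object is sent to another cluster.
// Assumption of this sketch: headless services (ClusterIP "None") are left
// untouched, since clearing them would turn them into regular services.
func resetClusterIPs(spec *serviceSpec) {
	if spec.ClusterIP == "None" {
		return
	}
	spec.ClusterIP = ""
	spec.ClusterIPs = nil
}

func main() {
	svc := serviceSpec{ClusterIP: "10.32.0.1", ClusterIPs: []string{"10.32.0.1"}}
	resetClusterIPs(&svc)
	fmt.Printf("clusterIP=%q clusterIPs=%v\n", svc.ClusterIP, svc.ClusterIPs)

	headless := serviceSpec{ClusterIP: "None", ClusterIPs: []string{"None"}}
	resetClusterIPs(&headless)
	fmt.Printf("clusterIP=%q clusterIPs=%v\n", headless.ClusterIP, headless.ClusterIPs)
}
```

Resetting only one of the two fields is what leaves the object in the state the API server rejects, so the sketch treats them as a pair.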

wondywang commented 1 year ago

Thanks, @LuBingtan.

PTAL @Fei-Guo @christopherhein. It does seem necessary to reset the ClusterIPs here, and we already do that internally.

Fei-Guo commented 1 year ago

Yes, looks like a bug. @wondywang can you fix it?

christopherhein commented 1 year ago

Agreed, seems like a great catch. Thanks!

wondywang commented 1 year ago

Fei-Guo wrote: "Yes, looks like a bug. @wondywang can you fix it?"

OK, I will.