@Fei-Guo @christopherhein @gyliu513 @vincent-pli
We should sync the kubernetes endpoints from the virtual cluster to the super cluster. Based on the DWS code for endpoints, we only skip syncing an endpoints object if its service has a selector. The kubernetes service does not have a selector, so its endpoints should be synced.
vService := &v1.Service{}
err := c.MultiClusterController.Get(request.ClusterName, request.Namespace, request.Name, vService)
if err != nil && !errors.IsNotFound(err) {
	return reconciler.Result{Requeue: true}, fmt.Errorf("fail to query service from tenant master %s", request.ClusterName)
}
if err == nil {
	if vService.Spec.Selector != nil {
		// Supermaster ep controller handles the service ep lifecycle, quit.
		return reconciler.Result{}, nil
	}
}
Can you double-check why the ep is not synced from the virtual cluster to the super cluster?
FYI, this is what I see in my local setup:
kubectl get ep -n tenant1admin-f7ea3a-vc-sample-1-default
NAME ENDPOINTS AGE
kubernetes 172.17.0.7:6443 54d
@wangjsty
The comment above is right. Could you check the syncer log and see if there are any exceptions there?
To @Fei-Guo: it seems we change /etc/hosts in the container so that https://kubernetes:443 points to the tenant's SVC "kubernetes" in the super cluster. How do we do this? Thanks.
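(Just as a debugging sketch, not an answer: the commands below assume a vc-sample-1.kubeconfig for the tenant cluster and a placeholder pod name, so adjust them to your setup.)
# Check what "kubernetes" resolves to inside a tenant pod, and what the tenant kubernetes Service looks like.
kubectl --kubeconfig vc-sample-1.kubeconfig -n default exec <pod-name> -- cat /etc/hosts
kubectl --kubeconfig vc-sample-1.kubeconfig -n default get svc kubernetes -o wide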
@Fei-Guo, no endpoints object is created; that's why I manually created one as a workaround.
# kubectl get ep -n default-3fbd77-vc-sample-1-default
No resources found in default-3fbd77-vc-sample-1-default namespace
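(For reference, a minimal sketch of such a manually created Endpoints object; the names and the endpoint IP are illustrative, with the IP taken from the syncer log below, so replace them with your tenant apiserver address.)
# Create the missing "kubernetes" Endpoints in the super cluster namespace by hand.
# The port name should match the synced Service's port name (https in a default setup).
kubectl apply -f - <<EOF
apiVersion: v1
kind: Endpoints
metadata:
  name: kubernetes
  namespace: default-3fbd77-vc-sample-1-default
subsets:
- addresses:
  - ip: 10.254.28.64
  ports:
  - name: https
    port: 6443
    protocol: TCP
EOF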
@wangjsty You can try to update the EP in the vc, e.g., add a dummy label, see whether it is created in the super cluster, and in the meantime check the syncer log for any errors.
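(A sketch of that, assuming a vc-sample-1.kubeconfig for the tenant cluster; the label key/value is arbitrary.)
# Touch the Endpoints object in the virtual cluster to force a DWS reconcile event...
kubectl --kubeconfig vc-sample-1.kubeconfig -n default label endpoints kubernetes test=resync --overwrite
# ...then check whether it shows up in the super cluster namespace and watch the syncer log.
kubectl get endpoints -n default-3fbd77-vc-sample-1-default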
@Fei-Guo @vincent-pli I reproduced the problem again, and here are some logs from the syncer:
E0709 04:13:10.212959 1 dws.go:77] failed reconcile endpoints default/kubernetes CREATE of cluster default-3fbd77-vc-sample-1 endpoints "kubernetes" is forbidden: endpoint address 10.254.28.64 is not allowed
E0709 04:13:10.212998 1 mccontroller.go:445] endpoints-mccontroller dws request is rejected: endpoints "kubernetes" is forbidden: endpoint address 10.254.28.64 is not allowed
E0709 04:27:06.132262 1 dws.go:66] failed reconcile serviceaccount olm/default CREATE of cluster default-3fbd77-vc-sample-1 pServiceAccount default-3fbd77-vc-sample-1-olm/ exists but its delegated UID is different
E0709 04:27:06.132309 1 mccontroller.go:461] olm/default dws request reconcile failed: pServiceAccount default-3fbd77-vc-sample-1-olm/ exists but its delegated UID is different
E0709 04:27:06.134087 1 dws.go:66] failed reconcile serviceaccount operators/default CREATE of cluster default-3fbd77-vc-sample-1 pServiceAccount default-3fbd77-vc-sample-1-operators/ exists but its delegated UID is different
E0709 04:27:06.134104 1 mccontroller.go:461] operators/default dws request reconcile failed: pServiceAccount default-3fbd77-vc-sample-1-operators/ exists but its delegated UID is different
I0709 04:27:06.136775 1 mutate.go:306] vc default-3fbd77-vc-sample-1 does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
I0709 04:27:06.140429 1 mutate.go:306] vc default-3fbd77-vc-sample-1 does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
E0709 04:27:06.231048 1 dws.go:104] failed reconcile Pod olm/olm-operator-8d9bf86c9-k46gw UPDATE of cluster default-3fbd77-vc-sample-1 Operation cannot be fulfilled on pods "olm-operator-8d9bf86c9-k46gw": the object has been modified; please apply your changes to the latest version and try again
E0709 04:27:06.231100 1 mccontroller.go:461] olm/olm-operator-8d9bf86c9-k46gw dws request reconcile failed: Operation cannot be fulfilled on pods "olm-operator-8d9bf86c9-k46gw": the object has been modified; please apply your changes to the latest version and try again
I0709 04:27:12.999392 1 mutate.go:306] vc default-3fbd77-vc-sample-1 does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
I0709 04:27:13.881893 1 mutate.go:306] vc default-3fbd77-vc-sample-1 does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
I0709 04:27:13.895515 1 mutate.go:306] vc default-3fbd77-vc-sample-1 does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
E0709 04:27:13.949619 1 dws.go:104] failed reconcile Pod olm/packageserver-59f5468bcd-cvb2f UPDATE of cluster default-3fbd77-vc-sample-1 Operation cannot be fulfilled on pods "packageserver-59f5468bcd-cvb2f": the object has been modified; please apply your changes to the latest version and try again
E0709 04:27:13.949646 1 mccontroller.go:461] olm/packageserver-59f5468bcd-cvb2f dws request reconcile failed: Operation cannot be fulfilled on pods "packageserver-59f5468bcd-cvb2f": the object has been modified; please apply your changes to the latest version and try again
E0709 04:27:13.962594 1 dws.go:104] failed reconcile Pod olm/packageserver-59f5468bcd-xmwhc UPDATE of cluster default-3fbd77-vc-sample-1 Operation cannot be fulfilled on pods "packageserver-59f5468bcd-xmwhc": the object has been modified; please apply your changes to the latest version and try again
E0709 04:27:13.962623 1 mccontroller.go:461] olm/packageserver-59f5468bcd-xmwhc dws request reconcile failed: Operation cannot be fulfilled on pods "packageserver-59f5468bcd-xmwhc": the object has been modified; please apply your changes to the latest version and try again
E0709 04:37:14.077736 1 dws.go:83] failed reconcile endpoints olm/packageserver-service DELETE of cluster default-3fbd77-vc-sample-1 To be deleted pEndpoints default-3fbd77-vc-sample-1-olm/packageserver-service delegated UID is different from deleted object.
E0709 04:37:14.077781 1 mccontroller.go:461] olm/packageserver-service dws request reconcile failed: To be deleted pEndpoints default-3fbd77-vc-sample-1-olm/packageserver-service delegated UID is different from deleted object.
E0709 04:37:14.083498 1 dws.go:83] failed reconcile endpoints olm/packageserver-service DELETE of cluster default-3fbd77-vc-sample-1 To be deleted pEndpoints default-3fbd77-vc-sample-1-olm/packageserver-service delegated UID is different from deleted object.
E0709 04:37:14.083535 1 mccontroller.go:461] olm/packageserver-service dws request reconcile failed: To be deleted pEndpoints default-3fbd77-vc-sample-1-olm/packageserver-service delegated UID is different from deleted object.
E0709 04:37:14.094135 1 dws.go:83] failed reconcile endpoints olm/packageserver-service DELETE of cluster default-3fbd77-vc-sample-1 To be deleted pEndpoints default-3fbd77-vc-sample-1-olm/packageserver-service delegated UID is different from deleted object.
E0709 04:37:14.094187 1 mccontroller.go:461] olm/packageserver-service dws request reconcile failed: To be deleted pEndpoints default-3fbd77-vc-sample-1-olm/packageserver-service delegated UID is different from deleted object.
@vincent-pli @Fei-Guo
I will use oc adm policy add-scc-to-user privileged -z vc-syncer -n vc-manager
as a workaround and then try again. Currently I'm only using "anyuid" here, which may not be enough for vc-syncer.
@vincent-pli @Fei-Guo
I tried, but oc adm policy add-scc-to-user privileged -z vc-syncer -n vc-manager
did not help.
It seems this is not a permission problem; OCP enforces a restriction when a user tries to create endpoints manually (this matches the "endpoint address 10.254.28.64 is not allowed" error in the syncer log above):
https://github.com/openshift/kubernetes/blob/90622d8244d0124fe2d44c336e68d4a4f03da1b6/openshift-kube-apiserver/admission/network/restrictedendpoints/endpoint_admission.go#L113-L128
and this:
cluster-config-v1 configmap in the kube-system namespace: the observed configmap install-config is decoded, and networking.podCIDR and networking.serviceCIDR are extracted and used as input for admissionPluginConfig.openshift.io/RestrictedEndpointsAdmission.configuration.restrictedCIDRs and servicesSubnet.
https://github.com/openshift/cluster-kube-apiserver-operator/blob/master/README.md
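(If it helps to confirm which CIDRs end up in that plugin configuration, the install-config can be read back from the ConfigMap mentioned above; this is just a sketch, and the grep pattern may need adjusting to your install-config field names.)
# Show the networking section of the install-config that feeds restrictedCIDRs / servicesSubnet.
oc get configmap cluster-config-v1 -n kube-system -o jsonpath='{.data.install-config}' | grep -A 6 'networking:'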
@wangjsty
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
What steps did you take and what happened: [A clear and concise description on how to REPRODUCE the bug.]
I followed https://github.com/kubernetes-sigs/cluster-api-provider-nested/blob/main/virtualcluster/doc/demo.md to install a virtual cluster on OpenShift 4.7.13, but used the following workarounds for some security constraint problems.
For example:
Workarounds:
To activate the workarounds, the related pods need to be deleted so that they restart.
Install OLM on the virtual cluster.
Check the OLM pods that are in CrashLoopBackOff status.
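(A sketch of that check, assuming OLM lands in the olm and operators namespaces as in the syncer log earlier in the thread, and a vc-sample-1.kubeconfig for the tenant cluster.)
# List the OLM pods in the virtual cluster and look for CrashLoopBackOff.
kubectl --kubeconfig vc-sample-1.kubeconfig get pods -n olm
kubectl --kubeconfig vc-sample-1.kubeconfig get pods -n operators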
BTW, I also deployed nginx on the virtual cluster, and it looks like it works well.
What did you expect to happen: The OLM pods should be running normally.
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]
On virtual cluster:
On super cluster:
I checked: the endpoint 10.254.28.88:6443 is reachable. But https://kubernetes:443 resolves to the kubernetes Service IP 172.30.69.161 in the super cluster, not the kubernetes Service IP 10.32.0.1 in the virtual cluster. And the Endpoints for the kubernetes service in the super cluster is <none>. Furthermore, I also debugged with "telnet 10.32.0.1 443" in a pod on the virtual cluster; that could not be forwarded to any endpoint because no connection could be established. Using the following workaround, the https://kubernetes:443 timeout problem can be resolved!
Environment:
- Kubernetes version (use kubectl version): 1.20.0
- OS (e.g. from /etc/os-release): Red Hat Enterprise Linux 8
/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-provider-nested/labels?q=area for the list of labels]