karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0
4.11k stars 805 forks source link

Join cluster failed(The logic to determine cluster health may not quite right ) #1673

Closed duanmengkk closed 1 year ago

duanmengkk commented 2 years ago

What happened: The ready status of cluster is false,and the output of describe is below 图片

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

  1. step1: apply a apiservice in member
# apiservice.yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1alpha1.cluster.karmada.io
  labels:
    app: karmada-aggregated-apiserver
    apiserver: "true"
spec:
  insecureSkipTLSVerify: true
  group: cluster.karmada.io
  groupPriorityMinimum: 2000
  service:
    name: karmada-aggregated-apiserver
    namespace: karmada-system
  version: v1alpha1
  versionPriority: 10

2.step2: join the member cluster

kubectl karmada join member47 --kubeconfig=/root/.kube/config --karmada-context=karmada-apiserver --cluster-kubeconfig=/root/.kube/config --cluster-context=cluster47

Anything else we need to know?:

when I excute the command kubectl api-resources,the output is 图片 But I don't think this should lead to an unhealthy state of the cluster. Other functions are normal, do not affect the use

Code location at pkg/controllers/status/cluster_status_controller.go of line 171

    apiEnables, err := getAPIEnablements(clusterClient)
    if err != nil {
        return c.setStatusCollectionFailedCondition(cluster, currentClusterStatus, fmt.Sprintf("failed to get the list of APIs installed in the member cluster: %v", err))
    }

Maybe we should ignore the err? I am not sure,asking you for response.

lonelyCZ commented 2 years ago

I think this is due to member cluster misuse of AA, and it does need to be addressed, but it doesn't feel right to just ignore it.

RainbowMango commented 2 years ago

step1: apply a apiservice in member

I don't think the API service should be deployed in the member cluster. It is intended to karmada-apiserver.

lonelyCZ commented 2 years ago

I don't think the API service should be deployed in the member cluster. It is intended to karmada-apiserver.

There seems a question that whether it is possible for member cluster to extend APIserver with AA.

duanmengkk commented 1 year ago

step1: apply a apiservice in member

I don't think the API service should be deployed in the member cluster. It is intended to karmada-apiserver.

Just use this apiservice as a example,It caused by a misuse of member cluster. Is it too strict to determines whether the cluster is healthy?In fact, member clusters are still available and can be scheduled。The comment line of this method in client-go is image while other checks of cluster status such as listPods and listNodes are different from getAPIEnablements, if err is not nil , member clusters are indeed unhealthy. @RainbowMango

duanmengkk commented 1 year ago

However, if we ignore this error, it can lead to unexpected results probably. @lonelyCZ

karmada-bot commented 1 year ago

@duanmengkk: Closing this issue.

In response to [this](https://github.com/karmada-io/karmada/issues/1673#issuecomment-1114120239): >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
duanmengkk commented 1 year ago

/reopen

karmada-bot commented 1 year ago

@duanmengkk: Reopened this issue.

In response to [this](https://github.com/karmada-io/karmada/issues/1673#issuecomment-1114120967): >/reopen Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
lonelyCZ commented 1 year ago

However, if we ignore this error, it can lead to unexpected results probably.

Yes, it is caused by misconfiguration in member cluster, we should report this error to cluster administrator who decide to solve this error.

duanmengkk commented 1 year ago

/assign

duanmengkk commented 1 year ago

What's your opinion? Whether a new state should be introduced like WARN when get clusters by kubectl get cluster? @lonelyCZ

lonelyCZ commented 1 year ago

I am not quite clear, if temporarily ignore this error and directly join the member cluster, what consequences will result.

Whether a new state should be introduced like WARN when get clusters by kubectl get cluster?

If it does not affect normal use, take this as a WARN and may be feasible.

XiShanYongYe-Chang commented 1 year ago

Perhaps a new status condition should be added to indicate that the current cluster has not successfully collected server APIs so that users can easily obtain this information.

duanmengkk commented 1 year ago

Perhaps a new status condition should be added to indicate that the current cluster has not successfully collected server APIs so that users can easily obtain this information.

I used to think about it this way,but i research the code in k8s, they just log the error and do not return it,because the result and error can return at the same time. see https://github.com/kubernetes/kubernetes/blob/c84d0864ddebbd0d36ce295cf74a447f24d7b3ec/staging/src/k8s.io/client-go/restmapper/shortcut.go#L88 and https://github.com/kubernetes/kubernetes/blob/c84d0864ddebbd0d36ce295cf74a447f24d7b3ec/staging/src/k8s.io/cloud-provider/app/controllermanager.go#L463

Should we add a status of WARN ?@XiShanYongYe-Chang

XiShanYongYe-Chang commented 1 year ago

I was worried that a propagated error might not be easy to locate.

Should we add a status of WARN ?

The APIs collected by the cluster are set in .status.apiEnablements, users can also locate the error based on logs. It is feasible.

duanmengkk commented 1 year ago

sorry,i do not understand you very clearly ,you mean whe add a new status in .status.conditions.status and store the log in .status.conditions.message? @XiShanYongYe-Chang image

XiShanYongYe-Chang commented 1 year ago

@duanmengkk Sorry, I didn't explain clearly. That's what I thought.

duanmengkk commented 1 year ago

@duanmengkk Sorry, I didn't explain clearly. That's what I thought.

ok,i will modify the PR according to your suggestion

XiShanYongYe-Chang commented 1 year ago

ok,i will modify the PR according to your suggestion

It's my suggestion, we can ask @lonelyCZ @RainbowMango for suggestions.

RainbowMango commented 1 year ago

// The returned group and resource lists might be non-nil with partial results even in the // case of non-nil error. ServerGroupsAndResources() ([]metav1.APIGroup, []metav1.APIResourceList, error)

I agree to add a new condition for it. And at the meantime, I'm thinking if we can get the API list as much as we can?

Without the API enablement, the resource will be prevented from the cluster by karmda-scheduler, because there is a plugin to check the API enablements.

duanmengkk commented 1 year ago

I'm thinking if we can get the API list as much as we can?

I don't think that's a conflict,if a error occurs,it means k8s server can not hanlder the gvr,so the APIinstalled plugin should ignore the gvr too. Let me reproduce the scenario,if a APIService is

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1alpha1.test.a.a
spec:
  group: test.a.a
  groupPriorityMinimum: 2000
  service:
    name: test
    namespace: test
  version: v1alpha1
  versionPriority: 10

and I apply it.Then kubectl api-resources will return the API enablements and a error at the same time. image This does no matter affect to k8s server,but it makes member cluster not ready and unhealthy now.

duanmengkk commented 1 year ago

/close

karmada-bot commented 1 year ago

@duanmengkk: Closing this issue.

In response to [this](https://github.com/karmada-io/karmada/issues/1673#issuecomment-1136600562): >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.