googlearchive / k8s-service-catalog

[DEPRECATED] Commandline tool to manage Service Catalog lifecycle and GCP Service Broker atop Kubernetes Cluster
Apache License 2.0

controller-manager is NOT AVAILABLE #221

Open SpringMT opened 5 years ago

SpringMT commented 5 years ago

When sc install is executed, the controller-manager is not AVAILABLE; its pod is in CrashLoopBackOff. The error message is below.

"error running controllers: failed to get api versions from server: failed to get supported resources from server: unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1beta1: an error on the server ("service unavailable") has prevented the request from succeeding"   

I'm using GKE and followed the tutorial for installing Service Catalog: https://cloud.google.com/kubernetes-engine/docs/how-to/add-on/service-catalog/install-service-catalog
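
For reference, a quick way to look at the crash loop (a sketch, assuming the default service-catalog namespace that sc install uses):

  # Show the controller-manager pod state; the discovery error appears in its logs.
  kubectl get pods -n service-catalog
  kubectl logs -n service-catalog deployment/controller-manager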

sc version

% sc version
sc version 0.1.1 darwin/amd64

kubernetes version

% kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.7", GitCommit:"0c38c362511b20a098d7cd855f1314dad92c2780", GitTreeState:"clean", BuildDate:"2018-08-20T10:09:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.5-gke.4", GitCommit:"0c81dc1e8c26fa2c47e50072dc7f98923cb2109c", GitTreeState:"clean", BuildDate:"2018-12-07T00:22:06Z", GoVersion:"go1.10.3b4", Compiler:"gc", Platform:"linux/amd64"}
jo2y commented 5 years ago

I'm seeing the same problem. To debug, I created a new cluster, but did not enable VPC-native this time, and the controller-manager worked. So there is some sort of permission or connectivity issue between the controller-manager and the Kubernetes master. I'll keep poking at my non-working cluster to see if I can find anything more.

SpringMT commented 5 years ago

My cluster where sc install failed also had VPC-native enabled.

seans3 commented 5 years ago

Hello,

This is a known issue with the resource quota controller. According to Walter Fender (wfender@):

The latest bug here is that the ResourceQuotaController is returning an error when it gets a 5XX result from a discovery request. The error returned from the controller is (correctly) not handled by the Controller Manager. The fix here is to make the ResourceQuotaController resilient to a 5XX result from discovery.

Both the 1.10 cherry pick (https://github.com/kubernetes/kubernetes/pull/67155) and the 1.11 cherry pick (https://github.com/kubernetes/kubernetes/pull/67154) have merged.

So in order to fix this issue, update your cluster to 1.10.8 or 1.11.3. These patches were released months ago.
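
If it helps, the GKE master can be upgraded with something like the following; CLUSTER_NAME, ZONE, and the exact patch version are placeholders:

  # Upgrade the control plane to a version that includes the cherry picks above.
  gcloud container clusters upgrade CLUSTER_NAME --master --cluster-version 1.11.3 --zone ZONE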

Sean


SpringMT commented 5 years ago

I'm using 1.11.5-gke.4, but the issue still happened 😢.

seans3 commented 5 years ago

Got it. Looking into this now. Would it be possible to get a hash of the cluster this is failing on, so we can debug it?

Thanks,

Sean


jo2y commented 5 years ago

I was running 1.11.5-gke.5 (I've since rebuilt the cluster to turn VPC-native off). While trying to debug, I noticed that kubectl api-resources returns the same error. So I think the problem is not directly with the controller-manager, but with the connection from the master to the Service Catalog apiserver.
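
A rough way to see the same failure directly (the APIService name below is the one the catalog normally registers, so treat it as an assumption):

  # Discovery for the servicecatalog.k8s.io group fails with the same
  # "service unavailable" error while the aggregated apiserver is unreachable.
  kubectl api-resources --api-group=servicecatalog.k8s.io
  # The APIService status shows whether the aggregated apiserver is Available.
  kubectl get apiservice v1beta1.servicecatalog.k8s.io -o yaml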

seans3 commented 5 years ago

@kibbles-n-bytes @martinmaly Do we know what version of the controller manager the service catalog is now using? Do we know which image version of the controller manager the "sc" tool is installing? It looks like the service catalog controller manager image installed by "sc" might not have the following bug fixes:

66932: Include unavailable API services in discovery response

67433: allow failed discovery on initial quota controller start

What do you guys think?
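
For anyone checking their own install, the image in use can be read from the deployment (a sketch, assuming the default service-catalog namespace):

  kubectl get deployment controller-manager -n service-catalog \
    -o jsonpath='{.spec.template.spec.containers[0].image}'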

jo2y commented 5 years ago

The version sc is installing is gcr.io/gcp-services/service-catalog:v0.1.11-gke.0. Configured here. That is the only one available and was built Mar 27, 2018, so it definitely predates the fixes.

The crashloop does appear to be fixed by using a newer version. Steps I took to test:

  1. Cloned my existing cluster, but enabled VPC-native.
  2. Installed with 'sc install' and 'sc add-gcp-broker'.
  3. Verified controller-manager was failing. This is the state of the cluster I gave to @seans3 this morning to debug.
  4. Changed the image: to quay.io/kubernetes-service-catalog/service-catalog:latest for both apiserver and controller-manager.
  5. Removed --admission-control "KubernetesNamespaceLifecycle" from the apiserver args. (I'm not convinced this shouldn't have been replaced with something else.)
  6. At this point, apiserver was complaining about permissions, so I used this helm chart to create an RBAC configuration.
  7. The errors in the logs of both jobs stopped after a few minutes.
  8. I tested by creating a service account with the YAML below.
  9. svcat get instances and svcat get bindings report 'Ready', and the secret containing a service account key was created (a sketch of these checks follows the YAML).
apiVersion: v1
kind: List
items:
- apiVersion: servicecatalog.k8s.io/v1beta1
  kind: ServiceBinding
  metadata:
    name: vpc-test-sa-iam-sa-binding
    namespace: default
  spec:
    instanceRef: {name: vpc-test-sa-iam-sa}
    secretName: vpc-test-sa-credentials
- apiVersion: servicecatalog.k8s.io/v1beta1
  kind: ServiceInstance
  metadata:
    name: vpc-test-sa-iam-sa
    namespace: default
  spec:
    clusterServiceClassExternalName: cloud-iam-service-account
    clusterServicePlanExternalName: beta
    parameters: {accountId: vpc-test-sa, displayName: SA for testing the VPC enabled cluster}
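
Roughly how steps 8 and 9 can be reproduced, assuming the YAML above is saved as vpc-test-sa.yaml (the filename is just a placeholder):

  kubectl apply -f vpc-test-sa.yaml
  svcat get instances -n default
  svcat get bindings -n default
  # The binding writes a service account key into this secret once it is Ready.
  kubectl get secret vpc-test-sa-credentials -n default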
TaylorMutch commented 5 years ago

I'm seeing what the OP describes with a fresh install on 1.12.5-gke.10, sc version 0.1.1 darwin/amd64.

TaylorMutch commented 5 years ago

Rebuilt with a different cluster version, 1.11.6-gke.2, with the same result as the OP.

TaylorMutch commented 5 years ago

Okay, I created a different cluster (NOT VPC-native), and it succeeds. It seems like I have the issue described above; I will try using a newer version of the service catalog as @jo2y suggests.

bonfante commented 5 years ago

Same problem with VPC cluster

fruwe commented 4 years ago

I managed to make it work by adding a cluster role binding and changing the image to the one mentioned by @jo2y (thank you):

  kubectl create clusterrolebinding cluster-admin-controller-manager-binding --clusterrole=cluster-admin --user=system:serviceaccount:service-catalog:controller-manager

  kubectl set image deployment/controller-manager controller-manager=quay.io/kubernetes-service-catalog/service-catalog:latest -n service-catalog

However, after that, sc add-gcp-broker fails with:

Failed to configure the Service Broker
Error: error deploying the Service Broker configs: deploy failed with output: exit status 1: error: unable to recognize "/tmp/service-catalog-gcp562627680/gcp-broker.yaml": no matches for kind "ClusterServiceBroker" in version "servicecatalog.k8s.io/v1beta1"
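
One possible check for whether the broker kind is actually being served after the image swap (an assumption about where to look, not a confirmed fix):

  # ClusterServiceBroker should be listed once servicecatalog.k8s.io/v1beta1 is served.
  kubectl api-resources --api-group=servicecatalog.k8s.io
  kubectl get clusterservicebrokers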