improbable-eng / etcd-cluster-operator

A controller to deploy and manage etcd clusters inside of Kubernetes
MIT License

Not able to install the operator and provision an instance by following the documentation #191

Open HariNarayananMohan opened 4 years ago

HariNarayananMohan commented 4 years ago

Versions of relevant software used: OpenShift 4.3, which uses Kubernetes 1.16.

What happened: I'm trying to get the etcd-cluster-operator running on my cluster and provision EtcdClusters. I followed the Installing instructions and the Contributing documentation separately, but was not able to create an EtcdCluster either way.

What you expected to happen: Install etcd-cluster-operator in my cluster and provision EtcdClusters.

How to reproduce it (as minimally and precisely as possible):

Install cert manager

Option 1: Follow Installing Instructions

Step 1: Clone this GitHub repo to your local machine.
Step 2: From the root directory of the repo - cd config/default
Step 3: export ECO_VERSION=v0.2.0
Step 4: kustomize edit set image controller=$ECO_VERSION
Step 5: kustomize edit set image proxy=$ECO_VERSION
Step 6: kubectl apply --kustomize .

Output

error: rawResources failed to read Resources: Load from path ../crd failed: '../crd' must be a file (got d='/<path to repo>/etcd-cluster-operator/config/crd')

Option 2: Follow Contributing Instructions

Step 1: export DOCKER_REPO=<registryname>
Step 2: make docker-build
Step 3: make docker-push
Step 4: make deploy

Current status:

oc get crd | grep improbable 
etcdbackups.etcd.improbable.io                              2020-06-18T17:34:59Z
etcdbackupschedules.etcd.improbable.io                      2020-06-18T17:35:00Z
etcdclusters.etcd.improbable.io                             2020-06-18T17:35:00Z
etcdpeers.etcd.improbable.io                                2020-06-18T17:35:00Z
etcdrestores.etcd.improbable.io                             2020-06-18T17:35:01Z
oc get pods 
NAME                                      READY   STATUS             RESTARTS   AGE
eco-controller-manager-796f74db94-jpfp7   1/1     Running            0          119m
eco-proxy-cfdb688bb-pb5rh                 0/1     CrashLoopBackOff   48         119m
oc logs eco-proxy-cfdb688bb-pb5rh -n eco-system 
2020-06-18T20:18:21.298Z    INFO    setup   Starting proxy  {"version": "v0.2.0-23-gf84abc6"}
2020-06-18T20:18:21.299Z    INFO    setup   Listening   {"grpc-address": ":8080"}

Step 5: kubectl apply -f config/samples/etcd_v1alpha1_etcdcluster.yaml

I tried creating this CR in both the hari namespace and the eco-system namespace.

 oc get etcdclusters.etcd.improbable.io 
NAME         AGE
my-cluster   149m
 oc logs etcdclusters.etcd.improbable.io/my-cluster
error: no kind "EtcdCluster" is registered for version "etcd.improbable.io/v1alpha1" in scheme "k8s.io/kubernetes/pkg/kubectl/scheme/scheme.go:28"

Full logs of relevant components: eco-controller-manager.log

Anything else we need to know

cheahjs commented 4 years ago

Thanks for the report.

Will investigate whether the installation instructions are still up to date. Note to self: CI uses standalone kustomize v3, whereas kubectl ships with kustomize v2. make deploy uses kustomize build <config> | kubectl apply -f -
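The version difference noted above can be sidestepped by rendering the manifests with standalone kustomize and piping the result into kubectl, which is what make deploy is described as doing. A minimal sketch, assuming kustomize v3 and kubectl are both on PATH; the function name deploy_operator is hypothetical, and config/default is the directory used in the installing steps:

```shell
# Hypothetical helper: render the kustomization with standalone kustomize
# (rather than the older version bundled into kubectl) and apply the result.
# "-f -" tells kubectl to read the manifests from stdin.
deploy_operator() {
  config_dir="${1:-config/default}"
  kustomize build "$config_dir" | kubectl apply -f -
}
```

From the root of the repository this would be invoked as deploy_operator config/default.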

As for the cluster deployment error, the manager is logging:

2020-06-18T17:54:30.724Z    ERROR   controller-runtime.controller   Reconciler error    {"controller": "etcdcluster", "request": "hari/my-cluster", "error": "Failed to reconcile: unable to create service: services \"my-cluster\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}

A quick search suggests that OpenShift requires an additional RBAC permission on <resource>/finalizers before a controller can set blockOwnerDeletion in an ownerReference to that resource. Our tests and deployments currently run on Kind/GKE, so we would have missed this. Will investigate what is needed for a fix.

https://github.com/jaegertracing/jaeger-operator/issues/461 https://github.com/spotahome/redis-operator/issues/98
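Whether the manager's service account already holds that permission can be checked with kubectl auth can-i. A rough sketch, not a confirmed diagnosis: the service account name is an assumption (the thread does not say which account the manager runs as), the helper name is hypothetical, and hari is the namespace from the failing request above:

```shell
# ASSUMPTION: the manager runs as the "default" service account in the
# "eco-system" namespace. Override SA if your deployment differs.
SA="${SA:-system:serviceaccount:eco-system:default}"

# Hypothetical helper: prints yes/no for "update" on the finalizers
# subresource of each of the operator's custom resources.
check_finalizer_access() {
  ns="${1:-hari}"
  for resource in etcdbackupschedules etcdclusters etcdpeers etcdrestores; do
    printf '%s/finalizers: ' "$resource"
    kubectl auth can-i update "${resource}.etcd.improbable.io" \
      --subresource=finalizers --as="$SA" -n "$ns" || true
  done
}
```

Running check_finalizer_access hari before and after changing the RBAC rules shows whether the change took effect. Impersonating with --as requires that your own user is allowed to impersonate service accounts.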

HariNarayananMohan commented 4 years ago

Thank you! I followed this and got it working a few days ago. I thought I'd report back.

jimmy-scott commented 2 years ago

Hi! Could you share how? :) It's unclear exactly which resource needs this.

In the meantime I've read the sources to see which ownerReferences are created, and found that these RBAC rules work:

- apiGroups:
  - etcd.improbable.io
  resources:
  - etcdbackupschedules/finalizers
  verbs:
  - update
- apiGroups:
  - etcd.improbable.io
  resources:
  - etcdclusters/finalizers
  verbs:
  - update
- apiGroups:
  - etcd.improbable.io
  resources:
  - etcdpeers/finalizers
  verbs:
  - update
- apiGroups:
  - etcd.improbable.io
  resources:
  - etcdrestores/finalizers
  verbs:
  - update
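One way to try these rules without editing the operator's own role is to put them in a separate ClusterRole and bind it to the manager's service account. This is only a sketch under stated assumptions: the object names (eco-finalizers, eco-finalizers-binding) and the file name are hypothetical, and the service account (default in eco-system) is a guess that should be replaced with whatever account the manager actually runs as. The four rules are folded into a single rule, which grants the same permissions.

```shell
# Write the extra finalizer permissions to a standalone manifest.
cat > eco-finalizers-rbac.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: eco-finalizers            # hypothetical name
rules:
- apiGroups:
  - etcd.improbable.io
  resources:
  - etcdbackupschedules/finalizers
  - etcdclusters/finalizers
  - etcdpeers/finalizers
  - etcdrestores/finalizers
  verbs:
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: eco-finalizers-binding    # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: eco-finalizers
subjects:
- kind: ServiceAccount
  name: default                   # ASSUMPTION: the manager's service account
  namespace: eco-system           # namespace the operator pods run in
EOF

# Apply the manifest if kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  kubectl apply -f eco-finalizers-rbac.yaml
fi
```

Keeping the extra permissions in their own objects makes them easy to remove again with kubectl delete -f eco-finalizers-rbac.yaml.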