halkyonio / operator

Kubernetes Operator simplifying the development of microservices on k8s !
Apache License 2.0

Halkyon CircleCI tests are not working anymore #170

Closed cmoulliard closed 5 years ago

cmoulliard commented 5 years ago

Issue

Halkyon CircleCI tests are not working anymore, see https://circleci.com/gh/halkyonio/operator/2332

Remark: This is certainly an issue related to the webhook. Try changing apiserver.enableValidatingWebhook=false,apiserver.enableMutatingWebhook=false to apiserver.enableValidatingWebhook=true,apiserver.enableMutatingWebhook=true

geoand commented 5 years ago

I wonder if it could be a problem with a new KubeDB version....

cmoulliard commented 5 years ago

new KubeDB version....

Unless this syntax is wrong, we still install 0.12.0 - https://github.com/halkyonio/operator/blob/master/.circleci/config.yml#L89-L90

geoand commented 5 years ago

That's great information, thanks

cmoulliard commented 5 years ago

This error is not reported using k8s 1.13 or k8s 1.14

 kind create cluster --name halkyon \
>   --config kind-config.yml \
>   --image kindest/node:v1.14.6@sha256:464a43f5cf6ad442f100b0ca881a3acae37af069d5f96849c1d06ced2870888d
Creating cluster "halkyon" ...
 ✓ Ensuring node image (kindest/node:v1.14.6) 🖼
 ✓ Preparing nodes 📦
 ✓ Creating kubeadm config 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌 
 ✓ Installing StorageClass 💾 
Cluster creation complete. You can now use the cluster with:

export KUBECONFIG="$(kind get kubeconfig-path --name="halkyon")"
kubectl cluster-info

export KUBECONFIG="$(kind get kubeconfig-path --name="halkyon")"
kubectl cluster-info
Kubernetes master is running at https://192.168.99.50:49488
KubeDNS is running at https://192.168.99.50:49488/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

./kubedb.sh 
$HELM_HOME has been configured at /Users/dabou/.helm.

Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.

Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
tiller-deploy-66b7dd976-dmblq   1/1     Running   0          25s
clusterrolebinding.rbac.authorization.k8s.io/tiller-cluster-admin created
"appscode" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "appscode" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete.
NAME:   kubedb-operator
LAST DEPLOYED: Tue Oct 22 07:47:08 2019
NAMESPACE: kubedb
STATUS: DEPLOYED

RESOURCES:
==> v1/ClusterRole
NAME             AGE
kubedb-operator  1s

==> v1/ClusterRoleBinding
NAME                                      AGE
kubedb-operator                           1s
kubedb-operator-apiserver-auth-delegator  1s

==> v1/Deployment
NAME             READY  UP-TO-DATE  AVAILABLE  AGE
kubedb-operator  0/1    1           0          1s

==> v1/Pod(related)
NAME                              READY  STATUS             RESTARTS  AGE
kubedb-operator-6f957859c4-j5fm4  0/1    ContainerCreating  0         1s

==> v1/RoleBinding
NAME                                                              AGE
kubedb-operator-apiserver-extension-server-authentication-reader  1s

==> v1/Secret
NAME                            TYPE    DATA  AGE
kubedb-operator-apiserver-cert  Opaque  2     1s

==> v1/Service
NAME             TYPE       CLUSTER-IP      EXTERNAL-IP  PORT(S)  AGE
kubedb-operator  ClusterIP  10.111.246.195  <none>       443/TCP  1s

==> v1/ServiceAccount
NAME             SECRETS  AGE
kubedb-operator  1        1s

==> v1beta1/APIService
NAME                            AGE
v1alpha1.mutators.kubedb.com    1s
v1alpha1.validators.kubedb.com  1s

==> v1beta1/PodSecurityPolicy
NAME             PRIV  CAPS                   SELINUX   RUNASUSER  FSGROUP   SUPGROUP  READONLYROOTFS  VOLUMES
kubedb-operator  true  IPC_LOCK,SYS_RESOURCE  RunAsAny  RunAsAny   RunAsAny  RunAsAny  false           *

NOTES:
To verify that KubeDB has started, run:

  kubectl --namespace=kubedb get deployments -l "release=kubedb-operator, app=kubedb"

Now install/upgrade appscode/kubedb-catalog chart.

To install, run:

  helm install appscode/kubedb-catalog --name kubedb-catalog --version 0.12.0 --namespace kubedb

To upgrade, run:

  helm upgrade kubedb-catalog appscode/kubedb-catalog --version 0.12.0 --namespace kubedb

Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "elasticsearchversions.catalog.kubedb.com" not found
...
"mongodbversions.catalog.kubedb.com" not found
Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "mysqlversions.catalog.kubedb.com" not found
Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "postgresversions.catalog.kubedb.com" not found
Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io "redisversions.catalog.kubedb.com" not found
NAME                                       CREATED AT
elasticsearchversions.catalog.kubedb.com   2019-10-22T05:47:38Z
memcachedversions.catalog.kubedb.com       2019-10-22T05:47:39Z
mongodbversions.catalog.kubedb.com         2019-10-22T05:47:39Z
mysqlversions.catalog.kubedb.com           2019-10-22T05:47:39Z
postgresversions.catalog.kubedb.com        2019-10-22T05:47:39Z
redisversions.catalog.kubedb.com           2019-10-22T05:47:39Z
NAME:   kubedb-catalog
LAST DEPLOYED: Tue Oct 22 07:47:40 2019
NAMESPACE: kubedb
STATUS: DEPLOYED

RESOURCES:
==> v1alpha1/PostgresVersion
NAME      AGE
10.2      0s
10.2-v1   0s
10.2-v2   0s
10.2-v3   0s
10.2-v4   0s
10.6      0s
10.6-v1   0s
10.6-v2   0s
11.1      0s
11.1-v1   0s
11.1-v2   0s
11.2      0s
9.6       0s
9.6-v1    0s
9.6-v2    0s
9.6-v3    0s
9.6-v4    0s
9.6.7     0s
9.6.7-v1  0s
9.6.7-v2  0s
9.6.7-v3  0s
9.6.7-v4  0s

==> v1beta1/PodSecurityPolicy
NAME                    PRIV   CAPS                   SELINUX   RUNASUSER  FSGROUP   SUPGROUP  READONLYROOTFS  VOLUMES
elasticsearch-db        true   IPC_LOCK,SYS_RESOURCE  RunAsAny  RunAsAny   RunAsAny  RunAsAny  false           *
elasticsearch-snapshot  false  RunAsAny               RunAsAny  RunAsAny   RunAsAny  false     *
memcached-db            false  RunAsAny               RunAsAny  RunAsAny   RunAsAny  false     *
mongodb-db              false  RunAsAny               RunAsAny  RunAsAny   RunAsAny  false     *
mongodb-snapshot        false  RunAsAny               RunAsAny  RunAsAny   RunAsAny  false     *
mysql-db                false  RunAsAny               RunAsAny  RunAsAny   RunAsAny  false     *
mysql-snapshot          false  RunAsAny               RunAsAny  RunAsAny   RunAsAny  false     *
postgres-db             false  IPC_LOCK,SYS_RESOURCE  RunAsAny  RunAsAny   RunAsAny  RunAsAny  false  *
postgres-snapshot       false  RunAsAny               RunAsAny  RunAsAny   RunAsAny  false     *
redis-db                false  RunAsAny               RunAsAny  RunAsAny   RunAsAny  false     *

cmoulliard commented 5 years ago

Since I can't reproduce the problem using k8s 1.13 or 1.14 (see the script below), I propose modifying the CircleCI job to enable the webhooks. A change will then be needed for OCP!

#!/usr/bin/env bash

export DOCKER_TLS_VERIFY=
export DOCKER_HOST=tcp://192.168.99.50:2376

KUBEDB_VERSION=0.12.0
IMAGE="v1.13.10@sha256:2f5f882a6d0527a2284d29042f3a6a07402e1699d792d0d5a9b9a48ef155fa2a"
# IMAGE="v1.14.6@sha256:464a43f5cf6ad442f100b0ca881a3acae37af069d5f96849c1d06ced2870888d"

kind delete cluster --name halkyon
kind create cluster --name halkyon \
  --config kind-config.yml \
  --image kindest/node:${IMAGE}
export KUBECONFIG="$(kind get kubeconfig-path --name="halkyon")"
kubectl cluster-info

helm init
until kubectl get pods -n kube-system -l name=tiller | grep 1/1; do sleep 1; done
kubectl create clusterrolebinding tiller-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default

helm repo add appscode https://charts.appscode.com/stable/
helm repo update
helm install appscode/kubedb \
   --name kubedb-operator \
   --version ${KUBEDB_VERSION} \
   --namespace kubedb \
   --set apiserver.enableValidatingWebhook=false,apiserver.enableMutatingWebhook=false

TIMER=0
until kubectl get crd elasticsearchversions.catalog.kubedb.com memcachedversions.catalog.kubedb.com mongodbversions.catalog.kubedb.com mysqlversions.catalog.kubedb.com postgresversions.catalog.kubedb.com redisversions.catalog.kubedb.com || [[ ${TIMER} -eq 60 ]]; do
  sleep 10
  TIMER=$((TIMER + 1))
done

helm install appscode/kubedb-catalog \
  --name kubedb-catalog \
  --version ${KUBEDB_VERSION} \
  --namespace kubedb \
  --set catalog.postgres=true,catalog.elasticsearch=false,catalog.etcd=false,catalog.memcached=false,catalog.mongo=false,catalog.mysql=false,catalog.redis=false

cmoulliard commented 5 years ago

I see that the installed version of helm is now 2.15.0, while the last working job used v2.14.3. I don't know whether there is a causal link.

cmoulliard commented 5 years ago

The problem reported here doesn't occur if we install helm 2.14.3 (see the previous jobs that worked)

https://github.com/halkyonio/operator/blob/bb83f706445bc707563325c27ef5c6cf69267b9a/.circleci/config.yml#L64-L66

instead of the latest 2.15. I don't know why.

geoand commented 5 years ago

So then let's just keep the previous helm version for now? Or would that create other problems?

geoand commented 5 years ago

It's probably a good idea to pin versions anyway so as to not have unexpected results like this (that I have seen happen in all sorts of projects when using "latest")
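
A minimal sketch of how a CI script could guard against an unexpected helm upgrade, assuming the client reports its version in the Helm 2 short format ("Client: v2.14.3+g<commit>"). The names HELM_PINNED_VERSION and check_helm_version are hypothetical and not part of the operator's CircleCI config.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail fast when the helm client is not the pinned version.
# HELM_PINNED_VERSION and check_helm_version are illustrative names only.
HELM_PINNED_VERSION="${HELM_PINNED_VERSION:-v2.14.3}"

# Takes the text printed by `helm version --client --short`
# (assumed to look like "Client: v2.14.3+g0e7f3b6") and succeeds
# only if the version part equals HELM_PINNED_VERSION.
check_helm_version() {
  local reported="$1"
  local actual="${reported#Client: }"   # drop the "Client: " prefix
  actual="${actual%%+*}"                # drop the "+g<commit>" suffix
  if [ "${actual}" != "${HELM_PINNED_VERSION}" ]; then
    echo "expected helm ${HELM_PINNED_VERSION}, found ${actual}" >&2
    return 1
  fi
}
```

In a job this could be called as `check_helm_version "$(helm version --client --short)" || exit 1` right after helm is installed, so a silent move to a newer release stops the build early instead of failing later during the KubeDB install.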

cmoulliard commented 5 years ago

Or would that create other problems?

I added a line to the CircleCI script to use the last version of helm that worked with the most recent jobs for the operator. I don't think that we will have more issues. BTW, it would be great to understand why using helm 2.15 instead of 2.14 causes such a problem when the kubedb helm chart is installed for the catalog.

geoand commented 5 years ago

Maybe we should wait for a patch version of helm and/or a newer version of kubedb which could perhaps fix these problems?

cmoulliard commented 5 years ago

Our problem is perhaps related to this issue reported to kubedb, where they made some changes to wait until the CRDs are registered: https://github.com/kubedb/installer/pull/22
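
For reference, a minimal sketch of the "wait until the CRDs are registered" idea as a reusable helper. The function name retry_until is hypothetical. Unlike the until loop in the script posted earlier in this thread, it returns a non-zero status when the attempts run out, so the caller can stop before installing the catalog.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: retry_until <max_attempts> <delay_seconds> <command> [args...]
retry_until() {
  local max_attempts="$1"
  local delay="$2"
  local attempt=1
  shift 2
  until "$@"; do
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      # Report the failure so the caller can abort the job.
      echo "gave up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${delay}"
  done
}
```

It could be used as `retry_until 60 10 kubectl get crd postgresversions.catalog.kubedb.com || exit 1` before running `helm install appscode/kubedb-catalog`, which keeps the same 60 x 10s budget as the original loop.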

cmoulliard commented 5 years ago

Ticket opened: https://github.com/kubedb/project/issues/671

cmoulliard commented 5 years ago

I propose that we close this ticket and track the kubedb ticket in case additional changes are needed later for the CircleCI file. Does that make sense, @metacosm?

metacosm commented 5 years ago

Agreed.