jenkins-x / jx

Jenkins X provides automated CI+CD for Kubernetes with Preview Environments on Pull Requests using Cloud Native pipelines from Tekton
https://jenkins-x.io/
Apache License 2.0

jx git operator minikube install hangs on "waiting for vault pod vault-0 ..." #7622

Open cr-zlilaharon opened 3 years ago

cr-zlilaharon commented 3 years ago

Summary

Hey, I'm running a vanilla install of the minikube distribution, following the official documentation. Everything goes well until I run the jx admin operator command, which works fine up to the point where it tries to install vault.

The last part of the log looks like so:

jx gitops split --dir /tmp/generate
jx gitops rename --dir /tmp/generate
jx gitops helmfile move --output-dir config-root --dir /tmp/generate --dir-includes-release-name
# convert k8s Secrets => ExternalSecret resources using secret mapping + schemas
# see: https://github.com/jenkins-x/jx-secret#mappings
jx secret convert --source-dir config-root -r jx-vault
not converting Secret webhook-certs in namespace tekton-pipelines to an ExternalSecret as it has no data
# replicate secrets to local staging/production namespaces
jx secret replicate --selector secret.jenkins-x.io/replica-source=true
replicated ExternalSecret jx-production/tekton-container-registry-auth to config-root/namespaces/jx-production/jxboot-helmfile-resources/tekton-container-registry-auth-secret.yaml
replicated ExternalSecret jx-staging/tekton-container-registry-auth to config-root/namespaces/jx-staging/jxboot-helmfile-resources/tekton-container-registry-auth-secret.yaml
ignoring backend type vault
# populate secrets from filesystem definitions
VAULT_ADDR=https://vault.jx-vault:8200 VAULT_NAMESPACE=jx-vault jx secret populate --source filesystem --secret-namespace jx-vault
WARNING: failed to read field metadata.name for path docs/releases.yaml
WARNING: failed to read field kind for path docs/releases.yaml
waiting for vault pod vault-0 in namespace jx-vault to be ready...

No 'jx-vault' namespace is created, and the installation hangs here indefinitely.

These are the namespaces / pods available at this point:

➜  ~ minikube kubectl -- get namespaces
NAME              STATUS   AGE
default           Active   68m
jx-git-operator   Active   6m45s
kube-node-lease   Active   68m
kube-public       Active   68m
kube-system       Active   68m
➜  ~ minikube kubectl -- get pods -A
NAMESPACE         NAME                                                 READY   STATUS      RESTARTS   AGE
jx-git-operator   jx-boot-ad457f4e-f5da-4560-870f-a1ca98924060-grjgj   1/1     Running     0          3m
jx-git-operator   jx-git-operator-6897847dd-479lg                      1/1     Running     0          3m24s
kube-system       coredns-f9fd979d6-vhd65                              1/1     Running     0          64m
kube-system       etcd-minikube                                        1/1     Running     0          65m
kube-system       ingress-nginx-admission-create-g787b                 0/1     Completed   0          64m
kube-system       ingress-nginx-admission-patch-9fgnn                  0/1     Completed   2          64m
kube-system       ingress-nginx-controller-558664778f-4r99d            1/1     Running     0          64m
kube-system       kube-apiserver-minikube                              1/1     Running     0          65m
kube-system       kube-controller-manager-minikube                     1/1     Running     0          65m
kube-system       kube-proxy-nc2nr                                     1/1     Running     0          64m
kube-system       kube-scheduler-minikube                              1/1     Running     0          65m
kube-system       storage-provisioner
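The missing namespace can also be checked from a script rather than by eye. A minimal sketch, assuming the listing is piped in from `kubectl get namespaces`; the helper name `has_namespace` is hypothetical and not part of jx or kubectl:

```shell
# has_namespace NAME: reads a `kubectl get namespaces` listing on stdin and
# succeeds only if NAME appears in the first column.
# (Hypothetical helper for illustration; not part of jx or kubectl.)
has_namespace() {
  awk -v ns="$1" '$1 == ns { found = 1 } END { exit !found }'
}
```

For example, `minikube kubectl -- get namespaces | has_namespace jx-vault || echo "jx-vault is missing"` would report the situation described above.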

Steps to reproduce the behavior

Run through the default minikube installation.

Expected behavior

Vault namespace and pods should be created.

Actual behavior

Vault namespace and pods are not created.

Jx version

version: 3.1.251

Diagnostic information

"diagnose" Command not found

Kubernetes cluster

minikube version: v1.15.1

Kubectl version

The output of kubectl version --client is:

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:17:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}

Operating system / Environment

MacOS 10.15.7

jstrachan commented 3 years ago

Apologies, we don't have any BDD tests on the minikube installation and I think it broke.

Any chance you could install from a clean new repository based on this, please: https://github.com/jx3-gitops-repositories/jx3-minikube/generate. We've removed vault from the install, which hopefully fixes this.

cr-zlilaharon commented 3 years ago

@jstrachan Thank you for the feedback. I ran a clean install from the template you provided, running the same steps from this guide

It appears the link you provided and the one in the guide are the same. I'm still getting the same result as described in the original post.

Let me know if there is anything I can modify to overcome this.

Thank you very much.

jstrachan commented 3 years ago

Just to be 100% clear: the git repository you passed to the git operator has this line, right? https://github.com/jx3-gitops-repositories/jx3-minikube/blob/master/jx-requirements.yml#L30

i.e. it's not still using vault?
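One quick way to answer that question for a local clone is to search the requirements file for any remaining mention of vault. A rough sketch only; `mentions_vault` is a hypothetical helper, and a plain text search can of course produce false positives:

```shell
# mentions_vault FILE: prints every non-comment line of FILE that mentions
# "vault" (case-insensitive), prefixed with its line number.
# Exits non-zero when no such line exists.
# (Hypothetical helper for illustration; it is a text search, not a YAML parser.)
mentions_vault() {
  grep -n -i 'vault' "$1" | grep -v '^[0-9]*:[[:space:]]*#'
}
```

Running `mentions_vault jx-requirements.yml` in the cloned repository would list any lines still referring to vault, and print nothing if there are none.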

cr-zlilaharon commented 3 years ago

@jstrachan Yes, 100%. I'm up to date with commit https://github.com/jx3-gitops-repositories/jx3-minikube/commit/b1008d03673ec9903dce8cad17ddb1ce354d7893

ginosubscriptions commented 3 years ago

Hey guys, I just ran into this issue as well, i.e. a minikube install is not possible at the moment (using https://github.com/jenkins-x/jx-cli/releases/download/v3.1.266/jx-cli-linux-amd64.tar.gz and https://github.com/jx3-gitops-repositories/jx3-minikube).

cr-zlilaharon commented 3 years ago

@jstrachan @ginosubscriptions Hey, updating with some more findings:

I managed to somehow bypass the vault creation by changing jx3-minikube/.jx/secret/mapping/secret-mappings.yaml. It had many entries with backendType: vault. I changed them to backendType: local-external-secrets, but I'm really not sure what should go there.
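The edit described above could be scripted roughly as follows. This is only a sketch of that manual change: the replacement value `local-external-secrets` is taken from this comment and is not confirmed as the correct backend, and `switch_backend` is a hypothetical helper name.

```shell
# switch_backend FILE: rewrites every "backendType: vault" entry in FILE to
# "backendType: local-external-secrets" and keeps the original as FILE.bak.
# The replacement value comes from this thread and is an unverified assumption.
switch_backend() {
  sed -i.bak \
    's/backendType:[[:space:]]*vault[[:space:]]*$/backendType: local-external-secrets/' \
    "$1"
}
```

For example, `switch_backend .jx/secret/mapping/secret-mappings.yaml` run from the root of the cloned jx3-minikube repository; entries with any other backendType are left untouched, and the `.bak` copy makes the change easy to revert.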

At the moment there are a bunch of pods in the jx namespace that cannot be created due to missing secrets:

jx-git-operator    jx-boot-19d90010-c7e2-486b-8d69-7bcf896485ac-hc624   1/1     Running                      0          7m14s
jx-git-operator    jx-git-operator-6897847dd-nb64r                      1/1     Running                      0          7m32s
jx                 bucketrepo-bucketrepo-685978fbcd-kkjk7               0/1     ContainerCreating            0          5m53s
jx                 docker-registry-6d9dc74c67-r6ht2                     0/1     CreateContainerConfigError   0          5m54s
jx                 jx-build-controller-69f4b9fff8-gfb4g                 0/1     ContainerCreating            0          5m54s
jx                 jx-pipelines-visualizer-68b679547c-54cq7             0/1     ContainerCreating            0          5m52s
jx                 jx-preview-gc-jobs-1613914800-95vnx                  0/1     Error                        0          89s
jx                 jx-preview-gc-jobs-1613914800-dgdtm                  0/1     Error                        0          2m24s
jx                 jx-preview-gc-jobs-1613914800-rcxgp                  0/1     Error                        0          109s
jx                 jx-preview-gc-jobs-1613914800-sxrbn                  0/1     Error                        0          119s
jx                 jx-preview-gc-jobs-1613914800-vbdpm                  0/1     Error                        0          49s
jx                 lighthouse-foghorn-66d8777bc-mpsx2                   0/1     CreateContainerConfigError   0          5m51s
jx                 lighthouse-keeper-88c54fdc9-xtk9m                    0/1     CreateContainerConfigError   0          5m50s
jx                 lighthouse-tekton-controller-9748dc647-hk77w         1/1     Running                      0          5m51s
jx                 lighthouse-webhooks-7b66d5c675-rzvm7                 0/1     CreateContainerConfigError   0          5m50

I just want to point out that even before the change I made, these pods failed to start because of missing secrets. So my assumption is that secret management currently does not work on the minikube deployment.
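To pull just the failing pods out of a listing like the one above, a small filter can help. A minimal sketch, assuming rows in the default `kubectl get pods -A` column order (NAMESPACE, NAME, READY, STATUS, ...); `unhealthy_pods` is a hypothetical helper name:

```shell
# unhealthy_pods: reads `kubectl get pods -A` rows on stdin and prints
# "namespace/name status" for every pod whose STATUS column is neither
# Running nor Completed. The header row and rows with fewer than four
# columns (e.g. wrapped or truncated lines) are skipped.
unhealthy_pods() {
  awk 'NF >= 4 && $4 != "STATUS" && $4 != "Running" && $4 != "Completed" {
         print $1 "/" $2, $4
       }'
}
```

For example, `kubectl get pods -A | unhealthy_pods`. Running `kubectl describe pod -n jx <pod-name>` on one of the reported pods should then show the events behind a status such as CreateContainerConfigError.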

The jx admin operator logs now look like this:

# lets wait for the ExternalSecrets service to populate the mandatory Secret resources
VAULT_ADDR=https://vault.jx-vault:8200 jx secret wait -n jx
waiting for the mandatory Secrets to be populated from ExternalSecrets...
bucketrepo-config: key  missing properties:
docker-registry-secret: key  missing properties:
jenkins-x-bucketrepo: key  missing properties: ,
jx-basic-auth-user-password: key  missing properties: ,
lighthouse-hmac-token: key  missing properties:
lighthouse-oauth-token: key  missing properties:
tekton-container-registry-auth: key  missing properties:
tekton-git: key  missing properties: ,
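The names of the Secrets still being waited on can be extracted from that output for further inspection. A small sketch, assuming the log lines keep the `<name>: key  missing properties:` shape shown above; `missing_secrets` is a hypothetical helper name:

```shell
# missing_secrets: reads `jx secret wait` output on stdin and prints the name
# of each Secret that a "missing properties" line refers to.
# Assumes the "<name>: key  missing properties: ..." format seen in this thread.
missing_secrets() {
  awk -F': ' '/missing properties/ { print $1 }'
}
```

The resulting names could then be checked one by one, for example with `kubectl get secret -n jx <name>`, to see which ones exist at all.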

@jstrachan Perhaps you can point me to the local-external-secret installation part or elaborate on how that part works? Thank you very much.

ginosubscriptions commented 3 years ago

@cr-zlilaharon @jstrachan I did get a bit further following the procedure of replacing the backendType: vault with backendType: local-external-secrets AND capturing the secrets creation yaml files from the boot pod and running them manually (requires creating the jx namespace first). If you then execute the jx admin operator command, it indeed doesn't hang when requiring vault-0, nor does it fail getting the secrets. I now have a set of running jx pods (in the corresponding namespaces). I am missing the kuberhealthy, nginx and secret-infra namespaces and pods, but maybe that is acceptable/normal. Basically, everything looks like it's running.
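The order of operations in that workaround can be sketched as a dry run that only prints the commands for review. The secrets file path is a hypothetical placeholder for the manifests captured from the boot pod (assumed to carry their own namespaces), `print_workaround` is not a jx command, and any arguments to jx admin operator are passed through unchanged:

```shell
# print_workaround SECRETS_FILE [OPERATOR_ARGS...]: prints, without running
# them, the commands for the workaround order described in this thread:
# create the jx namespace, apply the captured Secret manifests, then run the
# operator. SECRETS_FILE is a hypothetical path chosen by the user.
print_workaround() {
  secrets_file=$1
  shift
  echo "kubectl create namespace jx"
  echo "kubectl apply -f $secrets_file"
  echo "jx admin operator${*:+ $*}"
}
```

Calling `print_workaround all-secrets.yaml` followed by whatever arguments are normally given to jx admin operator prints the three steps in order, so they can be reviewed before being run by hand.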

As an alternative test, I also installed HashiCorp Vault (with Consul) to actually have a vault-0 running when launching jx admin operator. This too solves the problem of hanging on vault-0, but it fails immediately afterwards, because jx admin operator still cannot connect to the vault (i.e. vault-0 is ready, but the installation cannot continue because it cannot create or get secrets without a connection). I'm not sure if the vault-0 pod can be created individually (as I did) in minikube and whether there are some steps to "make it compatible" with the jx setup procedure, or if the jx setup is indeed supposed to create its own. I'll be investigating this further in the coming week...

cr-zlilaharon commented 3 years ago

@ginosubscriptions Can you specify what yaml files you apply for secret creation, and how? Thank you.

ginosubscriptions commented 3 years ago

@ginosubscriptions Can you specify what yaml files you apply for secret creation, and how? Thank you.

Hi @cr-zlilaharon, I recovered them from the jx-boot-[generated ID] pod running in the jx-git-operator namespace. Unfortunately, since this pod has now completed, I cannot exec into it anymore to tell you exactly where they were located (I think it was under /tmp, but I'm not 100% certain). For my own convenience, I did create a single YAML file containing all the secrets, so that I can run the secret creation before running the jx admin operator command.

I.e.: this is the procedure I would follow:

ginosubscriptions commented 3 years ago

@jstrachan, as it turns out, I get exactly the same problem when using the GKE procedure. I took the time to go over the excellent video series on JX3 (https://www.youtube.com/watch?v=kDCNDAyqwpo&list=PLr_PmC4W69dKM3fo8OK729fdmX_MTqdHd), which worked like a charm, until I replaced the Google Secrets Manager configuration with the HashiCorp Vault configuration (read: use URL https://github.com/ginosubscriptions-org001/jx3-cluster-vault-gke as the basis for cluster resource generation with Terraform). Terraform apply hangs (using the jx admin log command to monitor cluster generation progress) at exactly the same "waiting for vault" message as the minikube setup.

Since I'm assuming the GKE context might be of higher priority, I thought this would be good for you to know.

Cheers.

jstrachan commented 3 years ago

@ginosubscriptions if you switched to GSM you will need to configure + rerun the terraform - did you do that?

ginosubscriptions commented 3 years ago

@ginosubscriptions if you switched to GSM you will need to configure + rerun the terraform - did you do that?

Hi @jstrachan, I was actually successful using the latest and greatest version (I just restarted from scratch and used all the latest release versions) on GKE. Everything works, and I have my repeatable procedure for GKE now. Thanks so much for that.

I'm now stuck repeating this on Minikube, where although jx create spring successfully completes, it does not create an application... I'll see if I can find any issue related to that. Otherwise, I'll open a new GitHub issue for Minikube on that one.

Thanks again.

jstrachan commented 3 years ago

@ginosubscriptions is it a webhooks issue? Did you set up ngrok on minikube? Were webhooks delivered correctly to trigger pipelines? https://jenkins-x.io/v3/admin/platforms/on-premise/webhooks/

ginosubscriptions commented 3 years ago

@ginosubscriptions is it a webhooks issue? Did you set up ngrok on minikube? Were webhooks delivered correctly to trigger pipelines? https://jenkins-x.io/v3/admin/platforms/on-premise/webhooks/

@jstrachan I think I did properly set up webhooks, as the created projects get the correct DDNS URL in their webhooks settings and I can reach (200 response) the URL [DDNS]/hook. That being said, I did not use ngrok to do this, as I have my free subscription to another DDNS. To be 100% certain that I'm in a reference environment I'll set up ngrok first and if I get the same problem, I'll let you know. Thanks!

ginosubscriptions commented 3 years ago

@jstrachan, I tried with ngrok, but due to my fairly nested development environment setup and the fact that ngrok doesn't actually allow for a fixed DNS name (as long as you don't pay), I prefer to use my own (NoIP) dynamic DNS mechanism (which I already have subscribed to). I did use ngrok, but funnily enough, I got better results using NoIP DDNS with NodePort. More specifically, I changed the svc/hook to be of type NodePort on a specific port and I was certainly able to access it (using browser => 200 response).

Using the procedure set forth in https://jenkins-x.io/v3/admin/platforms/on-premise/webhooks/ I indeed configured the ingress.customHosts.hook key in the charts/jenkins-x/jxboot-helmfile-resources/values.yaml file to point to my DDNS name (which results in creating the hook ingress with that DDNS as host name). However, when creating a new quick start application (e.g. the golang-http example from the tutorial video), nothing happens. The process just hangs on the (cloned) jx3-minikube repository's pull request (saying it's waiting for an approved label => adding that label doesn't change anything).

Would you perhaps be able to share a procedure - outside-in (GitHub to Minikube) and inside-out (Minikube to GitHub) - to really be able to test that I correctly configured my webhook? Also, could you (just to be 100% certain that the JX3 install did indeed correctly complete) share what the expected K8s resources are in the Minikube deployment? This might help me to find the root cause of the problem.

Many thanks in advance.
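For the outside-in direction, one possible building block is reproducing the signature GitHub attaches to a delivery, so that a hand-made request to [DDNS]/hook can be signed with the webhook secret. GitHub sends an HMAC of the request body in the `X-Hub-Signature` (SHA-1) and `X-Hub-Signature-256` (SHA-256) headers. A minimal sketch, assuming `openssl` is installed; `sign_payload` is a hypothetical helper, and which of the two headers the hook endpoint actually checks is not confirmed in this thread:

```shell
# sign_payload ALGO SECRET FILE: prints "ALGO=<hex digest>", the value GitHub
# would put in X-Hub-Signature (ALGO=sha1) or X-Hub-Signature-256
# (ALGO=sha256) for the request body stored in FILE.
# SECRET is the webhook secret configured on the GitHub webhook.
sign_payload() {
  # openssl prints "<label>= <hex>"; the digest is always the last field.
  digest=$(openssl dgst "-$1" -hmac "$2" < "$3" | awk '{ print $NF }')
  echo "$1=$digest"
}
```

For example, `sign_payload sha256 "$WEBHOOK_SECRET" payload.json`, where both the variable and the file name are placeholders. For real deliveries, the "Recent Deliveries" list in the repository's webhook settings on GitHub shows the response each delivery received and allows redelivery, which covers the GitHub to Minikube direction without hand-made requests.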

Also, I'm perfectly fine returning the favor by sharing my entire (very) detailed procedure on how to set up a Hyper-V Ubuntu Minikube VM inside of my Windows 10 developer station (I am not using the standard Windows Minikube approach, as this creates problems down the road with the JX3 install). More specifically, for anyone trying to set up a Windows/Minikube full-fledged JX3 development pipeline, maybe I can provide you with a detailed procedure that could help the community? Just let me know if that is of interest to you.