jenkins-x / jx

Jenkins X provides automated CI+CD for Kubernetes with Preview Environments on Pull Requests using Cloud Native pipelines from Tekton
https://jenkins-x.io/
Apache License 2.0
4.58k stars 788 forks

Tekton pipelines not starting #3928

Closed vfarcic closed 5 years ago

vfarcic commented 5 years ago

Summary

On a fresh install of jx in an existing GKE cluster, tekton pipelines are not starting. Judging from logs in the jx namespace, everything seems to be fine and I can see from the GitHub repository that webhooks trigger successfully.

Please let me know which info (e.g., logs, events) can help to debug the issue.

Steps to reproduce the behavior

jx install --provider gke \
    --external-ip $LB_IP \
    --domain acme.com \
    --default-admin-password $JX_PASS \
    --ingress-namespace ingress-nginx \
    --ingress-deployment nginx-ingress-controller \
    --ingress-service ingress-nginx \
    --namespace cd \
    --no-tiller \
    --prow \
    --tekton \
    --batch-mode \
    --verbose

kubectl get pods -w

Expected behavior

The output of kubectl get pods should return Pods used by Tekton pipeline runs.

Actual behavior

The output of kubectl get pods does NOT return Pods used by Tekton pipeline runs.

Jx version

NAME               VERSION
jx                 2.0.118
jenkins x platform 2.0.191
Kubernetes cluster v1.12.6-gke.10
kubectl            v1.14.0
helm client        Client: v2.13.1+g618447c
git                git version 2.20.1 (Apple Git-117)
Operating System   Mac OS X 10.14.4 build 18E226

Jenkins type

Kubernetes cluster

GKE

Operating system / Environment

macOS

heroic commented 5 years ago

@vfarcic do you have pipelinerunner working? I had this issue yesterday. My install was missing pipelinerunner; a reinstall fixed it.

sharepointoscar commented 5 years ago

@heroic I have the same issue, and reinstall does not fix it unfortunately. In addition, in this version of jx another issue surfaced that actually prevents install. https://github.com/jenkins-x/jx/issues/3345

NAME               VERSION
jx                 2.0.119
Kubernetes cluster v1.11.8-gke.6
kubectl            v1.14.0
git                git version 2.21.0
Operating System   Mac OS X 10.14.3 build 18D42

jstrachan commented 5 years ago

@vfarcic does avoiding passing the LB IP + domain help?

heroic commented 5 years ago

@sharepointoscar try using 2.0.118 to install. That worked for me yesterday. platform version 2.0.276

EamonKeane commented 5 years ago

Having an issue with tekton pipelines too, this may be unrelated but will mention it here. Happens on 2.0.118 and 2.0.119.

NAME               VERSION
jx                 2.0.118
Kubernetes cluster v1.12.7-gke.10
kubectl            v1.12.2
git                git version 2.17.2 (Apple Git-113)
Operating System   Mac OS X 10.14.4 build 18E215a

{"level":"error","msg":"failed to apply Tekton CRDs: failed to create/update PipelineResource logistio-environment-jx-te-mast in namespace jx: failed to get PipelineResource logistio-environment-jx-te-mast with pipelineresources.tekton.dev \"logistio-environment-jx-te-mast\" not found after failing to create a new one: Internal error occurred: failed calling admission webhook \"webhook.tekton.dev\": Post https://tekton-pipelines-webhook.jx.svc:443/?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) failed to apply Tekton CRDs: failed to create/update PipelineResource logistio-environment-jx-te-mast in namespace jx: failed to get PipelineResource logistio-environment-jx-te-mast with pipelineresources.tekton.dev \"logistio-environment-jx-te-mast\" not found after failing to create a new one: Internal error occurred: failed calling admission webhook \"webhook.tekton.dev\": Post https://tekton-pipelines-webhook.jx.svc:443/?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)","time":"2019-05-15T12:13:13Z"}

vfarcic commented 5 years ago

@heroic pipelinerunner is running.

That cluster had a jx installation from a few months ago. I uninstalled it and tried installing again. My best guess was that something left over from the initial installation had not been removed by jx uninstall. Indeed, left-over CRDs were not removed, so I opened #3962. I removed them manually and installed jx again, but there is still no sign of Tekton Pods being created, even though the webhook from the repo is green.

I'll try @jstrachan's recommendation to skip specifying the LB IP and domain and report back what happens.
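
For anyone repeating the manual CRD cleanup mentioned above, here is a minimal sketch of how the leftovers might be listed before deleting them. It assumes the relevant CRDs end in `.tekton.dev` or `.jenkins.io`; the function names are hypothetical and not part of jx or kubectl.

```shell
# KUBECTL can be overridden (for example with a stub in a dry run);
# it defaults to the real kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

# Reads CRD names on stdin (one per line, as printed by
# `kubectl get crd -o name`) and keeps only those whose name ends in
# .tekton.dev or .jenkins.io. The suffixes are an assumption.
filter_leftover_crds() {
  grep -E '\.(tekton\.dev|jenkins\.io)$' || true
}

# Lists the CRDs in the current cluster that match the filter above.
list_leftover_crds() {
  $KUBECTL get crd -o name | filter_leftover_crds
}

# Usage (review the list before deleting anything):
#   list_leftover_crds
#   list_leftover_crds | xargs kubectl delete
```

Listing first and deleting second keeps the destructive step explicit, which matters because deleting a CRD also deletes every resource of that kind.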

vfarcic commented 5 years ago

@jstrachan I installed jx without the LB IP and domain and there's still no difference. Is there a way to get a clue what's wrong? Any pointer where to look would be useful.

P.S. I can create a new cluster and avoid those problems but I guess that we'd lose the opportunity to find out what the issue is.

chaoyangnz commented 5 years ago

@EamonKeane have you solved the Tekton admission webhook timeout you reported above?

jstrachan commented 5 years ago

I saw that issue today; bouncing the tekton pods seemed to fix it
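
"Bouncing" here presumably means deleting the Tekton pods so that their Deployments recreate them. A minimal sketch follows, assuming the pods live in the `jx` namespace and that their names start with `tekton-pipelines-` (an assumption based on the `tekton-pipelines-webhook` service name in the error above); `bounce_tekton_pods` is a hypothetical helper.

```shell
# KUBECTL can be overridden (for example with a stub in a dry run);
# it defaults to the real kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

# Deletes every pod whose name starts with "tekton-pipelines-" in the
# given namespace (default: jx). The name prefix is an assumption.
bounce_tekton_pods() {
  ns="${1:-jx}"
  $KUBECTL get pods -n "$ns" -o name \
    | grep '^pod/tekton-pipelines-' \
    | while read -r pod; do
        $KUBECTL delete -n "$ns" "$pod"
      done
}

# Usage:
#   bounce_tekton_pods jx
#   kubectl get pods -n jx -w   # watch the replacements come up
```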

EamonKeane commented 5 years ago

@chaoyangnz no, sorry. I didn't have time to debug, so I stuck with static master Jenkins. Let me know if you find a resolution, although I haven't tried again recently, so it may no longer reproduce.

jstrachan commented 5 years ago

@vfarcic if you get issues with pipelines not being triggered there's usually a good error message in the pipelinerunner pod - or worst case maybe something in the pipeline pod
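
A minimal sketch of pulling error lines out of the pipelinerunner logs, as suggested above. It assumes pipelinerunner runs as a Deployment named `pipelinerunner` in the `jx` namespace, which this thread does not confirm; `pipelinerunner_errors` is a hypothetical helper.

```shell
# KUBECTL can be overridden (for example with a stub in a dry run);
# it defaults to the real kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

# Prints the last 200 log lines of the pipelinerunner Deployment in the
# given namespace (default: jx) that mention "error" or "fail".
# The Deployment name is an assumption.
pipelinerunner_errors() {
  ns="${1:-jx}"
  $KUBECTL logs -n "$ns" deployment/pipelinerunner --tail=200 \
    | grep -i -E 'error|fail' || true
}

# Usage:
#   pipelinerunner_errors jx
```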

chaoyangnz commented 5 years ago

OK. I raised an issue for that: https://github.com/jenkins-x/jx/issues/4385. It is very frustrating to install Jenkins X these days. Given so many open issues and documentation that reads like a joke, I suppose Jenkins X is far from production-ready. If someone wants to try it, I would definitely discourage them.

hferentschik commented 5 years ago

@vfarcic is your original problem solved? I'm trying to figure out what the next steps on this issue would be. Are you saying you cannot stand up a cluster using Tekton?

vfarcic commented 5 years ago

I haven't experienced the problem for a while now. You can close it unless there are others with the same issue.

hferentschik commented 5 years ago

I'll close it. We can always reopen or even better create a new one with a bit more context.

tpoerio-argo commented 5 years ago

Hi all,

I'm seeing this same issue, at present.

I can open a new ticket, or update here.

I'm also using a GKE cluster.

$ kubectl apply -f task.yaml
Error from server (InternalError): error when creating "task.yaml": Internal error occurred: failed calling admission webhook "webhook.tekton.dev": Post https://tekton-pipelines-webhook.jx.svc:443/?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

If I can provide info to help troubleshoot, let me know.
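
One quick check when this timeout appears is whether the webhook Service has any ready endpoints behind it. The sketch below takes the Service name `tekton-pipelines-webhook` and namespace `jx` from the URL in the error message; `webhook_has_endpoints` is a hypothetical helper. If endpoints exist and the call still times out, the API server probably cannot reach the pod over the network.

```shell
# KUBECTL can be overridden (for example with a stub in a dry run);
# it defaults to the real kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

# Succeeds (exit status 0) if the tekton-pipelines-webhook Service in the
# given namespace (default: jx) has at least one ready endpoint address.
webhook_has_endpoints() {
  ns="${1:-jx}"
  ips="$($KUBECTL get endpoints tekton-pipelines-webhook -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')"
  [ -n "$ips" ]
}

# Usage:
#   if webhook_has_endpoints jx; then
#     echo "webhook has endpoints; suspect network or firewall"
#   else
#     echo "no ready webhook pods behind the Service"
#   fi
```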

solsson commented 5 years ago

I'm having the same problem, on a fresh GKE 1.14.6-gke.13 cluster, after applying the 0.7.0 release. Update: same issue after downgrade to 0.6.0. Maybe https://github.com/tektoncd/pipeline/issues/1228 is a duplicate?

solsson commented 5 years ago

Resolved using the fix for port 8443 on GKE private clusters suggested in https://github.com/knative/serving/issues/4868 and documented since https://github.com/kubernetes/kubernetes/issues/79739#issuecomment-529623687
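
For readers who do not want to follow the links: on a GKE private cluster the fix presumably amounts to a firewall rule that lets the master reach port 8443 on the nodes. The sketch below only prints such a `gcloud` command so it can be reviewed first. The rule name `allow-master-to-tekton-webhook` and the helper `print_webhook_firewall_rule` are hypothetical, and the network, master CIDR, and node tag must come from your own cluster.

```shell
# Prints (does not run) a gcloud command that creates an ingress firewall
# rule allowing TCP 8443 from the GKE master range to the tagged nodes.
# Arguments: <network> <master-cidr> <node-tag>
# The rule name is a hypothetical placeholder.
print_webhook_firewall_rule() {
  if [ "$#" -ne 3 ]; then
    echo "usage: print_webhook_firewall_rule <network> <master-cidr> <node-tag>" >&2
    return 1
  fi
  printf 'gcloud compute firewall-rules create %s --network %s --direction INGRESS --allow tcp:8443 --source-ranges %s --target-tags %s\n' \
    'allow-master-to-tekton-webhook' "$1" "$2" "$3"
}

# Usage (the master CIDR of a private cluster can usually be read with
# `gcloud container clusters describe <cluster> \
#    --format='value(privateClusterConfig.masterIpv4CidrBlock)'`):
#   print_webhook_firewall_rule my-network 172.16.0.0/28 my-node-tag
```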