tektoncd / catalog

Catalog of shared Tasks and Pipelines.
Apache License 2.0
656 stars 572 forks

Trying to run task docker-build #628

Open johnlongo opened 3 years ago

johnlongo commented 3 years ago

Trying to run task docker-build, but getting the following error: Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?

Below is the TaskRun I ran. Should I be using a different docker.io image than the default already defined? Please advise, and thank you in advance.

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: docker-open-liberty-pet-store-run
spec:
  params:

lbernick commented 2 years ago

/reopen

I am also running into this issue with the following pipeline:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: build-and-push
spec:
  workspaces:
  - name: source-code
  tasks:
  - name: clone
    taskRef:
      name: git-clone
      bundle: gcr.io/tekton-releases/catalog/upstream/git-clone:0.6
    workspaces:
    - name: output
      workspace: source-code
    params:
    - name: url
      value: https://github.com/lbernick/web-app-demo
  - name: build
    taskRef:
      name: docker-build
      bundle: gcr.io/tekton-releases/catalog/upstream/docker-build:0.1
    workspaces:
    - name: source
      workspace: source-code
    params:
    - name: image
      value: gcr.io/leebernick-test/web-app-demo
    runAfter:
    - clone

$ tkn pr describe build-and-push-run -n codelab
Name:              build-and-push-run
Namespace:         codelab
Pipeline Ref:      build-and-push
Service Account:   default
Timeout:           1h0m0s
Labels:
 tekton.dev/pipeline=build-and-push

🌡️  Status

STARTED          DURATION     STATUS
40 seconds ago   30 seconds   Failed

💌 Message

Tasks Completed: 2 (Failed: 1, Cancelled 0), Skipped: 0 ("step-docker-build" exited with code 1 (image: "docker-pullable://docker@sha256:18ff92d3d31725b53fa6633d60bed323effb6d5d4588be7b547078d384e0d4bf"); for logs run: kubectl -n codelab logs build-and-push-run-build-pod -c step-docker-build
)

📦 Resources

 No resources

⚓ Params

 No params

📝 Results

 No results

📂 Workspaces

 NAME            SUB PATH   WORKSPACE BINDING
 ∙ source-code   ---        VolumeClaimTemplate

🗂  Taskruns

 NAME                         TASK NAME   STARTED          DURATION     STATUS
 ∙ build-and-push-run-build   build       22 seconds ago   12 seconds   Failed
 ∙ build-and-push-run-clone   clone       40 seconds ago   17 seconds   Succeeded

⏭️  Skipped Tasks

 No Skipped Tasks

logs:

$ kubectl -n codelab logs build-and-push-run-build-pod -c step-docker-build
Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?

tekton version:

$ tkn version
Client version: 0.21.0
Pipeline version: devel
Triggers version: v0.17.1

k8s version:

$ k version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:31:32Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.6-gke.1503", GitCommit:"2c7bbda09a9b7ca78db230e099cf90fe901d3df8", GitTreeState:"clean", BuildDate:"2022-02-18T03:17:45Z", GoVersion:"go1.16.9b7", Compiler:"gc", Platform:"linux/amd64"}

Do any owners of this task know what's up? @vdemeester @imjasonh @PuneetPunamiya @popcor255 If I'm using it incorrectly, I'm happy to update the docs!

tekton-robot commented 2 years ago

@lbernick: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/tektoncd/catalog/issues/628#issuecomment-1108719795).

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

vdemeester commented 2 years ago

I wonder if it's a race or not. There is a readiness probe, but it might be slightly racy: the docker daemon may have created the certificates without having completely started, especially in a slow environment. I feel we should probably try to update the readiness probe command to validate that the daemon is actually listening on that port.
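
To illustrate the idea, here is a minimal sketch of a stricter probe check. It assumes the current probe only looks for a certificate file, which the thread does not confirm. The function name `probe_ready`, the certificate path, and the choice of `docker info` as the liveness command are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch of a stricter readiness check: pass only when the certificate file
# exists AND a command that talks to the daemon succeeds.
# usage: probe_ready <cert_file> <command> [args...]
probe_ready() {
  cert_file=$1
  shift
  # Certificates on disk are necessary but not sufficient.
  [ -f "$cert_file" ] || return 1
  # The daemon must also answer; the caller chooses how to ask.
  "$@" >/dev/null 2>&1
}

# Hypothetical use from the sidecar's exec probe (path and command are assumptions):
#   probe_ready /certs/client/ca.pem docker info
```

Because the kubelet re-runs a readiness probe on its own schedule, the check itself does not need a retry loop.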

lbernick commented 2 years ago

I just experimented with modifying the readiness probe, but it looks like Tekton is actually not waiting for the sidecar to be ready before starting the steps; here's some output from kubectl describe po build-and-push-run-build-pod:

Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  107s  default-scheduler  Successfully assigned codelab/build-and-push-run-build-pod to gke-test-cluster-default-pool-0a394558-oytf
  Normal   Pulled     106s  kubelet            Container image "gcr.io/leebernick-test/entrypoint-bff0a22da108bc2f16c818c97641a296@sha256:35cd8b74443b2c44ddd9799fcd48fb2e1c97589533985f371088ef3071b69353" already present on machine
  Normal   Created    106s  kubelet            Created container place-tools
  Normal   Started    106s  kubelet            Started container place-tools
  Normal   Pulled     105s  kubelet            Container image "gcr.io/leebernick-test/entrypoint-bff0a22da108bc2f16c818c97641a296@sha256:35cd8b74443b2c44ddd9799fcd48fb2e1c97589533985f371088ef3071b69353" already present on machine
  Normal   Created    105s  kubelet            Created container step-init
  Normal   Started    105s  kubelet            Started container step-init
  Normal   Pulled     104s  kubelet            Container image "ghcr.io/distroless/busybox@sha256:19f02276bf8dbdd62f069b922f10c65262cc34b710eea26ff928129a736be791" already present on machine
  Normal   Created    104s  kubelet            Created container place-scripts
  Normal   Started    104s  kubelet            Started container place-scripts
  Normal   Pulling    103s  kubelet            Pulling image "gcr.io/leebernick-test/workingdirinit-0c558922ec6a1b739e550e349f2d5fc1@sha256:b7e408c98089dae51492106ba63e1498f79ad293768dd16ecbbb9b43c813248b"
  Normal   Pulled     103s  kubelet            Successfully pulled image "gcr.io/leebernick-test/workingdirinit-0c558922ec6a1b739e550e349f2d5fc1@sha256:b7e408c98089dae51492106ba63e1498f79ad293768dd16ecbbb9b43c813248b" in 579.262499ms
  Normal   Created    103s  kubelet            Created container working-dir-initializer
  Normal   Pulling    102s  kubelet            Pulling image "docker.io/library/docker:stable@sha256:18ff92d3d31725b53fa6633d60bed323effb6d5d4588be7b547078d384e0d4bf"
  Normal   Started    102s  kubelet            Started container working-dir-initializer
  Normal   Pulled     98s   kubelet            Successfully pulled image "docker.io/library/docker:stable@sha256:18ff92d3d31725b53fa6633d60bed323effb6d5d4588be7b547078d384e0d4bf" in 4.175542871s
  Normal   Created    96s   kubelet            Created container step-docker-build
  Normal   Started    96s   kubelet            Started container step-docker-build
  Normal   Pulled     96s   kubelet            Container image "docker.io/library/docker:stable@sha256:18ff92d3d31725b53fa6633d60bed323effb6d5d4588be7b547078d384e0d4bf" already present on machine
  Normal   Created    96s   kubelet            Created container step-docker-push
  Normal   Started    96s   kubelet            Started container step-docker-push
  Normal   Pulling    96s   kubelet            Pulling image "docker:dind"
  Normal   Pulled     92s   kubelet            Successfully pulled image "docker:dind" in 3.841685644s
  Normal   Created    90s   kubelet            Created container sidecar-server
  Normal   Started    90s   kubelet            Started container sidecar-server
  Warning  Unhealthy  90s   kubelet            Readiness probe failed:

It looks like the step containers are being started before the sidecar image is even pulled.

vdemeester commented 2 years ago

> It looks like the step containers are being started before the sidecar image is even pulled.

Is the underlying process in the step starting as well? Given how Tekton and Pods work, from the Kubernetes side the sidecar and step "entrypoint" containers start whenever their image is ready. If the sidecar image takes longer to pull, then from a k8s event perspective the steps are already started, but that doesn't mean they are running. What Tekton should do for the steps, however (step 0, since the rest of the steps wait for step 0), is wait for a file telling it the sidecar is ready.
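
The "wait for a file" pattern described above can be sketched in shell as follows. This is only an illustration of the pattern, not Tekton's actual entrypoint code. The function name `wait_for_file` and the marker path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: block until a "ready" marker file exists, or give up after N tries.
# usage: wait_for_file <path> <attempts> <delay_seconds>
wait_for_file() {
  path=$1
  attempts=$2
  delay=$3
  while [ "$attempts" -gt 0 ]; do
    # Succeed as soon as the marker shows up.
    [ -e "$path" ] && return 0
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  # The marker never appeared within the allowed attempts.
  return 1
}

# Hypothetical use before running a step's real command:
#   wait_for_file /some/shared/volume/sidecar-ready 60 1 && exec "$@"
```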

lbernick commented 2 years ago

OK, it turns out this is not an issue with the readiness probe; it's an issue with insecure_registry. The default value for insecure_registry is "", which causes the sidecar to exit with the error: failed to start daemon: insecure registry is not valid: invalid host "".

Currently, the sidecar command args are:

      - --storage-driver=vfs
      - --userland-proxy=false
      - --debug
      - --insecure-registry=$(params.insecure_registry)

This task works as expected when the last arg is commented out. FYI @dibyom

lbernick commented 2 years ago

One option could be having a sidecar_extra_args param (similar to the build_extra_args and push_extra_args params). However, replacing insecure_registry with sidecar_extra_args wouldn't be backwards compatible. I'm not sure of a good way to only pass the --insecure-registry arg when the insecure_registry param is set.
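
One possible direction, sketched here under assumptions, is to start the daemon from a small shell script that appends the flag only when a value is present. The three fixed flags come from the sidecar args listed above. The function name `build_dockerd_args` and the `INSECURE_REGISTRY` variable are hypothetical, and wiring this into the sidecar is not shown.

```shell
#!/bin/sh
# Sketch only: print the dockerd arguments one per line, adding
# --insecure-registry only when a non-empty registry value is supplied.
# usage: build_dockerd_args <registry-or-empty-string>
build_dockerd_args() {
  registry=$1
  # Fixed flags taken from the existing sidecar args.
  set -- --storage-driver=vfs --userland-proxy=false --debug
  if [ -n "$registry" ]; then
    # Only reached when the param was actually set by the user.
    set -- "$@" "--insecure-registry=$registry"
  fi
  printf '%s\n' "$@"
}

# Hypothetical use, assuming INSECURE_REGISTRY is populated from
# $(params.insecure_registry) and the image's usual entrypoint is kept:
#   dockerd-entrypoint.sh $(build_dockerd_args "$INSECURE_REGISTRY")
```

This would keep the insecure_registry param, so existing TaskRuns would not need to change. The unquoted command substitution in the usage comment relies on word splitting, so it would not suit values containing whitespace.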

lbernick commented 2 years ago

Looks like Docker authentication is also not configured correctly in this task. Here's a version I hacked together that works with Google Artifact Registry.

tekton-robot commented 2 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot commented 2 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

lbernick commented 2 years ago

/lifecycle frozen