Better documentation for docker builds

lbernick commented 1 year ago

When running docker builds, there are 2 main ways for task authors to provide docker daemons to their build containers:

1. Using a sidecar. This is documented in our examples, our sidecar docs, and in the catalog, although the catalog task is broken. We could do a better job of explaining how the sidecar can communicate with the steps (either over the network or by mounting the socket into both containers) and how to use TLS for the network option. We could also improve documentation on debugging docker build tasks with sidecars: the logs shown by default come from the step container and say something like "Cannot connect to the Docker daemon at tcp://localhost:2376. Is the docker daemon running?", when the real issue may be in the sidecar container. Lastly, we should explain that any docker commands that need to share a daemon must run in the same Task, since it's a common mistake to start with multiple Tasks and then have to rewrite them.
2. Using the host's docker daemon. This is documented in task docs and docs on volume mounts. We explain that this is "very unsafe", but I think cluster operators need more detail about the security concerns. Specifically, this is unsafe because the container may be able to write to the host's filesystem and affect other builds running on the same host, and even if a VM is created for each build, the build may be able to affect the pipelines controller. I believe this can be sandboxed using nested virtualization (@skaegi has mentioned using kata containers for this purpose), so we should provide more information than just saying it's unsafe and not to do it. We should also explain that doing this requires privileged pods and how to configure that, and give more detail on how this works on different platforms (e.g. where can you find the docker socket on minikube?).
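
To make option 1 concrete, here is a rough sketch of a Task that talks to a `docker:dind` sidecar over the network with TLS. It is loosely modeled on the dind sidecar example in the pipeline repo. The Task name, workspace name, and image tag are placeholders I made up, and I'm assuming the default behavior of the official `docker:dind` image, which generates certs under `DOCKER_TLS_CERTDIR`.

```yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: docker-build-with-sidecar   # hypothetical name
spec:
  workspaces:
  - name: source                    # hypothetical workspace holding the Dockerfile
  steps:
  - name: build
    image: docker
    workingDir: $(workspaces.source.path)
    env:
    # Connect to the sidecar daemon over TCP with TLS.
    - name: DOCKER_HOST
      value: tcp://localhost:2376
    - name: DOCKER_TLS_VERIFY
      value: "1"
    # Client certs generated by the sidecar, shared via the emptyDir below.
    - name: DOCKER_CERT_PATH
      value: /certs/client
    script: |
      #!/usr/bin/env sh
      set -e
      # Both commands use the same daemon, so they live in the same Task.
      docker build -t example-image .
      docker run --rm example-image
    volumeMounts:
    - name: dind-certs
      mountPath: /certs/client
  sidecars:
  - name: server
    image: docker:dind
    securityContext:
      privileged: true              # dind needs a privileged container
    env:
    # Tell the daemon where to write the generated certs.
    - name: DOCKER_TLS_CERTDIR
      value: /certs
    volumeMounts:
    - name: dind-certs
      mountPath: /certs/client
    # Treat the sidecar as ready only once the client certs exist.
    readinessProbe:
      periodSeconds: 1
      exec:
        command: ['ls', '/certs/client/ca.pem']
  volumes:
  - name: dind-certs
    emptyDir: {}
```

If the step fails with the "Cannot connect to the Docker daemon" message, the daemon's own logs are in the sidecar container rather than the step container. As far as I know Tekton names that container `sidecar-server` for the sketch above, so something like `kubectl logs <pod> -c sidecar-server` should show them.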

The biggest problem with option 2 IMO is that the Task author needs to understand how Tekton is being run to know if they can (safely) use it, and they wouldn't be able to port the Task between platforms. I'm not sure whether this is a problem we can address in Tekton or not, and whether it's a valid reason for encouraging use of option 1.
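
For comparison, a minimal sketch of option 2, assuming the node actually runs a docker daemon with its socket at `/var/run/docker.sock`. That path is exactly the kind of platform detail the Task author has to know. Names are placeholders, and depending on cluster policy the pod may also need extra privileges to mount the socket.

```yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: docker-build-host-daemon    # hypothetical name
spec:
  workspaces:
  - name: source                    # hypothetical workspace holding the Dockerfile
  steps:
  - name: build
    image: docker
    workingDir: $(workspaces.source.path)
    script: |
      #!/usr/bin/env sh
      set -e
      # The docker CLI defaults to unix:///var/run/docker.sock,
      # which here is the HOST's daemon.
      docker build -t example-image .
    volumeMounts:
    - name: docker-socket
      mountPath: /var/run/docker.sock
  volumes:
  - name: docker-socket
    hostPath:
      # Assumed location; varies by platform, and nodes that only run
      # containerd have no docker socket at all.
      path: /var/run/docker.sock
      type: Socket
```

Nothing in this Task is isolated from the node: images built here land in the host daemon's cache, and the build can start or stop any container on that host.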

Another useful piece of documentation would be how to write a CI pipeline with Tekton for a codebase that uses docker-compose for integration tests; i.e. how to translate a docker compose integration test into a Task with a sidecar.
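
As a starting point for that doc, here is a sketch of how a small compose file might map onto a Task, with the dependency service becoming a sidecar. The compose file, images, script name, and credentials are all made up for illustration.

```yaml
# Hypothetical docker-compose.yml being translated:
#   services:
#     db:
#       image: postgres:15
#       environment:
#         POSTGRES_PASSWORD: example
#     test:
#       build: .
#       command: ./run-integration-tests.sh
#       depends_on: [db]
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: integration-test            # hypothetical name
spec:
  workspaces:
  - name: source                    # hypothetical workspace with the checked-out repo
  sidecars:
  # The "db" service becomes a sidecar.
  - name: db
    image: postgres:15
    env:
    - name: POSTGRES_PASSWORD
      value: example
    # Rough stand-in for depends_on: report ready once postgres accepts connections.
    readinessProbe:
      periodSeconds: 1
      exec:
        command: ['pg_isready', '-U', 'postgres']
  steps:
  # The "test" service becomes a step.
  - name: test
    image: golang:1.21              # hypothetical test image
    workingDir: $(workspaces.source.path)
    env:
    # Containers in a pod share a network namespace, so the compose
    # hostname "db" becomes localhost.
    - name: DATABASE_URL
      value: postgres://postgres:example@localhost:5432/postgres?sslmode=disable
    script: |
      #!/usr/bin/env sh
      set -e
      ./run-integration-tests.sh
```

The main differences from compose are that services are reached on `localhost` instead of by service name, and that the test image has to be prebuilt (or built in an earlier step) rather than built with `build: .`.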

Note: I know we typically encourage folks to use kaniko or alternatives, but docker builds are still very common in CI and kaniko isn't a drop-in replacement, so I think providing good documentation for docker builds is important.

tekton-robot commented 1 year ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

vdemeester commented 1 year ago

/remove-lifecycle stale

tektoncd / pipeline

Better documentation for docker builds #6950