woodpecker-ci / woodpecker

Woodpecker is a simple yet powerful CI/CD engine with great extensibility.
https://woodpecker-ci.org
Apache License 2.0

Support "native" podman as backend #85

Open AkiraNorthstar opened 4 years ago

AkiraNorthstar commented 4 years ago

Hello Laszlo! Is it possible to integrate podman in woodpecker?

Podman does not run as a daemon (unlike Docker, with its /var/run/docker.sock) but is fully compatible with the docker command line.

Another point is that podman can also mount secrets via mounts.conf, which might also solve Global Secrets.

Also, cgroups v2 is natively supported by podman.

Further information about podman:

laszlocph commented 4 years ago

Technically it is a matter of implementing this interface: https://github.com/laszlocph/woodpecker/blob/master/cncd/pipeline/pipeline/backend/backend.go and this is how it is implemented for Docker: https://github.com/laszlocph/woodpecker/blob/master/cncd/pipeline/pipeline/backend/docker/docker.go

So technically it is probably possible. Whether the project should focus on it is a different question.

In my surroundings I don't see many companies wanting to differentiate on which container engine they run; they just default to Docker. Even though the drawbacks of Docker's architecture are known to some of them, adoption of alternatives has not reached a high level. Or they don't talk about it, or I don't listen :) Also, these companies are not Fedora/Red Hat/CentOS shops, so I have limited visibility into the adoption of Podman. But I do see the benefits of Podman over Docker. I haven't used it, but I might try.

All in all:

mscherer commented 2 years ago

So I tried to use podman with woodpecker. It failed in several interesting ways. I used podman on Fedora 34, either the package (quite recent, 3.3.0) or the latest git devel version (self-compiled).

It failed first because podman didn't pull the plugins/git image, and I do not understand why, as it works if I do it manually. Once the plugin is in the local registry, the build proceeds.

Then it failed because the workingDir directory is not created automatically. I see the volume creation, and the code should create it on disk in /var/lib/something, but it doesn't seem to mount it correctly. Again, it seems to be dependent on the backend; I tried with crun and runc, no luck. My plan was to debug that, but I haven't looked into it further yet.

However, I also looked at what it would require to write a backend. Podman has a set of bindings, so this shouldn't have been too hard for a Go beginner like me. However, since woodpecker switched to vendoring in 75513575be, my understanding is that we would also have to vendor podman, which in turn pulls in a rather large number of dependencies, some of which require C headers (btrfs, devicemapper). I do think this would put an undue burden on woodpecker CI and deployment.

So for now, the solution for using podman is either to find out why woodpecker works with docker but not with the docker API exposed by podman (likely an issue on the podman side), or to add a backend in a way that does not bloat the build (again, likely an issue with the podman bindings, or maybe I am doing it wrong).

anbraten commented 2 years ago

That sounds quite interesting. I never used podman, but as I am totally interested in a Kubernetes agent, it is probably a good discussion about how backends for the agent should be handled in the long term. Currently I see two options for it:

  • separate agents per backend
  • one agent which includes every supported backend

Meanwhile, if you need help debugging woodpecker, feel free to write me on Discord.

mfulz commented 2 years ago

> That sounds quite interesting. I never used podman, but as I am totally interested in a Kubernetes agent, it is probably a good discussion about how backends for the agent should be handled in the long term. Currently I see two options for it:
>
>   • separate agents per backend
>   • one agent which includes every supported backend
>
> Meanwhile, if you need help debugging woodpecker, feel free to write me on Discord.

I've just used a --use-podman boolean flag for the agent for now. But I think in the actual implementation it would make sense to use one agent with multiple backends and implement a flag like --backend with a simple switch...case inside the agent.

Most of the code would be either just duplicated for every agent or it would need a bigger rewrite I guess.
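
The single-agent option can be pictured as a lookup keyed by a --backend flag. This is only a minimal illustration in Python, not the agent's Go code: the flag name comes from the comment above, while `BACKENDS`, `select_backend`, and the two factory functions are hypothetical names and not woodpecker APIs.

```python
import argparse


def make_docker_backend():
    # Hypothetical stand-in for constructing the Docker backend.
    return "docker backend"


def make_podman_backend():
    # Hypothetical stand-in for constructing a native podman backend.
    return "podman backend"


# One agent, several backends: this table plays the role of the switch...case.
BACKENDS = {
    "docker": make_docker_backend,
    "podman": make_podman_backend,
}


def select_backend(argv):
    """Parse --backend from argv and build the matching backend."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--backend", choices=sorted(BACKENDS), default="docker")
    args = parser.parse_args(argv)
    return BACKENDS[args.backend]()
```

With this shape, adding a backend means adding one entry to the table, and an unknown value is rejected by the argument parser before any backend is built.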

mscherer commented 2 years ago

So I just tried again with podman and it now works... almost.

I was able to use the following config on a self-hosted Gitea:

clone:
  git:
    image: docker.io/a6543/test_git_plugin:latest
pipeline:
  prepare-build:
    image: quay.io/fedora/fedora:latest
    commands:
      - sudo dnf install -y zola
      - zola build --drafts

The server runs on Fedora 34 with podman-3.3.1-1.fc34.x86_64 and woodpecker 34cfabb5. So the issue of the image not being downloaded is still there, but unlike the working dir one, it has a workaround.

mscherer commented 2 years ago

Ok, so it worked because I patched podman myself and forgot about it. Upgrading to 3.4.0 resulted in the same workingDir issue.

mscherer commented 2 years ago

Status update: I managed to get podman working (for real this time), and the patch was merged upstream and will be in the next release. Now the only remaining issue is one around the registry. It seems that, for some reason, the docker compat API of podman will not download remote images automatically, but if you download them manually beforehand (using podman pull on the agent), it works without problems, so this one can be worked around more easily.

6543 commented 2 years ago

@mscherer nice to hear - so do we need #305, or does podman work through the compatibility layer with podman >= v3.4.3 (upcoming release)?

mscherer commented 2 years ago

I think it will work with newer podman, but I would appreciate it if someone could run a test, since I already managed to get my test wrong once. I also wrote https://github.com/mscherer/podman/commit/bf4a6b99b9cccb3e00ef498e4a49e7b6e56d4b2c to fix the 2nd problem I faced, but I need to research the docker code a bit more to make sure I replicated the behavior correctly.

6543 commented 2 years ago

looking forward to seeing a pull request for the 2nd issue :)

I think we should still add podman as a "native" option too - but whatever gets released first will do for the majority, I think

mscherer commented 2 years ago

I am sure upstream podman would also prefer a native API, but it seemed to pull in a rather large amount of dependencies and code, even by Go vendoring standards, due to the web of imports in the podman code base (the PR changed 2430 files, and the current vendor directory is 2794 files). In turn, this will increase the compilation time and the binary size too (it almost doubled when I tried to make a mock podman backend before #305 was proposed).

Nothing unfixable, but this looks like a rather tedious problem to solve. Last time I looked, it was because the specgen package from podman pulls in the whole world, so that's an issue to be solved on the podman side and, afaik, one that hasn't been reported yet.

mscherer commented 2 years ago

Here is the PR for the last problem I encountered: https://github.com/containers/podman/pull/12315

mscherer commented 2 years ago

So the PR was wrong, but this is being worked on in https://github.com/containers/podman/pull/12318 and https://github.com/containers/podman/pull/12317 .

In the meantime, I also found another incompatibility, reported at https://github.com/containers/podman/issues/12320 (which might be trickier to solve).

mscherer commented 2 years ago

So, after discussion with the podman devs, we reached a consensus that the compatibility issue is tricky to solve, but it should be fixed. However, my fix is not the right one, and as I do not think there is an easy fix, it might take some time before it gets done.

In the meantime, there are a few ways that can be explored on the woodpecker side:

(to be clear, the last one is not a serious proposal)

9p4 commented 2 years ago

Ok, it seems like using Podman rootless works just fine for me. What I did was manually link docker.sock to the user's podman socket and use a version of Podman >= 3.4.3 (because of a bug where the paths in storage were not created: https://github.com/containers/podman/issues/11842).

mscherer commented 2 years ago

Podman 4.0 was released yesterday (the announcement is not yet published, and likely will not be until next week), and it should now fix the short name issue I faced, so it should work out of the box. I will do a quick test later.

9p4 commented 2 years ago

With PR https://github.com/woodpecker-ci/woodpecker/issues/763, setting the DOCKER_SOCK variable to a Podman socket works now (so no more linking required).

mscherer commented 2 years ago

As Fedora 36 is out very soon, I upgraded my CI, and I can confirm that podman 4.0 works out of the box if the socket is correctly set (e.g., either with the variable, or by using the podman-docker rpm or equivalent).

I guess we can close this issue after adding some documentation?

6543 commented 2 years ago

docker compatibility mode works now ... -> #901 - so renamed the issue

NewRedsquare commented 2 years ago

Can someone send an example? I can't get it working using both commands:

{"level":"error","error":"Cannot connect to the Docker daemon at unix:///run/user/1000/podman/podman.sock. Is the docker daemon running?","time":"2022-06-05T17:31:42Z","message":"could not kill container '0_333181564417296953_clone'"}
{"level":"error","error":"Cannot connect to the Docker daemon at unix:///run/user/1000/podman/podman.sock. Is the docker daemon running?","time":"2022-06-05T17:31:42Z","message":"could not remove container '0_333181564417296953_clone'"}
{"level":"error","error":"Cannot connect to the Docker daemon at unix:///run/user/1000/podman/podman.sock. Is the docker daemon running?","time":"2022-06-05T17:31:42Z","message":"could not kill container '0_333181564417296953_stage_0'"}
{"level":"error","error":"Cannot connect to the Docker daemon at unix:///run/user/1000/podman/podman.sock. Is the docker daemon running?","time":"2022-06-05T17:31:42Z","message":"could not remove container '0_333181564417296953_stage_0'"}
{"level":"error","error":"Cannot connect to the Docker daemon at unix:///run/user/1000/podman/podman.sock. Is the docker daemon running?","time":"2022-06-05T17:31:42Z","message":"could not remove volume '0_333181564417296953_default'"}
{"level":"error","error":"Cannot connect to the Docker daemon at unix:///run/user/1000/podman/podman.sock. Is the docker daemon running?","time":"2022-06-05T17:31:42Z","message":"could not remove network '0_333181564417296953_default'"}
{"level":"error","error":"rpc error: code = Unknown desc = Proc finished with exitcode 1, Cannot connect to the Docker daemon at unix:///run/user/1000/podman/podman.sock. Is the docker daemon running?","time":"2022-06-05T17:31:42Z","message":"grpc error: wait(): code: Unknown: rpc error: code = Unknown desc = Proc finished with exitcode 1, Cannot connect to the Docker daemon at unix:///run/user/1000/podman/podman.sock. Is the docker daemon running?"}
{"level":"warn","repo":"romain/cv","build":"23","id":"73","error":"rpc error: code = Unknown desc = Proc finished with exitcode 1, Cannot connect to the Docker daemon at unix:///run/user/1000/podman/podman.sock. Is the docker daemon running?","time":"2022-06-05T17:31:42Z","message":"cancel signal received"}

But curl -s --unix-socket /run/user/1000/podman/podman.sock http://d/v1.0.0/libpod/info | jq gives me normal output.

{"level":"error","error":"Error response from daemon: can only kill running containers. xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx is in state created: container state improper","time":"2022-06-05T17:35:10Z","message":"could not kill container '0_5049337694948908527_clone'"}
{"level":"error","error":"Error response from daemon: no container with name or ID \"0_5049337694948908527_step_0\" found: no such container","time":"2022-06-05T17:35:10Z","message":"could not kill container '0_5049337694948908527_stage_0'"}
{"level":"error","error":"Error: No such container: 0_5049337694948908527_step_0","time":"2022-06-05T17:35:10Z","message":"could not remove container '0_5049337694948908527_stage_0'"}
{"level":"error","error":"rpc error: code = Unknown desc = Proc finished with exitcode 1, Error response from daemon: \"slirp4netns\" is not supported: invalid network mode","time":"2022-06-05T17:35:10Z","message":"grpc error: wait(): code: Unknown: rpc error: code = Unknown desc = Proc finished with exitcode 1, Error response from daemon: \"slirp4netns\" is not supported: invalid network mode"}
{"level":"warn","repo":"romain/cv","build":"25","id":"79","error":"rpc error: code = Unknown desc = Proc finished with exitcode 1, Error response from daemon: \"slirp4netns\" is not supported: invalid network mode","time":"2022-06-05T17:35:10Z","message":"cancel signal received"}

I'm kinda lost
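
The curl check above can be repeated from whatever environment the agent actually runs in, since a socket path that answers on the host may not exist under the same path inside a container. Below is a minimal sketch in Python, using only the standard library. `UnixHTTPConnection` and `probe_socket` are hypothetical helper names; the default URL path is the one from the curl command.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        try:
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def probe_socket(socket_path, url_path="/v1.0.0/libpod/info"):
    """Return (status, body) for a GET on the socket, or (None, error text)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", url_path)
        resp = conn.getresponse()
        return resp.status, resp.read().decode("utf-8", "replace")
    except (OSError, http.client.HTTPException) as exc:
        # A missing or unreachable socket ends up here.
        return None, str(exc)
    finally:
        conn.close()
```

Calling `probe_socket("/run/user/1000/podman/podman.sock")` from inside the agent's environment shows whether the path in DOCKER_HOST is reachable there, which is a separate question from whether it is reachable on the host.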

9p4 commented 2 years ago

I think your DOCKER_HOST has to be unix:///var/user/1000/podman/podman.sock

major137 commented 1 year ago

> Can someone send an example? I can't get it working using both commands:
>
> * `podman container run --privileged --rm --tty -v /run/user/1000/podman/podman.sock:/var/user/1000/podman/podman.sock -e WOODPECKER_SERVER=IP -e WOODPECKER_AGENT_SECRET=xxxxxxxxxx -e WOODPECKER_BACKEND=docker -e DOCKER_HOST=unix:///run/user/1000/podman/podman.sock --network=host docker.io/woodpeckerci/woodpecker-agent:latest`

If the socket is mounted with -v /run/user/1000/podman/podman.sock:/**var**/user/1000/podman/podman.sock, then DOCKER_HOST should point to /**var**/user/1000/podman/podman.sock, not /**run**/user/1000/podman/podman.sock.
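
The mismatch can be checked mechanically: DOCKER_HOST has to name the container-side half of the -v mapping, because that is the only path the agent sees. Here is a minimal sketch in Python; the three function names are hypothetical, and only the simple `host:container[:options]` volume form and `unix://` URLs are handled.

```python
def container_side_path(volume_spec):
    """Return the container-side path of a 'host:container[:options]' spec."""
    parts = volume_spec.split(":")
    if len(parts) < 2 or not parts[1]:
        raise ValueError(f"not a host:container volume spec: {volume_spec!r}")
    return parts[1]


def socket_path_from_docker_host(docker_host):
    """Strip the unix:// scheme from a DOCKER_HOST value."""
    prefix = "unix://"
    if not docker_host.startswith(prefix):
        raise ValueError(f"not a unix socket URL: {docker_host!r}")
    return docker_host[len(prefix):]


def docker_host_matches_mount(volume_spec, docker_host):
    """True if DOCKER_HOST points at the path the socket is mounted on."""
    return container_side_path(volume_spec) == socket_path_from_docker_host(docker_host)
```

For the command quoted above, the mount targets /var/user/1000/podman/podman.sock while DOCKER_HOST names /run/user/1000/podman/podman.sock, so the check returns False. Changing either side so that both agree makes it return True.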

I managed to make it work on Fedora 38 with the latest version of podman in rootless mode:

Taywee commented 10 months ago

In case anybody else has issues with the clone step never finishing when using podman as a backend, check containers/podman#19581.

Effectively, set the contents of your containers.conf (in my case, it's ~/.config/containers/containers.conf) to

[containers]
log_driver="json-file"

[engine]
events_logger="file"

I guess it was reporting events to a different location than the podman service expected to find them. Now I have woodpecker running entirely through podman! This is important to me, because I need to use CI to run podman build, which I couldn't get working in a docker container, no matter what I tried. I also am really happy to not have to run a rootful container service to run my woodpecker agent.

With the above config, I was able to run woodpecker with a Podman backend, but in order to run podman build in podman, I also needed:

[containers]
label=false
devices=["/dev/fuse"]
default_capabilities = [
    "CHOWN",
    "DAC_OVERRIDE",
    "FOWNER",
    "FSETID",
    "KILL",
    "NET_BIND_SERVICE",
    "SETFCAP",
    "SETGID",
    "SETPCAP",
    "SETUID",
    "SYS_CHROOT",
    "SYS_ADMIN",
    "MKNOD",
]

This allows podman (buildah, I guess?) to do the stuff that it needs to do to mount and run rootful containers in the podman rootless container. I couldn't get rootless-in-rootless working, even when following the information in this guide, but the issues I was running into might have been specific to the container in question, and I might have been able to fix it with some hackery.

edit: And now socketed podman won't run for me at all, due to some debug logging from conmon. Oh well; I tried.

kaylynb commented 8 months ago

It looks like v2 broke woodpecker-agent running in a container via podman.

In v1.0.5 I regularly get these errors when running pipelines but everything still works fine, including buildx, etc:

Nov 19 12:35:54 alzirr systemd-woodpecker-agent[5908]: {"level":"error","error":"Error response from daemon: can only kill running containers. d4dba47c6b966c981d03ff1cc89f6679fc7bac975c38e8cf7b252e55bb8b225d is in state exited: container state improper","time":"2023-11-19T20:35:54Z","message":"could not kill container 'wp_01hfmmqea1258g2a1c93tz73h2_0_stage_0'"}

But in v2.0.0 & latest HEAD (237b2257f5633374a3b28babb2a2d5eef8b30b50 at time of this post) it seems to actually fail on this error now:

Dec 01 11:51:17 alzirr systemd-woodpecker-agent2[8964]: {"level":"error","error":"rpc error: code = Unknown desc = Step finished with exit code 1, Error response from daemon: can only kill running containers. 6231ebb6daff67daebbe56144ea658041a087a08d1aa8d9c185901d894b63c35 is in state exited: container state improper","time":"2023-12-01T11:51:17-08:00","message":"grpc error: wait(): code: Unknown: rpc error: code = Unknown desc = Step finished with exit code 1, Error response from daemon: can only kill running containers. 6231ebb6daff67daebbe56144ea658041a087a08d1aa8d9c185901d894b63c35 is in state exited: container state improper"}
Dec 01 11:51:17 alzirr systemd-woodpecker-agent2[8964]: {"level":"warn","repo":"jaam/skeetcrawl","pipeline":"0","id":"40","error":"rpc error: code = Unknown desc = Step finished with exit code 1, Error response from daemon: can only kill running containers. 6231ebb6daff67daebbe56144ea658041a087a08d1aa8d9c185901d894b63c35 is in state exited: container state improper","time":"2023-12-01T11:51:17-08:00","message":"cancel signal received"}

I didn't have the time to look through the large diff between v1 & v2 but it does appear that this error should actually be caught here: https://github.com/woodpecker-ci/woodpecker/blob/237b2257f5633374a3b28babb2a2d5eef8b30b50/pipeline/backend/docker/docker.go#L361-L366
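
One way to picture the kind of check meant here: during cleanup, treat daemon errors that only say the container is already stopped or gone as harmless. This is a minimal sketch in Python, not woodpecker's Go code. `is_ignorable_kill_error` and the marker list are hypothetical, and the markers are taken only from the log lines quoted in this thread, so this is an assumption about a possible approach rather than a description of what the linked code does.

```python
# Substrings seen in this thread's logs when a kill or remove hits a
# container that has already exited or no longer exists.
IGNORABLE_MARKERS = (
    "can only kill running containers",
    "no such container",
)


def is_ignorable_kill_error(message):
    """True if the daemon error only reports an already stopped or missing container."""
    lowered = message.lower()
    return any(marker in lowered for marker in IGNORABLE_MARKERS)
```

Matching on error text is fragile, since Docker and podman word these conditions differently, which may be why the podman message slips through; a status-code based check would be sturdier where the API exposes one.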