Open dominic-p opened 1 year ago
Is there a /woodpecker folder? What are its permissions (ls -Z for SELinux)? The same questions for /woodpecker/src/git-repo.

if I override the entrypoint to just get a shell, it drops me in the root dir

Can you create some folder under /woodpecker manually? Under /woodpecker/src/git-repo?
Please test the pipeline below:

```yaml
skip_clone: true

steps:
  test:
    image: alpine
    commands:
      - echo Hello from test
```

Does it work?
Applied AppArmor profile crio-default to container
What if you disable AppArmor?
Thanks for looking into this! OK, I tried the given test pipeline and I get the same error. My Pod events look like this (note that I had to switch to the AWS registry because I got rate limited by Docker Hub):
```
Events:
  Type     Reason   Age                From     Message
  ----     ------   ----               ----     -------
  Normal   Pulling  13s (x2 over 13s)  kubelet  Pulling image "public.ecr.aws/docker/library/alpine"
  Normal   Pulled   13s                kubelet  Successfully pulled image "public.ecr.aws/docker/library/alpine" in 205ms (205ms including waiting)
  Normal   Pulled   13s                kubelet  Successfully pulled image "public.ecr.aws/docker/library/alpine" in 171ms (171ms including waiting)
  Warning  Failed   12s (x2 over 13s)  kubelet  Error: container create failed: chdir: No such file or directory
```
Since the container is failing at such an early stage (the container never gets created), it's really difficult for me to get a shell to check what the file system looks like. The approach I wound up taking was this:

1. Dump the stuck Pod's spec with kubectl get -o yaml.
2. Change spec.workingDir to /.
3. Start a debug copy of the Pod with a shell: kubectl debug pod-name -it --copy-to=my-debug --container=container-name -n namespace -- /bin/sh

I wanted to show the process to see if you think this is a valid approach to debugging, and also as a reference for me since it took me a while to figure it out. :)
Now that I have a shell on the debug pod I can answer a couple of your questions:

- There is a /woodpecker folder. It is empty.
- I can create folders under /woodpecker manually without issues.
- Permissions from ls -al are below. If I run ls -Z I get "ls: unrecognized option: Z".

```
# whoami
root
# ls -al /woodpecker
total 8
drwxrwsrwx    2 root     root          4096 Oct  5 06:24 .
```

By the way, if I look at the logs from the completed container, it does appear to have run successfully:
```
+ echo Hello from test
Hello from test
```
Let me know if there's any other debugging I can do on my end. As far as I can tell, the working dir needs to exist before the container is started, and on my system it doesn't. I'm not sure what's responsible for creating it, but maybe an initContainer could be used to run something like mkdir -p /woodpecker/src/git-repo before the main container is started?
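Roughly what I have in mind, as a hand-written sketch only (this is not what the agent actually generates; the Pod name, container names, and volume claim below are made up for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: wp-step-sketch                   # made-up name
spec:
  restartPolicy: Never
  volumes:
    - name: workspace
      persistentVolumeClaim:
        claimName: wp-workspace          # made up; the agent mounts its own volume here
  initContainers:
    # Creates the working directory before the step container starts
    - name: init-workspace
      image: public.ecr.aws/docker/library/alpine
      command: ["mkdir", "-p", "/woodpecker/src/git-repo"]
      volumeMounts:
        - name: workspace
          mountPath: /woodpecker
  containers:
    # The step container can now chdir into the pre-created directory
    - name: step
      image: public.ecr.aws/docker/library/alpine
      workingDir: /woodpecker/src/git-repo
      command: ["echo", "Hello from test"]
      volumeMounts:
        - name: workspace
          mountPath: /woodpecker
```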
I wanted to show the process
Nice debug approach, thanks for sharing 👍
Could you update System Info with OS information and add a link to the installation manual of CRI-O on that distro?
I tried disabling AppArmor with a Pod annotation
I don't think anymore that AppArmor is the cause of the error, but I meant disabling it completely, temporarily. When you use the local path provisioner with SELinux, there should be some policies involved, and K3s had issues with that.
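For reference, per-container AppArmor on this Kubernetes version is toggled with a Pod annotation along these lines (the container name wp-step is just a placeholder), but what I meant was switching AppArmor off on the node itself for a test:

```yaml
metadata:
  annotations:
    # "unconfined" disables the AppArmor profile for that single container only
    container.apparmor.security.beta.kubernetes.io/wp-step: unconfined
```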
Besides, I've found a similar issue in Podman. Taking that bug into account, please test a pipeline that overrides the workspace:
```yaml
workspace:
  base: "/woodpecker"
  path: "/"

steps:
  test:
    image: alpine
    commands:
      - echo Hello from test
```
It should generate a Pod like:
```yaml
containers:
  - name: wp-01hbzthp86cx3k762kvyv0007e-0-clone
    image: docker.io/woodpeckerci/plugin-git:2.1.1
    workingDir: /woodpecker
    env:
      - name: CI_WORKSPACE
        value: /woodpecker
```
I'm not sure what's responsible for creating it
As I understand it, it depends on the container runtime. I guess CRI-O (like Podman) just throws an error, while containerd creates the subfolder. BTW, workingDir is set here by the Agent.
initContainer could be used to run something like mkdir -p /woodpecker/src/git-repo before the main container is started?
Good idea! Try this one also.
Yahtzee! The workspace config did the trick. I didn't bother testing an initContainer given that the workspace approach worked, but I imagine that would solve the problem as well. I also updated the OP as requested with more system info.

I'm not sure if this should be considered a documentation issue (e.g. CRI-O users need to configure the workspace manually), or if some change can/should be made to the Kubernetes backend. Out-of-the-box CRI-O support would be nice, but, unfortunately, I'm not a Go programmer, so I wouldn't be able to help much with a PR.
Secondary question: now that I'm unstuck, I'm running into a new error trying to get my buildah container to run. I think it might be due to the securityContext of the container. I tried configuring it like this (building off of the docs here), but it doesn't seem to be working. If I dump the Pod YAML, there is no securityContext config.
```yaml
workspace:
  base: "/woodpecker"
  path: "/"

steps:
  # We'll remove this eventually, but for now it's nice just to make sure that
  # the most basic pipeline step works
  test:
    image: public.ecr.aws/docker/library/alpine
    commands:
      - echo Hello from test

  # The real work is done here. Build and push the container image
  build:
    image: quay.io/buildah/stable:v1.31
    commands:
      - /bin/sh ./build.sh
    backend_options:
      kubernetes:
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000
          runAsGroup: 1000
          fsGroup: 1000
          capabilities:
            add: ["SETFCAP"]
```
Is it not possible to set the security context via the pipeline config? Or am I just doing it wrong? I can also open a separate issue about this if that would be preferable.
some change can/should be made to the kubernetes backend
Run an InitContainer with mkdir -p $CI_WORKSPACE? :) As we found a workaround and there are only a few users of crun (maybe you're even the first), I would close the issue. But I would like to mention here the "Podman does not create working directory from image" issue.
if this should be considered a documentation issue (e.g. CRI-O users need to configure the workspace manually)
If you write an installation manual, then a link in Posts & tutorials would be the right place, I think. Anyway, this issue looks like documentation by itself.
Should be asked in Discussions or Matrix/Discord ;)
error trying to get my buildah container to run
🤣 You're not the first and not the last, I believe. As always, I push people to use kaniko 😄
Is it not possible to set the security context via the pipeline config?
No (almost): only Resources, serviceAccountName, nodeSelector, tolerations, and Volumes for now. But you can run in privileged mode.
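A sketch of how the supported per-step options look (key names may differ slightly between versions, and the serviceAccountName, node selector, and toleration values below are placeholders; check the Kubernetes backend docs for your version):

```yaml
steps:
  build:
    image: quay.io/buildah/stable:v1.31
    commands:
      - /bin/sh ./build.sh
    backend_options:
      kubernetes:
        resources:
          requests:
            memory: 256Mi
            cpu: 250m
          limits:
            memory: 512Mi
        serviceAccountName: woodpecker-builder   # placeholder service account
        nodeSelector:
          kubernetes.io/arch: amd64
        tolerations:
          - key: ci
            operator: Exists
            effect: NoSchedule
```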
OK, I experimented a bit more with buildah, and I have a working test implementation running on my cluster. I would just need a couple of additional configuration options to make it work with Woodpecker. I opened #2545 to discuss that.

I have looked at kaniko a few times, but I really like that buildah lets me use a regular shell script to build my container images. I don't want to learn/work around all of the gotchas that come with the Dockerfile format.
I'll leave this open for the time being in case you do want to implement an initContainer on the clone step. It seems like a good idea to me as it can't really hurt the current user base, and it would make CRI-O work out of the box. But, as you say, this isn't really a huge userbase right now.
In the meantime, I opened a PR to add a bit of documentation about this to the main docs website.
initContainer on the clone step
The problem is that the clone step doesn't always exist. If we run an init container only for the clone step and skip_clone: true is set, then you'll get the same issue in the first step. Should we run an init container for all steps? Or should we take the skip_clone option into account?

Variant A (init container in the clone step):
1. skip_clone: false
  1.1. For the init container in the clone step, set workingDir to /woodpecker and create the subdirectory src/git-repo.
  1.2. For the clone container in the clone step, set workingDir to /woodpecker/src/git-repo.
  1.3. Run the next steps with workingDir=/woodpecker/src/git-repo.
2. skip_clone: true
  2.1. Run all steps with workingDir=/woodpecker.

Variant B (the clone plugin creates the directory):
1. skip_clone: false
  1.1. For the clone step, set workingDir to /woodpecker; the plugin creates the subdirectory src/git-repo, then clones the repo.
  1.2. Run the next steps with workingDir=/woodpecker/src/git-repo.
2. skip_clone: true
  2.1. Run all steps with workingDir=/woodpecker.
Then there's WorkingDir: step.WorkingDir, which is a single string: to implement the first point, we would have to supply the workspace instead of the workingDir in pod.go. Probably not only in the Kubernetes backend, but in the others too.
2.1. Run all steps with workingDir=/woodpecker

What if I have a custom workspace?
```yaml
workspace:
  base: "/woodpecker"
  path: "/subdir"
```
"Why does my container run in /woodpecker, not in /woodpecker/subdir?" would be the user's question (and an issue here :)
Another concern is that the issue of the working dir not being created is not unique to Kubernetes; it also exists in Podman, at least. And if we implement a Podman backend (a PR or FR exists here), should we duplicate that tricky logic?

The best solution would be some option on the crun side (Podman, CRI-O), I believe.
That does get a bit more complicated than I originally thought. That said, at the end of the day the problem is that we need to make sure the workingDir exists before the Pod is started. If we can't rely on the clone step always running, could we insert some kind of "init" step that does always run? Its job would be just to run mkdir -p $CI_WORKSPACE. That way we sidestep all of the complexity of whether skip_clone is set or whether the user configured a custom workspace, and we don't tax performance on every step. A similar approach could probably be taken with the Podman backend as well.
I think the problem with implementing a fix at the crun level is that Podman and CRI-O can also use runc (I think they do by default on some distros). So, if CRI-O or Podman still choke when the working dir doesn't exist, adding an option to crun will only solve the problem for a portion of the userbase.
As I understand it, the problem is that the crun developers made this behavior intentional, and then made an exception for Dockerfile. What runtime do you use? I use the runc runtime via containerd, and I expect the vast majority of users are on that side, because I don't see a flurry of issues.
adding an option to crun will only solve the problem for a portion of the userbase
Yep, the portion that is broken. The other part (runc) works just fine.
could we insert some kind of "init" step that does always run?
Nothing (well, almost nothing) is impossible. Let's hear from the other developers.
Good point. I am using crun, so that could be the issue. I'm not sure if it's CRI-O or crun (or both) that doesn't like the non-existent working dir. If CRI-O doesn't care, and crun wants to change its behavior to match the rest of the ecosystem, Bob's your uncle.

It would be a pretty big project for me to configure a cluster to run CRI-O with runc just to test this, but it might be worth it.
Interesting update from the CRI-O devs here: it appears that this behavior is not by design, and they would be open to a PR to fix it. Again, Go isn't my thing, so this might be out of reach for me. But at the very least we know a fix would be welcome.
Component
agent
Describe the bug
I'm trying to get started with the Kubernetes backend (installed via the official helm chart). When I try to run a pipeline, the container gets stuck with CreateContainerError.
After some debugging, it seems like the issue is related to spec.workingDir being set to a directory that doesn't exist at the time the container is starting. I'm not exactly an expert here, but maybe we could leave the working dir unset and then change directories after the repo is cloned?

I can provide more debugging information if that would help.
System Info
This is a vanilla Kubernetes 1.28.2 cluster installed via kubeadm. The nodes are Debian 12. The container runtime is CRI-O which was installed following the install guide here.
Additional context
I'm running CRI-O (I'm not sure if that's relevant), and I discussed this issue a bit on the CRI-O repo here.
I'm using Forgejo as my forge. If I comment out the spec.workingDir in the Pod YAML and attempt to run the Pod manually, I get an error like:

I'm not sure if that's another issue or somehow related, but I thought it was worth mentioning.
Validations
Checked that the bug isn't fixed in the next version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use]