tektoncd / pipeline

A cloud-native Pipeline resource.
https://tekton.dev
Apache License 2.0
8.5k stars 1.78k forks source link

Phase out overwriting of image's HOME variable. Remove /tekton/home. #2013

Closed ghost closed 2 years ago

ghost commented 4 years ago

Expected Behavior

Images with an expectation that HOME is set to something specific should work as intended.

Actual Behavior

Those images do not work as expected - Pipelines overwrites the HOME variable of an image with /tekton/home.

See https://github.com/tektoncd/pipeline/issues/1836 for the original report on this issue.

So the plan is as follows:

ghost commented 4 years ago

Adding a configmap field to allow the HOME overwrite to be disabled: https://github.com/tektoncd/pipeline/compare/master...sbwsg:opt-out-home-overwrite?expand=1

chanseokoh commented 4 years ago

Thanks for doing this! This has been very annoying for Java people on Knative and GCB, because almost all the time, most Java applications get the genuine, actual user home through a platform-independent way that does not read $HOME. For example, Tekton pipelines run as root, and because the OS declares that the home of the user root is /root, the Maven build system creates a local artifact repository under /root/.m2 on Tekton (and GCB). BTW, the local Maven repository is crucial to cache in the Java world. (Gradle has the same problem for, e.g., ~/.gradle/caches.)

After all, changing the $HOME environment variable for a certain set of processes or shell sessions doesn't alter the fact that the user home of root declared by OS is /root. Some tools and programs (e.g., cd, understandably) make use of $HOME, but most Java applications don't.

Yet, Tekton auth supplies a Docker config file at /tekton/home/.docker/config.json, even if the actual home is /root. Java applications generally cannot find this kind of files based on user home.

That said, I am actually more interested in where Tekton will create the .docker/config.json file when this opt-out flag is set or not set. Will this still be at /tekton/home/.docker/config.json?

But I should say I don't like having to deal with the inconsistency between the actual home and the value of $HOME. As long as the OS says the home is /root, Maven will create /root/.m2. And Java applications expect a Docker config file at /root/.docker/config.json. As a stop-gap, I had to manually override a JVM property to make JVM believe the user home is (previously) /builder/home. And it got broken at some point, so I had to update it to /tekton/home. But these directory names are implementation details that can continue to change in the future and result in another breakage.

chanseokoh commented 4 years ago

That said, I am actually more interested in where Tekton will create the .docker/config.json file when this opt-out flag is set or not set. Will this still be at /tekton/home/.docker/config.json?

What's the answer going forward, when the flag is set or not set?

ghost commented 4 years ago

Yet, Tekton auth supplies a Docker config file at /tekton/home/.docker/config.json, even if the actual home is /root

With this new flag Tekton will no longer interfere with HOME - it will be whatever you expect it to be when the container runs in a Pod.

Previously $HOME would have been set to /tekton/home but now it won't be. So I would expect $HOME/.docker/config.json to be written to /root/.docker/config.json if the user is root and the image doesn't specify its own HOME.

ghost commented 4 years ago

Moving this issue over to next milestone. In the next release we plan to flip the behaviour of the feature flag so that Tekton no longer overwrites the HOME environment variable by default.

chanseokoh commented 4 years ago

Previously $HOME would have been set to /tekton/home but now it won't be. So I would expect $HOME/.docker/config.json to be written to /root/.docker/config.json if the user is root and the image doesn't specify its own HOME.

This doesn't seem true. I think we are not ready to flip the disable-home-env-overwrite flag. See #2165.

ghost commented 4 years ago

It sounds like the HOME env var isn't set at all by the image in that case. The go code for dockercreds simply does os.Getenv("HOME") and the absence of it here implies that this variable isn't defined at all. I'll take a look into the creds-init image and see what it defaults to.

Edit: ran the latest creds-init image locally and I do get a HOME var of /root in my env. Not sure yet what's happening. Will keep looking into it.

chanseokoh commented 4 years ago

Edit: ran the latest creds-init image locally and I do get a HOME var of /root in my env. Not sure yet what's happening. Will keep looking into it.

I think creds-init should place the file at the home directories of my task images, not the HOME of the creds-init image. Multiple copies if the home directories are different.

chanseokoh commented 4 years ago

I've given some more thoughts on this core HOME issue. Before we can actually flip disable-home-env-overwrite, I think this entire auth mechanism should be revisited and reworked.

In most contexts (including ours), I think the notion of a user home is really a runtime one. An image may suggest a certain directory as a "home" and a certain UID/user as a running user, but runtime platforms (as well as user configurations) can override the "suggested user". The home will then be "determined" by the user running the container.

For example, If I run an image as root, the home is /root, obviously. If I run it as nobody, the home is often defined as /home/nonexistent (and the directory doesn't physically exist.). If I run it as UID 12345, I actually don't know what I should call "home." (Different apps and OSes may have different notions of "home" in that case?)

This actually matters to me, because I am trying to figure out how to fix a related problem on OpenShift, which runs images as a random UID. Not only on OpenShift, but users can also change the running user on Tekton (e.g., using runAsNonRoot or runAsUser).

So, the following quotes in the Tekton auth doc becomes ambiguous.

In their native form, these secrets are unsuitable for consumption by Git and Docker. For Git, they need to be turned into (some form of) .gitconfig. For Docker, they need to be turned into a ~/.docker/config.json file.

When the Run executes, before steps execute, a ~/.ssh/config will be generated

When this Run executes, before steps execute, a ~/.gitconfig will be generated

When the Run executes, before steps execute, a ~/.docker/config.json will be generated

chanseokoh commented 4 years ago

Another related issue?

[git-source-source-qxt27] 
{"level":"warn",
"ts":1583449549.563559,
"caller":"git/git.go:149",
"msg":"Unexpected error: creating symlink: symlink /tekton/home/.ssh /root/.ssh: file exists"}

This is a warning, so it doesn't break my task at least.

ghost commented 4 years ago

I've written up a design doc covering the solution you proposed @chanseokoh of having a fixed directory. Doc is here: https://docs.google.com/document/d/1SVuDt-SXPHymz41dveSXFSPrx5Z-Wzb9eHliJAyYg4o

I plan to present this to the tekton working group on Wednesday and will update my existing PR with any changes that get agreed on during that call.

ghost commented 4 years ago

Blocked by https://github.com/tektoncd/pipeline/issues/2523 and https://github.com/tektoncd/pipeline/issues/2524

tekton-robot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

tekton-robot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot commented 4 years ago

@tekton-robot: Closing this issue.

In response to [this](https://github.com/tektoncd/pipeline/issues/2013#issuecomment-673989128): >Rotten issues close after 30d of inactivity. >Reopen the issue with `/reopen`. >Mark the issue as fresh with `/remove-lifecycle rotten`. > >/close > >Send feedback to [tektoncd/plumbing](https://github.com/tektoncd/plumbing). Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
chanseokoh commented 4 years ago

/remove-lifecycle rotten

ghost commented 4 years ago

/reopen /lifecycle frozen

tekton-robot commented 4 years ago

@sbwsg: Reopened this issue.

In response to [this](https://github.com/tektoncd/pipeline/issues/2013#issuecomment-674070315): >/reopen >/lifecycle frozen Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
vyom-soft commented 4 years ago

Hello, I an new to Tekton, I am facing the problem. How should I over this is there any approach? sample?

"image-digest-exporter-d6kxs] 2020/08/21 08:55:40 unsuccessful cred copy: ".docker" from "/tekton/creds" to "/tekton/home": unable to open destination: open /tekton/home/.docker/config.json: permission denied"

$ tkn version Client version: 0.11.0 Pipeline version: v0.15.2 Triggers version: unknown Thank you Sanjeev

didier-durand commented 4 years ago

Hello,

Same here

"image-digest-exporter-d6kxs] 2020/08/21 08:55:40 unsuccessful cred copy: ".docker" from "/tekton/creds" to "/tekton/home": unable to open destination: open /tekton/home/.docker/config.json: permission denied"

How can this get fixed

Didier

ghost commented 4 years ago

Could you explain a bit about what is broken? What is the Task you're trying to run and what are its Steps? What version of k8s and tekton are you running? What are the PipelineResources you've configured? What does the Pod look like after the TaskRun finishes? What is the incorrect observed behaviour of your PipelineRun or TaskRun?

didier-durand commented 4 years ago

@sbwsg Scott, in the meantime since I wrote the above, I fixed my problem : see my comment in https://github.com/tektoncd/pipeline/issues/2616#issuecomment-700575193. When I replaced 'docker.io' by 'index.docker.io' in the secret creation, the initial credential checks (before building) went ok. Then, I could build and push to Docker Hub. So, my problem is fixed.

ghost commented 4 years ago

OK, thanks for the update!

ghost commented 3 years ago

From owners meeting just now: next step is to loudly announce this upcoming change on the mailing list and slack.

tmshn commented 3 years ago

@sbwsg Let me ask one thing to clarify my concern; do you intended to stop mounting /tekton/home too? Or is it just about overwriting environmental variable?

The current implementation is the latter, but the the issue title "Remove /tekton/home." sounds implying former change to me.

https://github.com/tektoncd/pipeline/blob/f041ef5d644c72e88689e079a8e3ea96718891fc/pkg/pod/pod.go#L112-L117

ghost commented 3 years ago

@tmshn that's a very interesting question! I completely forgot that the issue title includes "Remove /tekton/home." And I didn't include a work item in the list at the top to remove the folder from our codebase.

I expect that we will completely remove the /tekton/home directory with the feature flag and environment variable on or after Feb 2022. However this is open to discussion - if there are critical reasons to keep the directory we should make sure we are aware of them. What factors are motivating your concerns?

tmshn commented 3 years ago

Simply because some of the docker images do not have home directory properly set, and we need to take extra care to examine each images (maybe not so big blocking issue, though).

For example, alpine/git image (used by official git-cli tasks) have nobody user, but its home directory is / where the user does not have write permission.

$ docker run --rm --entrypoint cat docker.io/alpine/git:v2.26.2@sha256:23618034b0be9205d9cc0846eb711b12ba4c9b468efdd8a59aac1d7b1a23363f /etc/passwd
root:x:0:0:root:/root:/bin/ash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/mail:/sbin/nologin
news:x:9:13:news:/usr/lib/news:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucppublic:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
man:x:13:15:man:/usr/man:/sbin/nologin
postmaster:x:14:12:postmaster:/var/mail:/sbin/nologin
cron:x:16:16:cron:/var/spool/cron:/sbin/nologin
ftp:x:21:21::/var/lib/ftp:/sbin/nologin
sshd:x:22:22:sshd:/dev/null:/sbin/nologin
at:x:25:25:at:/var/spool/cron/atjobs:/sbin/nologin
squid:x:31:31:Squid:/var/cache/squid:/sbin/nologin
xfs:x:33:33:X Font Server:/etc/X11/fs:/sbin/nologin
games:x:35:35:games:/usr/games:/sbin/nologin
cyrus:x:85:12::/usr/cyrus:/sbin/nologin
vpopmail:x:89:89::/var/vpopmail:/sbin/nologin
ntp:x:123:123:NTP:/var/empty:/sbin/nologin
smmsp:x:209:209:smmsp:/var/spool/mqueue:/sbin/nologin
guest:x:405:100:guest:/dev/null:/sbin/nologin
nobody:x:65534:65534:nobody:/:/sbin/nologin

https://github.com/tektoncd/catalog/blob/4bf19372b791f1d9cd4c764ee2055ebfa7cdbd3f/task/git-cli/0.2/git-cli.yaml#L50

ghost commented 3 years ago

Ahh I see. We will need to audit the catalog tasks that use our images to make sure we don't break any existing. For those tasks which would be broken we can manually add a volumeMount with an emptyDir at /tekton/home?

tmshn commented 3 years ago

That sound fine!