Password disclosure with Jenkins Swarm plugin

paddy-hack commented 7 years ago

I've been following along to set up a Docker swarm of self registering Jenkins agents using your docker-jenkins-slave-dind project as my starting point. I didn't like passing the password on the command-line (trivially retrievable via the ps command) so I had a look at your Docker secrets solution. That doesn't address the ps command issue but at least it's an improvement.

Then I noticed the -passwordEnvVariable option in the agent's CLI documentation. That looked better so I tried that. I had to export the variable after reading it from the Docker secrets to make things work. So far, so good. The disappointment came when I ran the env command in a test job. The password is still trivially retrievable. Do you have any suggestions to improve the password confidentiality?

Personally, I kind of expected the password to be read from the environment variable into (secure) memory by the agent (and kept in memory for as long as necessary) and then immediately unset the environment variable before accepting any jobs.

# I realize this is probably an upstream issue, but I am loathe to create yet another account to report an issue. Apologies if that bothers you. 🙇

vfarcic commented 6 years ago

I don't think there is a solution to the problem without making changes to the plugin itself. At least, I could not come up with a better one than to use a Docker secret which, as you said, exposes the password through ps command.

vfarcic commented 6 years ago

Now, when I think about it (after reading your comment one more time), there might be a better solution. We could use passwordEnvVariable, run the java process in background, unset the env. var., and move the process back to the foreground. I think that would prevent people from discovering the value using env in their jobs. The password would still be discoverable if a) a person manages to enter inside the container and outputs contents of the secret file or b) if he creates a job with a command like cat /run/secrets/PASSWORD_SECRET_FILE. A more secure solution would probably need to involve something like Vault.

What do you think?

paddy-hack commented 6 years ago

Thanks for your replies. I've thought of a "background-unset-foreground" work-around in the mean time as well. I really think the plugin should do this, though. I also realized, as you did, that you can very likely still get at the password via the file in /run/secrets/, definitely when running the Swarm plugin CLI as root (as your Dockerfile and docker-compose files seem to do by default even though you create a jenkins user 🤔).

I run the CLI via an entrypoint as the jenkins user (after adding a group with the same gid as the docker engine on the host and adding jenkins to that group so it can access /var/run/docker.sock). That setup has no problem reading the secret.

I haven't checked anything yet, but perhaps we can remove the /run/secrets/ file once read or drop privileges so it becomes unreadable? Not sure whether the Docker engine will let you or would restore privileges. If the jenkins user cannot do this, my entrypoint or your run.sh command could be run as root and run the CLI as jenkins via a su -c "java ..." jenkins.

WDYT?

paddy-hack commented 6 years ago

[re clearing the password environment variable] I really think the plugin should do this, though.

I had a look at the plugin code and really wanted to do this ⬇️ but it seems Java has its own ideas about programatically changing a process' environment variables (as in: you very likely cannot 😒).

--- a/client/src/main/java/hudson/plugins/swarm/Client.java
+++ b/client/src/main/java/hudson/plugins/swarm/Client.java
@@ -83,6 +83,7 @@ public class Client {
         // password from the env and set as password.
         if (options.passwordEnvVariable != null) {
             options.password = System.getenv(options.passwordEnvVariable);
+            System.unsetenv(options.passwordEnvVariable);
         }

         // Only look up the hostname if we have not already specified

Now, strictly speaking, it is not the swarm CLI process that needs to drop the passwordEnvVariable but the slave-side process that runs the job. Unfortunately, I have no idea how to achieve that, if at all possible.

I've also looked at that "background-unset-foreground" work-around but seem to be running into job control issues courtesy of the busybox shell in the docker base image. I'm not sure, but I think I just may be able to work that out.

perhaps we can remove the /run/secrets/ file

I've tinkered a bit with the /run/secrets/. From within the container, root cannot remove or modify any files there. Changing permissions from the inside does not work either. But none of that is necessary. You can change the owner, group and permissions from the docker-compose.yml file using the long syntax.

So given that, picking suitable owner, group and permissions and running the swarm CLI as a non-root user, you can make the /run/secrets/PASSWORD_SECRET_FILE unreadable. I've had good success with the default owner and group (both root) and changing the permissions from the default 0444 to 0440 or 0400.

If you're interested, I'll see if I can pull all this together in a pair of PRs for docker-jenkins-slave-dind and the jenkins-swarm-agent-secrets file.

vfarcic commented 6 years ago

Pushing PRs sound great!

paddy-hack commented 6 years ago

I just submitted two PRs addressing much, but not all, of this issue. It handles the removal of the password from the ps output and protects the Docker secrets from prying eyes. The latter requires that the swarm-agent runs as a non-root user which is achieved as part of the first PR above.

I have not been successful is removing the environment variable from the swarm-plugin CLI process' context 😭. The run.sh script does not have access to a controlling terminal, preventing access to job control, see this Busybox FAQ. Rather than spending time trying to find a way around that, I think time is better spent fixing the swarm-plugin CLI part so that it removes the environment variable as soon as it is read or, alternatively, keep it out of context of all jobs that the slave runs (whichever is most easily achieved in Java).

vfarcic commented 6 years ago

Have some workshop tomorrow that uses this image. Not enough time to test it before that. I'll merge the code on Friday if that's OK.

paddy-hack commented 6 years ago

No problem.

vfarcic commented 6 years ago

PR worked :). Closing the issue...

paddy-hack commented 6 years ago

Your reverts in 62202380 and 82f86062 effectively nullified what I was trying to achieve, so please reopen this issue. At present, the only thing that is achieved is removing the password from the ps output. It is still very visible in the env output and, if using Docker secrets, cat /run/secrets/jenkins-pass

vfarcic commented 6 years ago

Sorry for the revert. I had a few failure in my own cluster and a few existing users complained that things stopped working. I reverted it only as a temporary measure until we figure out how to make it work in all the cases.

The problem with the PR is that it causes permission issues with artifacts created through containers. Most images are using the root user 1. I think we should change it in a way that security hardening is optional. For example, we can move parts from Dockerfile to run.sh. If an environment variable is set (e.g., SECURE=true|false), it could create the jenkins user with or without putting it into the root group. The same applies to the rest of the changes from this PR.

There might be a better way than what I described.

I guess that the major problem is to make it flexible enough to cover both users with security concerns as well as those who run things as root (e.g., containers), and the existing users.

What do you think?

paddy-hack commented 6 years ago

No need to apologize 😉 If my PRs break stuff without documenting that, they shouldn't be merged or, as you did, (partially) reverted. My point is with the state of this issue. After reverting, this issue should have been reopened. That's all.

I did note in my PRs that ownership of/in the workspace needs to be changed. However, I have noticed permission related problems in my own cluster at the office as well. I am not sure if these are the result of left-overs from earlier experiments or real but for the moment I have:

completely removed the jenkins user from the Docker images that I use
changed the swarm plugin CLI to run as the root user in the container
removed everything below the /workspace/ directory
set things up to use a /workspace/$HOSTNAME/workspace/ for the -fsroot of each replica (because I'm not sure what would happen when different jobs for the same project run concurrently on different agents that share the /workspace/)

At present, I still have very little feedback on how the above works.

In the mean time, I have realized that my setup is probably pretty demanding in terms of permissions. I run a Docker Swarm spread out over two physical servers (identically configured save for their RAID levels and partition sizes). Both physical servers act as manager nodes (I'm looking for a third 😄). A dedicated user, ID 998, runs docker-machine VMs that have been added to the swarm as worker nodes. That same user creates the docker stack that runs the Jenkins agents on all worker nodes. The swarm agent CLI runs as root, ID 0. The docker user and group have ID 999. In addition, the Jenkins jobs run tasks on containers with varying user IDs, typically 0 and 1000 but other IDs are not unheard of (Red Hat based images have a habit of creating the first "normal" user with an ID of 500, IIRC).

Anyway, I agree with your suggestion that parts of the Dockerfile should probably be moved to run.sh to allow for more flexibility, via environment variables (or command-line arguments if used as an entrypoint), as to how things start up. As to making security hardening an opt-in, I respectfully disagree. It should be a cop-out 😜

vfarcic commented 6 years ago

Sorry for that. I missunderstood the initial message. You're right. Issue reopened :)

I also agree that security should be cop-out. However, the problem is that I made a mistake not including it from the start and now there are users who expect certain behavior with certain setup. How about a temporary opt-in with a deprecation notice. After a while, we can change it to cop-out. What do you think?

P.S. I have too much work today so I only skimmed through your message. I'll go through it fully and might post more comments in a day or two. Sorry for that.

paddy-hack commented 6 years ago

Thanks for reopening and no problem at all with not following up right-away (as long as you don't forget and eventually do follow up 😉) because we're all busy. Also thanks for agreeing that security should be cop-out.

In the mean time, I discovered that

fixing the swarm-plugin CLI part so that it [...] keep[s] [the environment variable] out of context of all jobs that the slave runs

is not sufficient because the node's System Info merrily lists it in the Environment Variables section 😒. Hmm, maybe I should just forget about the environment variable approach, switch back to -password and remove execute permissions on the ps command in the container image 🤔. Nope, no good, anyone can still cat /proc/*/cmdline .

More info on user and group IDs:

/var/run/docker.sock is UID 0, GID 1000 inside the (VirtualBox) docker-machines
/workspace/ is UID 1000, GID 50 inside the docker-machines (both 998 on the physical host)

Running everything as root (see https://github.com/vfarcic/docker-flow-stacks/issues/15#issuecomment-352409969) appears to work fine at the office. The only issue I've seen (so far?) is an occasional git fetch (or git config?) error when I hit Build Now in quick succession (with parallel builds enabled). Waiting a second or two between clicks does not trigger the error. Not sure what this is all about, looks like a timeout, could be IO speed related (slave side or git repo side). I don't think it's permission related.

paddy-hack commented 6 years ago

remove execute permissions on the ps command in the container image 🤔

Seeing that /bin/ps is a symlink to /bin/busybox, that would prevent run.sh from doing its job to begin with so the above is a dumb idea. Removing /bin/ps won't help much because you can still simply run busybox ps as part of the job. Besides cat /proc/*/cmdline still remains a viable "attack" vector. No, the Jenkins swarm plugin really needs to remove the environment variable as soon as possible (along the lines in the diff in https://github.com/vfarcic/docker-flow-stacks/issues/15#issuecomment-349285063).

vfarcic commented 6 years ago

I agree.

The only real solution is to fix the problem at its core (the plugin itself). Did you report it to plugins' repo?

paddy-hack commented 6 years ago

No, I'm not sure how to go about that seeing that issues are not enabled for the plugin's GitHub repo.

vfarcic commented 6 years ago

Jenkins issues are managed through JIRA. Those related to swarm-plugin are in https://issues.jenkins-ci.org/browse/JENKINS-48477?jql=project%20%3D%20JENKINS%20AND%20component%20%3D%20swarm-plugin.

paddy-hack commented 6 years ago

Thanks, I found Jenkins-26620. Looks like this is an old issue ... guess I could create an account (what! no hyphens allowed 😢) and chime in. Should also check if that @-mark trick (still) works.

vfarcic commented 6 years ago

That indeed looks like a similar issue. It also looks like one no one cares about.

paddy-hack commented 6 years ago

Bit the bullet, created an account and submitted https://issues.jenkins-ci.org/browse/SECURITY-687.

vfarcic commented 6 years ago

Do you think that we should close this issue and let it take it's course in the one you opened @paddy-hack ?

paddy-hack commented 6 years ago

No, we still have the issue that the files in /run/secrets/ are visible if root runs the swarm-client CLI.

Based on what's in the issue I opened, it looks like the fix will involve a "secret" that the Jenkins master issues and that a "swarm" will have to present. I would put that in a Docker secret (like you do with the password now). Still, if the user that runs the swarm-client has read access to the /run/secrets/ files, a job can get at the secret.

I suggest we keep this as pending and get back to it once the upstream issue is resolved. At present, there is just too much speculation as to how things will be fixed.

vfarcic commented 6 years ago

Agreed

darenjacobs commented 6 years ago

I need jenkins-swarm-agent with java compiler. Can you please point me in the right direction.

vfarcic commented 6 years ago

Sorry for not getting back to you earlier @darenjacobs . I somehow missed the notification...

I'm not sure what you mean by jenkins-swarm-agent. Do you mean https://github.com/vfarcic/docker-jenkins-slave-dind?

srinivasmanthena commented 6 years ago

I ran into an issue where jenkins agent couldn't connect to master if the password has special characters. I posted my issue on stackoverflow, but want to bring to your attention as this could be a new bug. I don't think this is an issue with docker secrets as /run/secrets/ has the correct password mounted to container. here is the stackoverflow https://stackoverflow.com/questions/52768542/docker-secrets-not-working-when-password-has-special-characters

vfarcic commented 6 years ago

Sorry for not getting back to this earlier. I'm traveling and won't be able to go through it before the next week. If you manage to figure out the solution earlier, I'd appreciate a PR.

srinivasmanthena commented 6 years ago

I couldn't figure out where the issue is occurring. I am just a admin, not a DEV, so I am not confident in reading through the jenkins-slave jar java code. Its either docker code that is not reading the special characters or jenkins-slave-jar java code.

vfarcic commented 6 years ago

That does seem to be an issue with Jenkins JNLP agent. If that's the case, I don't think there's much that can be done outside fixing the plugin itself.

vfarcic / docker-flow-stacks

Password disclosure with Jenkins Swarm plugin #15