paddy-hack opened this issue 6 years ago
I don't think there is a solution to the problem without making changes to the plugin itself. At least, I could not come up with a better one than to use a Docker secret which, as you said, exposes the password through the `ps` command.
Now, when I think about it (after reading your comment one more time), there might be a better solution. We could use `passwordEnvVariable`, run the `java` process in the background, unset the env. var., and move the process back to the foreground. I think that would prevent people from discovering the value using `env` in their jobs. The password would still be discoverable if a) a person manages to get inside the container and output the contents of the secret file, or b) they create a job with a command like `cat /run/secrets/PASSWORD_SECRET_FILE`. A more secure solution would probably need to involve something like Vault.
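Something along these lines, as a rough sketch (the secret file name, variable name, and jar path are all illustrative):

```sh
# Rough sketch of the "background-unset-foreground" idea.
PASSWORD="$(cat /run/secrets/PASSWORD_SECRET_FILE)"
export PASSWORD

# Start the client while the variable is still set ...
java -jar /swarm-client.jar -passwordEnvVariable PASSWORD &

# ... scrub it from the shell's environment ...
unset PASSWORD

# ... and bring the client back to the foreground (needs job control).
# Note: the java process itself still carries PASSWORD in its inherited
# environment, so jobs it spawns may still see it.
fg %1
```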
What do you think?
Thanks for your replies. I've thought of a "background-unset-foreground" work-around in the meantime as well. I really think the plugin should do this, though. I also realized, as you did, that you can very likely still get at the password via the file in `/run/secrets/`, definitely when running the Swarm plugin CLI as `root` (as your `Dockerfile` and `docker-compose` files seem to do by default, even though you create a `jenkins` user).
I run the CLI via an entrypoint as the `jenkins` user (after adding a group with the same `gid` as the `docker` engine on the host and adding `jenkins` to that group so it can access `/var/run/docker.sock`). That setup has no problem reading the secret.
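For reference, the gist of that setup on an Alpine/BusyBox image, assuming the host's `docker` group has gid 999 (the gid is an assumption, check your host):

```sh
# Create a group whose gid matches the docker group on the host (999 is
# an assumption) and add jenkins to it so it can use the Docker socket.
addgroup -g 999 docker
addgroup jenkins docker
```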
I haven't checked anything yet, but perhaps we can remove the `/run/secrets/` file once read, or drop privileges so it becomes unreadable? Not sure whether the Docker engine will let you, or would restore privileges. If the `jenkins` user cannot do this, my entrypoint or your `run.sh` command could be run as `root` and run the CLI as `jenkins` via a `su -c "java ..." jenkins`.
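Roughly like this, as a sketch (file names and paths are illustrative; whether `su` passes the exported variable through depends on its flags and implementation):

```sh
#!/bin/sh
# Entrypoint sketch: start as root, read the secret, then hand the CLI
# to the unprivileged jenkins user. The -p flag asks BusyBox su to
# preserve the environment, including the exported PASSWORD.
PASSWORD="$(cat /run/secrets/PASSWORD_SECRET_FILE)"
export PASSWORD
exec su -p -s /bin/sh -c 'java -jar /swarm-client.jar -passwordEnvVariable PASSWORD' jenkins
```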
WDYT?
> [re clearing the password environment variable] I really think the plugin should do this, though.
I had a look at the plugin code and really wanted to do this (see the diff below), but it seems Java has its own ideas about programmatically changing a process's environment variables (as in: you very likely cannot).
```diff
--- a/client/src/main/java/hudson/plugins/swarm/Client.java
+++ b/client/src/main/java/hudson/plugins/swarm/Client.java
@@ -83,6 +83,7 @@ public class Client {
         // password from the env and set as password.
         if (options.passwordEnvVariable != null) {
             options.password = System.getenv(options.passwordEnvVariable);
+            System.unsetenv(options.passwordEnvVariable);
         }
 
         // Only look up the hostname if we have not already specified
```
Now, strictly speaking, it is not the swarm CLI process that needs to drop the `passwordEnvVariable` but the slave-side process that runs the job. Unfortunately, I have no idea how to achieve that, if at all possible.
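The closest thing I've seen is the well-known reflection hack against the JDK-internal `java.lang.ProcessEnvironment`. It only scrubs the JVM's cached copy that `System.getenv()` returns, it is unsupported and platform-dependent, and newer JDKs block it without `--add-opens`, so treat this strictly as a sketch:

```java
import java.lang.reflect.Field;
import java.util.Map;

public final class EnvScrubber {

    // Best-effort removal of a variable from the JVM's cached environment.
    // Relies on JDK internals, so it can break across JDK versions.
    @SuppressWarnings("unchecked")
    public static void unset(String name) throws Exception {
        Class<?> pe = Class.forName("java.lang.ProcessEnvironment");

        // The unmodifiable view handed out by System.getenv().
        Field f = pe.getDeclaredField("theUnmodifiableEnvironment");
        f.setAccessible(true);
        Map<String, String> view = (Map<String, String>) f.get(null);

        // Reach through the Collections.unmodifiableMap wrapper ("m")
        // to the backing map and drop the entry.
        Field m = view.getClass().getDeclaredField("m");
        m.setAccessible(true);
        ((Map<String, String>) m.get(view)).remove(name);
    }
}
```

Even then, processes the agent spawns can still inherit the variable, which is the part that matters for jobs.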
I've also looked at that "background-unset-foreground" work-around, but I seem to be running into job control issues courtesy of the `busybox` shell in the `docker` base image. I'm not sure, but I think I just may be able to work that out.
> perhaps we can remove the `/run/secrets/` file
I've tinkered a bit with `/run/secrets/`. From within the container, `root` cannot remove or modify any files there. Changing permissions from the inside does not work either. But none of that is necessary: you can change the owner, group, and permissions from the `docker-compose.yml` file using the long syntax.
So given that, picking suitable owner, group, and permissions and running the swarm CLI as a non-root user, you can make `/run/secrets/PASSWORD_SECRET_FILE` unreadable. I've had good success with the default owner and group (both `root`) and changing the permissions from the default `0444` to `0440` or `0400`.
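The long syntax looks roughly like this (service and secret names are made up for the example; the long secret syntax needs Compose file format 3.1 or later):

```yaml
# Sketch: the uid/gid/mode part is the point here.
version: "3.1"

services:
  jenkins-agent:
    image: vfarcic/jenkins-swarm-agent   # illustrative image name
    secrets:
      - source: jenkins-pass
        target: jenkins-pass
        uid: "0"
        gid: "0"
        mode: 0400   # default is 0444; 0400 hides it from non-root jobs

secrets:
  jenkins-pass:
    external: true
```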
If you're interested, I'll see if I can pull all this together in a pair of PRs for docker-jenkins-slave-dind and the jenkins-swarm-agent-secrets file.
Pushing PRs sounds great!
I just submitted two PRs addressing much, but not all, of this issue. They handle the removal of the password from the `ps` output and protect the Docker secrets from prying eyes. The latter requires that the swarm-agent runs as a non-root user, which is achieved as part of the first PR above.

I have not been successful in removing the environment variable from the swarm-plugin CLI process's context. The `run.sh` script does not have access to a controlling terminal, preventing access to job control; see this Busybox FAQ. Rather than spending time trying to find a way around that, I think time is better spent fixing the swarm-plugin CLI part so that it removes the environment variable as soon as it is read or, alternatively, keeps it out of the context of all jobs that the slave runs (whichever is most easily achieved in Java).
I have a workshop tomorrow that uses this image, so there's not enough time to test it before then. I'll merge the code on Friday if that's OK.
No problem.
PR worked :). Closing the issue...
Your reverts in 62202380 and 82f86062 effectively nullified what I was trying to achieve, so please reopen this issue. At present, the only thing that is achieved is removing the password from the `ps` output. It is still very visible in the `env` output and, if using Docker secrets, via `cat /run/secrets/jenkins-pass`.
Sorry for the revert. I had a few failures in my own cluster and a few existing users complained that things stopped working. I reverted it only as a temporary measure until we figure out how to make it work in all cases.
The problem with the PR is that it causes permission issues with artifacts created through containers. Most images use the root user. I think we should change it in a way that makes security hardening optional. For example, we can move parts from the `Dockerfile` to `run.sh`. If an environment variable is set (e.g., `SECURE=true|false`), it could create the `jenkins` user with or without putting it into the `root` group, as in the sketch below. The same applies to the rest of the changes from this PR.
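Purely as a sketch of the idea (the variable name, its default, and the jar path are all up for discussion):

```sh
# run.sh sketch: hardened setup only when SECURE=true, otherwise the
# current root-based behaviour so existing setups keep working.
if [ "${SECURE:-false}" = "true" ]; then
    adduser -D jenkins 2>/dev/null || true
    exec su -s /bin/sh -c 'java -jar /swarm-client.jar' jenkins
else
    exec java -jar /swarm-client.jar
fi
```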
There might be a better way than what I described.
I guess the major problem is making it flexible enough to cover users with security concerns, those who run things as root (e.g., in containers), and the existing users.
What do you think?
No need to apologize. If my PRs break stuff without documenting that, they shouldn't be merged or should be, as you did, (partially) reverted. My point concerns the state of this issue: after reverting, this issue should have been reopened. That's all.
I did note in my PRs that ownership of/in the workspace needs to be changed. However, I have noticed permission-related problems in my own cluster at the office as well. I am not sure whether these are the result of left-overs from earlier experiments or real, but for the moment I have:

- dropped the `jenkins` user from the Docker images that I use
- kept the `root` user in the container
- a shared `/workspace/` directory
- `/workspace/$HOSTNAME/workspace/` for the `-fsroot` of each replica (because I'm not sure what would happen when different jobs for the same project run concurrently on different agents that share the `/workspace/`); see the sketch after this list

At present, I still have very little feedback on how the above works.
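In client terms, that last point amounts to something like this (the master URL variable and the other flags are illustrative):

```sh
# Give each replica its own workspace root; $HOSTNAME differs per
# container, so concurrent builds on different agents don't collide.
exec java -jar /swarm-client.jar \
    -master "$JENKINS_MASTER_URL" \
    -passwordEnvVariable PASSWORD \
    -fsroot "/workspace/$HOSTNAME/workspace/"
```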
In the meantime, I have realized that my setup is probably pretty demanding in terms of permissions. I run a Docker Swarm spread out over two physical servers (identically configured save for their RAID levels and partition sizes). Both physical servers act as manager nodes (I'm looking for a third). A dedicated user, ID 998, runs `docker-machine` VMs that have been added to the swarm as worker nodes. That same user creates the `docker stack` that runs the Jenkins agents on all worker nodes. The swarm agent CLI runs as `root`, ID 0. The `docker` user and group have ID 999. In addition, the Jenkins jobs run tasks in containers with varying user IDs, typically 0 and 1000, but other IDs are not unheard of (Red Hat based images have a habit of creating the first "normal" user with an ID of 500, IIRC).
Anyway, I agree with your suggestion that parts of the `Dockerfile` should probably be moved to `run.sh` to allow for more flexibility, via environment variables (or command-line arguments if used as an entrypoint), as to how things start up. As to making security hardening an opt-in, I respectfully disagree. It should be a cop-out.
Sorry for that. I misunderstood the initial message. You're right. Issue reopened :)
I also agree that security should be cop-out. However, the problem is that I made a mistake not including it from the start, and now there are users who expect certain behavior with a certain setup. How about a temporary opt-in with a deprecation notice? After a while, we can change it to cop-out. What do you think?
P.S. I have too much work today so I only skimmed through your message. I'll go through it fully and might post more comments in a day or two. Sorry for that.
Thanks for reopening, and no problem at all with not following up right away (as long as you don't forget and eventually do follow up), because we're all busy. Also thanks for agreeing that security should be cop-out.
In the meantime, I discovered that

> fixing the swarm-plugin CLI part so that it [...] keep[s] [the environment variable] out of context of all jobs that the slave runs

is not sufficient, because the node's System Info merrily lists it in the Environment Variables section. Hmm, maybe I should just forget about the environment variable approach, switch back to `-password`, and remove execute permissions on the `ps` command in the container image. Nope, no good, anyone can still `cat /proc/*/cmdline`.
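To illustrate, any job can do the equivalent of `ps` without touching `/bin/ps` at all:

```sh
# Dump every process's command line straight from procfs; a -password
# argument would show up here regardless of what happens to /bin/ps.
for p in /proc/[0-9]*/cmdline; do
    tr '\0' ' ' < "$p"
    echo
done
```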
More info on user and group IDs:

- `/var/run/docker.sock` is UID 0, GID 1000 inside the (VirtualBox) docker-machines
- `/workspace/` is UID 1000, GID 50 inside the docker-machines (both 998 on the physical host)

Running everything as `root` (see https://github.com/vfarcic/docker-flow-stacks/issues/15#issuecomment-352409969) appears to work fine at the office. The only issue I've seen (so far?) is an occasional `git fetch` (or `git config`?) error when I hit Build Now in quick succession (with parallel builds enabled). Waiting a second or two between clicks does not trigger the error. Not sure what this is all about; it looks like a timeout and could be IO speed related (slave side or git repo side). I don't think it's permission related.
> remove execute permissions on the `ps` command in the container image
Seeing that `/bin/ps` is a symlink to `/bin/busybox`, that would prevent `run.sh` from doing its job to begin with, so the above is a dumb idea. Removing `/bin/ps` won't help much because you can still simply run `busybox ps` as part of the job. Besides, `cat /proc/*/cmdline` still remains a viable "attack" vector. No, the Jenkins swarm plugin really needs to remove the environment variable as soon as possible (along the lines of the diff in https://github.com/vfarcic/docker-flow-stacks/issues/15#issuecomment-349285063).
I agree.
The only real solution is to fix the problem at its core (the plugin itself). Did you report it to the plugin's repo?
No, I'm not sure how to go about that, seeing that issues are not enabled for the plugin's GitHub repo.
Jenkins issues are managed through JIRA. Those related to swarm-plugin are in https://issues.jenkins-ci.org/browse/JENKINS-48477?jql=project%20%3D%20JENKINS%20AND%20component%20%3D%20swarm-plugin.
Thanks, I found JENKINS-26620. Looks like this is an old issue... guess I could create an account (what! no hyphens allowed) and chime in. Should also check if that `@`-mark trick (still) works.
That indeed looks like a similar issue. It also looks like one no one cares about.
Bit the bullet, created an account and submitted https://issues.jenkins-ci.org/browse/SECURITY-687.
Do you think that we should close this issue and let it take its course in the one you opened, @paddy-hack?
No, we still have the issue that the files in `/run/secrets/` are visible if `root` runs the swarm-client CLI.

Based on what's in the issue I opened, it looks like the fix will involve a "secret" that the Jenkins master issues and that a "swarm" will have to present. I would put that in a Docker secret (like you do with the password now). Still, if the user that runs the swarm-client has read access to the `/run/secrets/` files, a job can get at the secret.
I suggest we keep this as pending and get back to it once the upstream issue is resolved. At present, there is just too much speculation as to how things will be fixed.
Agreed
I need a jenkins-swarm-agent with a Java compiler. Can you please point me in the right direction?
Sorry for not getting back to you earlier, @darenjacobs. I somehow missed the notification...

I'm not sure what you mean by `jenkins-swarm-agent`. Do you mean https://github.com/vfarcic/docker-jenkins-slave-dind?
I ran into an issue where the Jenkins agent couldn't connect to the master if the password has special characters. I posted my issue on Stack Overflow, but I want to bring it to your attention as this could be a new bug. I don't think this is an issue with Docker secrets, as `/run/secrets/` has the correct password mounted in the container. Here is the Stack Overflow question: https://stackoverflow.com/questions/52768542/docker-secrets-not-working-when-password-has-special-characters
Sorry for not getting back to this earlier. I'm traveling and won't be able to go through it before next week. If you manage to figure out the solution earlier, I'd appreciate a PR.
I couldn't figure out where the issue is occurring. I am just an admin, not a dev, so I am not confident reading through the jenkins-slave jar Java code. It's either the Docker code that is not reading the special characters or the jenkins-slave jar Java code.
That does seem to be an issue with the Jenkins JNLP agent. If that's the case, I don't think there's much that can be done outside of fixing the plugin itself.
I've been following along to set up a Docker swarm of self-registering Jenkins agents, using your docker-jenkins-slave-dind project as my starting point. I didn't like passing the password on the command line (trivially retrievable via the `ps` command), so I had a look at your Docker secrets solution. That doesn't address the `ps` command issue, but at least it's an improvement.

Then I noticed the `-passwordEnvVariable` option in the agent's CLI documentation. That looked better, so I tried that. I had to `export` the variable after reading it from the Docker secrets to make things work (see the sketch below). So far, so good. The disappointment came when I ran the `env` command in a test job. The password is still trivially retrievable. Do you have any suggestions to improve the password confidentiality?

Personally, I kind of expected the password to be read from the environment variable into (secure) memory by the agent (and kept in memory for as long as necessary), with the environment variable then immediately `unset` before accepting any jobs.

I realize this is probably an upstream issue, but I am loath to create yet another account to report an issue. Apologies if that bothers you.