moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.
Apache License 2.0

Add support for --privileged when creating a service #1030

Open jimmyxian opened 8 years ago

jimmyxian commented 8 years ago

When creating some containers, we need privileged=True, but swarmkit does not support this. I want to implement this with the following steps:

If it's the right direction, I will submit a PR. :)

Also, there are so many container options in docker-engine. If all the options are supported in that way, it's a huge amount of work. Maybe we need a good way to support all the options?

dmcgowan commented 8 years ago

Privileged containers are needed for any use case which needs to spin up Docker-in-Docker, a common pattern for testing not only docker itself but complex build setups such as allowed by https://github.com/docker/golem. I am looking forward to porting golem to swarm when this is enabled.

stevvooe commented 8 years ago

Also, there are so many container options in docker-engine. If all the options are supported in that way, it's a huge amount of work. Maybe we need a good way to support all the options?

We don't actually want to do this. There are a number of things that are hard or impossible to support in a clustered environment. There are other things that are downright insecure, such as privileged, that need to be rethought.

What aspect of privileged containers are you using? For example, what part enables DIND to work?

dmcgowan commented 8 years ago

We don't actually want to do this.

Many people will and it is a common pattern. However I do agree that time should be spent to figure out how to do it right since adding this will mean it can never be removed along with any security headaches it brings.

As for DIND in particular, I am not sure of all the settings that would need to be added to enable it, but I think getting it working would cover many use cases for CI. Running the make test run command without the privileged flag yields...

mount: permission denied
Could not mount /sys/kernel/security.
AppArmor detection and --privileged mode might break.
mount: permission denied

The reason is obvious from https://github.com/docker/docker/blob/master/hack/dind.

stevvooe commented 8 years ago

@dmcgowan Supporting all of the container and host config options levied a massive amount of complexity on the swarm project for little gain. Each feature we add in services needs to be well-considered and designed for operation in a clustered environment. This is especially true for anything that requires privileged execution. (interesting aspect: ContainerSpec ends up looking a lot like HostConfig container.Config, in an image).

Right now, swarm services aren't really set up well for CI or build use cases. We don't really have the concept of batch jobs, and collecting logs is problematic. This will come in the future.

Do we have any online use cases for actual services?

dmcgowan commented 8 years ago

Looking at the container setup code within docker the following is done by setting the privileged flag

stevvooe commented 8 years ago

@dmcgowan Wow, thanks for the overview!

cc @aluzzardi

chanwit commented 8 years ago

Global service fits really well for deploying monitoring agents on every node. And most of those agents require privileged access to some /proc. In my case, I want to monitor conntrack.

Please add support for this privileged flag, cc @aluzzardi

stevvooe commented 8 years ago

@chanwit: when not running with --privileged, what error do you get when trying to monitor conntrack?

aluzzardi commented 8 years ago

@stevvooe @chanwit @jimmyxian I was initially against adding --privileged. Seeing how many requests we got for this, I'm now in favor of adding it.

jimmyxian commented 8 years ago

@aluzzardi Thanks, I will rebase the PR. :)

chanwit commented 8 years ago

@stevvooe my bad. it's another issue :-( not relating to privileged.

Anyway I gave up running monitoring agents inside container for now.

sashkachan commented 8 years ago

Trying to run weave-scope using the new swarm, also hitting the roadblock with the privileged mode (or the lack thereof). Would be great to have it supported.

lukemarsden commented 7 years ago

Any update on this @aluzzardi? Would be great to get weave scope working with swarmkit. And we really do need to access /proc. cc @errordeveloper

errordeveloper commented 7 years ago

@alex-glv we have a hack that makes it work until this gets resolved: https://github.com/weaveworks/scope-global-swarm-service.

Also, there is a conversation around explicit --capabilities flag: https://github.com/docker/docker/pull/26849#issuecomment-252704844.

thaJeztah commented 7 years ago

Also, related issue in docker/docker https://github.com/docker/docker/issues/24862

alexellis commented 7 years ago

Where are we with this issue now? @aluzzardi - still feeling positive?

marcellodesales commented 7 years ago

:( I need to run Global services please!

stevvooe commented 7 years ago

Note that we are attempting to address this with security profiles, proposed in #1722. The goal is to give cluster operators a much finer grain of access control for security containment without making the model more complex than --privileged.

shankarkc commented 7 years ago

Is there an ETA for this? I am hitting a small /dev/shm in docker containers and have to increase its size. I used a service to create the containers. Since a service doesn't allow me to run in privileged mode, I cannot unmount/remount /dev/shm.

thaJeztah commented 7 years ago

@shankarkc would mounting a tmpfs work? Haven't tried, but;

--mount type=tmpfs,target=/dev/shm,.....

Possibly requires docker 1.13 (for tmpfs)
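A full command along these lines might look like the sketch below. It is untested against a real cluster; the service name shm-demo and image my-image are placeholders, and the tmpfs-size option is taken from a command posted later in this thread. The helper only builds the --mount value, and the command is printed rather than run so it can be reviewed first.

```shell
# Build the --mount value for a tmpfs-backed /dev/shm of a given size.
# The default size (1g) mirrors the value used later in this thread.
shm_mount_opt() {
  size="${1:-1g}"
  printf 'type=tmpfs,target=/dev/shm,tmpfs-size=%s' "$size"
}

# Print the resulting command instead of executing it.
# "shm-demo" and "my-image" are placeholder names, not real resources.
echo docker service create --name shm-demo --mount "$(shm_mount_opt 1g)" my-image
```

Dropping the leading echo would actually create the service.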

shankarkc commented 7 years ago

Thanks for the input. Installed 1.13. It works.

ginjo commented 7 years ago

Curious where we are on the issue of allowing deeper privileges in swarm containers. My use case involves vpn software that requires addition of an ip interface and a couple of routing rules.

Is the focus of this issue now on security profiles https://github.com/docker/swarmkit/pull/1722 , and if so, will they allow deeper privileges in swarm containers?

mikeytag commented 7 years ago

I second @ginjo. I'm trying to get an OpenVPN container up in swarm but can't without privileged, cap_add=NET_ADMIN and device=/dev/net/tun

If anyone has a workaround, I'm all ears.

marcellodesales commented 7 years ago

I have a couple of use cases for the need of running mode=global containers in my cluster:

Any status?

sono-bfio commented 7 years ago

👍 This, or at least the ability to pass in devices (I'm working with NVIDIA GPUs in my case).

AkihiroSuda commented 7 years ago

Until https://github.com/docker/swarmkit/pull/1722 gets merged, a workaround is to create a service with bind-mounting the API socket, and invoke docker run --privileged within a service container.

i.e. docker service create --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock foo docker run --rm --privileged bar baz

ventz commented 7 years ago

Same use case as @mikeytag -- need to add OpenVPN as a service.

zerowebcorp commented 7 years ago

@AkihiroSuda Would you please elaborate how this can be done ?

AkihiroSuda commented 7 years ago

The command I mentioned above is composed of two parts:

  1. docker service create --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock foo
  2. Execute docker run --privileged in the service container created in the step 1

Step 1 creates a service with access to the Docker API socket, so you need to install the docker binary (the client alone is enough) in the image foo.

In step 2, you can create privileged containers using the socket.

Note that privileged programs still do not work in the service itself. You can only let the service create new containers for the privileged programs, so you might need some additional work to manage the lifecycle of the containers it creates.
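The lifecycle work mentioned above could be handled by a small wrapper used as the service's command. This is only a sketch under assumptions: the container name privileged-helper and the DOCKER override variable are hypothetical, the image foo is assumed to contain the docker client, and the API socket is assumed to be bind-mounted as in step 1.

```shell
#!/bin/sh
# Wrapper intended to run inside the service container from step 1.
# DOCKER can be overridden (for example DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"
NAME="${NAME:-privileged-helper}"   # hypothetical name for the sibling container

# Start the privileged sibling container; "$@" is the image and its command.
start_privileged() {
  $DOCKER run --rm --privileged --name "$NAME" "$@"
}

# Force-remove the sibling container; errors are ignored if it is already gone.
cleanup() {
  $DOCKER rm -f "$NAME" >/dev/null 2>&1 || true
}

# Run the sibling in the background and wait for it, so that a TERM or INT
# sent to this wrapper (e.g. when the task is stopped) triggers cleanup.
run_supervised() {
  trap 'cleanup; exit 0' TERM INT
  start_privileged "$@" &
  wait "$!"
  status=$?
  cleanup
  return "$status"
}

# Example, inside the service container: run_supervised bar baz
```

Running the sibling in the background and using wait matters here: a shell only runs its trap handler once the foreground command returns, so a foreground docker run would delay cleanup until the task is killed.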

alexellis commented 7 years ago

@getvivekv

At best this is a temporary work-around ... at worst it's confusing and as @AkihiroSuda mentions it opens a can of worms around managing the privileged containers created. You may be better off with creating your privileged containers manually and then using an attachable overlay network so that you can join them to swarm services.

I.e.

$ docker network create mixed --driver overlay --attachable
c9xwtgk8259j8rnq3xifvt9kv
$ docker service create --name redis --network=mixed redis
$ docker run --network=mixed -d some_privileged_image_that_uses_redis

+1 x 1000 for --privileged services on Swarm, for Raspberry Pi etc.

Cross-referencing:

https://github.com/docker/docker/issues/25885

https://github.com/docker/swarmkit/pull/1722

stevvooe commented 7 years ago

@alexellis @marcellodesales @getvivekv #1722 is definitely the way forward here. I am not sure what the major concerns are and I have not yet seen a better proposal that provides effective node-level ACL controls of privileged execution.

I also am not sure that the implications of using privileged are broadly understood. For example, the workaround is something you'd not want to run on a manager node, as it may get access to key material. This is why we are generally asking for use case (ie monitoring, gpio, etc.), so thanks for bearing with us. The more narrow we make this, the more secure your containers can be by default.

gerwim commented 7 years ago

@stevvooe Thanks for the heads up. For me, the use case would consist of two topics: NFS and OpenVPN. They either require --privileged or the appropriate --cap-add abilities.

RRAlex commented 7 years ago

Adding --privileged would be a step backward, we really need to re-integrate control over capabilities (and seccomp profiles I'm guessing) on services/swarm.

In some cases (OpenVPN and others) it's actually blocking swarm use; in other cases, it's leaving too many capabilities in a container that doesn't require all of them, thus making it more exploitable.

My 2¢, but this goes against all the security that docker had achieved and also ruins everyone's effort put into securing their containers and finding what to add or remove. Platform agnosticism should never compromise the ability to use the local security measure of an OS. Worst case, some platform should simply ignore some options.

mikeytag commented 7 years ago

I thought I would add a little more context into our use case with OpenVPN and Swarm to give people much smarter than myself more info into what would be the best way to implement this. I agree with many of the devs on this thread that adding --privileged is the wrong answer and we need something with much more granular control.

We are using an overlay network that our containers all deploy with stacks. This central network allows us to use aliases so that the right containers can connect to others over certain ports. What I need to do occasionally is VPN into our "cluster" and then be able to SSH/connect to ports on other containers via the Swarm network. These are ports that are not exposed to the outside world but just used internally. The awesome auto DNS feature is what I'm hoping to tap into most.

For now, I have a container called vpn which exposes SSH over port 2222 and I connect to it from my laptop/workstation with SSHuttle so I can use it as a jump host of sorts to connect to other containers when needed. This works reasonably well as a poor man's VPN but using OpenVPN would be a lot cleaner.

It helps to not have to give every dev docker-machine exports and certs and such. Instead we have one container that gives them access to whatever other container they need to diagnose and solve problems within specific containers. Running docker attach everywhere is just a lot messier, not to mention we'd be sharing the private keys of the instances created with docker-machine all over the place which I'm not super keen to do.

Anyway, I hope that gives some clarity as to what we are trying to do with OpenVPN and Swarm. I'm sure others have different setups and scenarios. The key here is to keep Docker Swarm secure and scalable with whatever solution is developed.

EDIT: I should add that I would totally do one privileged container run manually on one machine for OpenVPN but I can't run containers like that outside of Swarm and connect them to the overlay network that Swarm is using.

RRAlex commented 7 years ago

@mikeytag if you declare your overlay networks as --attachable, you should be able to run things manually on them from other machines. It's not ideal at all, but it works until capabilities are re-implemented, provided your needs are static :disappointed_relieved:

mikeytag commented 7 years ago

Interesting. I wish I would have done that before I had everything running in production! Looks like I might need to schedule some down time late at night in the future to try to make that work.

justintime4tea commented 7 years ago

I know this is the wrong place, but I have no idea where to find the answer and I'm sure many of you could point me in the right direction. I'll delete this comment afterwards to keep the issue clean. It is related to --cap-add, however. How do I regenerate the protobufs in a vendored (swarmkit) "package" (wrong term?)? I am no newb developer by any means, but I must admit all this go stuff is making me goooo crazy :) Thanks in advance.

waltherg commented 7 years ago

@mikeytag @RRAlex I hit the same limitation when attempting to implement the exact same scenario that @mikeytag described (OpenVPN server in overlay network to access ports I do not want to open up to the world).

I attempted testing the --attachable network variant but got an error I can't make sense of:

$ docker network create --driver overlay --attachable test_network --opt encrypted
$ docker service create --name test_webserver --hostname test_webserver --network test_network --restart-condition any nginx
$ docker run -v MY_OPENVPN_VOLUME:/etc/openvpn -d -p 1195:1194/udp --cap-add=NET_ADMIN --network=test_network kylemanna/openvpn

For the docker run invocation I get the following error message:

docker: Error response from daemon: Could not attach to network test_network: context deadline exceeded.

The OpenVPN server is configured as described here (https://github.com/kylemanna/docker-openvpn) and works as expected in the scenario where my Docker network is a single-host bridge network (i.e. non-swarm mode).

Any suggestions on how to proceed here would be greatly appreciated! The Docker server version we run is 1.13.

Update: I stand corrected, this actually worked on my second attempt. No clue what caused the aforementioned error. I still see odd behavior though: the nginx service in my above example has, say, IP (virtual IP?) 10.0.0.2, which I can't reach from my work laptop when connected to the OpenVPN server (neither ping nor curl reaches that IP). However, I can reach the actual container's IP (10.0.0.4) from my laptop. I suppose Docker swarm mode redirects from the service's name / IP to the respective containers' names / IPs for load-balancing, so being unable to reach the service's IP from outside may be by design?

waltherg commented 7 years ago

@mikeytag does the sshuttle approach you describe work without --cap-add NET_ADMIN? Do you start the sshuttle container through docker service and host constraint or regularly with docker run and --network NAME_OF_YOUR_SWARM_OVERLAY_NETWORK?

mikeytag commented 7 years ago

@waltherg when I was using sshuttle I was deploying the "vpn" container via stacks but making sure that it was deployed to a specific machine and exposing the port 2222 so the container SSH could respond to the outside world.

I got the OpenVPN setup working too using --attachable. Here's the run command that I use. Maybe there's something in here that's valuable to you or others.

docker run -itd \
  -h vpn-1 \
  --restart=always \
  --name vpn-1 \
  --network=NETWORKNAMEHERE \
  --cap-add=NET_ADMIN \
  --privileged=true \
  -p 1194:1194 \
  --ulimit nofile=65536:65536 \
  YOURFAVORITEOPENVPNIMAGEHERE:latest

dcarastan commented 7 years ago

I ran into this while building a Concourse CI stack. The privileged option is needed for apps that orchestrate containers. Keep this option around and make it a swarm init parameter. Privileged containers should not be accepted by default.

thaJeztah commented 6 years ago

Linking the entitlements proposal here, which is relevant https://github.com/moby/moby/issues/32801

suda commented 6 years ago

Having the same issues with VPN containers as @mikeytag and @waltherg I ended up running them as systemd services. At least this way they get to be supervised. This, combined with Alex's work-around to allow them to talk on the same network seems to do the trick.

I agree that the privileged mode really might be a huge security issue but at least the ability to manage capabilities should give some level of granularity.

dbiswas1 commented 6 years ago

I am having the same issue when launching the Chrome container, which requires the container to be in privileged mode. Has anybody here set up Chrome in Swarm? I am all ears for a solution.

shankarkc commented 6 years ago

I have selenium grid running in swarm mode. What is the issue @dbiswas1 ? Can you give more details?

alexellis commented 6 years ago

With your grid were you also able to mount /dev/shm and have it work? This is generally needed for Chrome.

shankarkc commented 6 years ago

Yes. Here is a sample line:

docker service create \
  --replicas 1 \
  --name SeleniumHub \
  --env hub_max_memory=2036 \
  --limit-memory 2036M \
  --reserve-memory 2036M \
  --publish 4444:4444 \
  --publish 3000:3000 \
  --mount type=bind,src=/DockerSeleniumGrid/logs,dst=/HubLogs \
  --mount type=tmpfs,dst=/dev/shm,tmpfs-size=1g \
  --network overlayNet \
  --restart-condition any \
  --restart-delay 5s \
  --stop-grace-period 10s \
  registry.mo.sap.corp:5004/ai/hub:0.57


mathiasbrito commented 6 years ago

I’ll say just one word on why it should be implemented: IoT. More and more people are considering the use of Docker containers for IoT scenarios, and we sometimes need privileged containers to access attached devices. Using swarm to manage a set of IoT devices is really interesting, but without privileged devices it definitely loses a lot of its magic for IoT.

olljanat commented 5 years ago

The Swarmkit team has made a proposal for device support in docker/swarmkit#2682

Please comment there with your thoughts on whether it fits your use cases.

EDIT: There now looks to be a suggested solution in this message: https://github.com/moby/moby/issues/24862#issuecomment-428308152

wanyvic commented 5 years ago

I found a way to work around the problem, and I can also use cap_net_admin in swarm mode. You modify the runtime source code to add the capabilities you need (this becomes a local default setting). For example, I added CAP_NET_ADMIN to my runtime (I use nvidia-container-runtime): wanyvic/nvidia-container-runtime. After that, rebuild it. Then start a container (in swarm mode) and run capsh --print; CAP_NET_ADMIN can be found in the output:

root@25303a54ebb3:/# capsh --print
Current:=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_admin,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap+eip
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_admin,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
Securebits: 00/0x0/1'b0
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
uid=0(root)
gid=0(root)
groups=
root@25303a54ebb3:/#

This method is not good. It still does not make it possible to set cap_add or cap_drop in docker-compose.yml, but I can't find another way to solve it.
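The manual check described above (looking for cap_net_admin in the capsh --print output) could be scripted roughly as follows. This is a sketch, not part of the original workaround: the helper name has_cap is made up for illustration, and it assumes the "Current:" line lists capabilities separated by commas, as in the output shown in this comment.

```shell
# has_cap CAP: read `capsh --print` output on stdin and succeed if CAP
# appears as a whole word on the "Current:" line.
has_cap() {
  # Split the line on ',', '=', '+' and spaces, then look for an exact match.
  grep '^Current:' | tr ',=+ ' '\n\n\n\n' | grep -qx "$1"
}

# Example, inside the container:
#   capsh --print | has_cap cap_net_admin && echo "NET_ADMIN is available"
```

Matching whole tokens avoids false positives from capabilities whose names share a prefix.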