jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Possible solution to the first FIXME in the docker-compose.yml file. #773

Open buhl opened 4 years ago

buhl commented 4 years ago

https://github.com/jupyter/enterprise_gateway/blob/02a7e0a1e59821b521f72b2f5ac56f21619a6cee/etc/docker/docker-compose.yml#L7 This problem might be solved by creating an entrypoint script like the one below, which I use for the same problem on an Alpine Linux image:

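#!/bin/ash
set -e

# Look up the group that owns the mounted Docker socket.
GNAME=$(stat -c %G /var/run/docker.sock)
if [ "$GNAME" != "UNKNOWN" ]; then
    # The socket's GID already maps to a group in the image: add the user to it.
    addgroup user "$GNAME"
else
    # No group has this GID yet: create one (the name "docker" is arbitrary),
    # then add the user to it.
    GID=$(stat -c %g /var/run/docker.sock)
    addgroup -g "$GID" docker
    addgroup user docker
fi
# Run the requested command as the unprivileged user, or a shell by default.
if [ "$#" -eq 0 ]; then
    su - user -c ash
else
    su - user -c "$*"
fi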

That gives the user access to the docker socket:

localhost:~$ docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock dtest:latest  ash
6600c48b7a97:~$ docker ps
CONTAINER ID        IMAGE                   COMMAND             CREATED             STATUS              PORTS                    NAMES
6600c48b7a97        dtest:latest            "/entry.sh ash"     4 seconds ago       Up 3 seconds                                 priceless_colden
6600c48b7a97:~$ id
uid=100(user) gid=101(user) groups=101(user),101(user),999(ping)
6600c48b7a97:~$ 

Here's my test Dockerfile:

FROM alpine:latest
RUN apk update
RUN apk add docker-cli
RUN addgroup -S user && adduser -s /bin/ash -S user -G user
ADD entry.sh /
ENTRYPOINT ["/entry.sh"]
CMD ["while true; do id; sleep 2; done"]

If I misunderstood the problem or missed something in any other way, I apologize.

kevin-bates commented 4 years ago

This looks promising. Would you like to contribute a pull request for this?

I would recommend just going after the ID directly, rather than name first, and we'd probably want protection from the file not existing in the first place.
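A minimal sketch of that suggestion, assuming a Linux `stat` that supports `-c %g` (GNU coreutils or BusyBox). The function names and the `DOCKER_SOCK` override are hypothetical, not part of the repository. It resolves the socket's numeric GID first, looks up a matching group name only afterwards, and skips group setup when the socket is not mounted:

```shell
#!/bin/sh
# Sketch only: resolve the group that owns the Docker socket by numeric GID.
# DOCKER_SOCK is a hypothetical override; the default path is the one used in this issue.
SOCK="${DOCKER_SOCK:-/var/run/docker.sock}"

# Print the numeric GID that owns a path; fail if the path does not exist.
socket_gid() {
    [ -e "$1" ] || return 1
    stat -c %g "$1"
}

# Print the name of the group with the given GID, or nothing if /etc/group has none.
group_name_for_gid() {
    awk -F: -v gid="$1" '$3 == gid { print $1; exit }' /etc/group
}

if gid=$(socket_gid "$SOCK"); then
    name=$(group_name_for_gid "$gid")
    echo "socket gid=$gid group=${name:-<none>}"
    # A real entrypoint (running as root) would now create a group for $gid
    # if $name is empty, then add the service user to that group.
else
    echo "no socket at $SOCK; skipping group setup" >&2
fi
```

Working from the GID avoids depending on how `stat` reports an unmapped group name, and the existence check keeps the entrypoint from aborting when the socket is absent.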

Thank you for opening this issue!

buhl commented 4 years ago

Hi @kevin-bates, sorry for the late reply; I just returned from a vacation. I will try to find some time to fit this solution into a working example for you to look at.

kevin-bates commented 4 years ago

Right on - no worries. Welcome back!

buhl commented 4 years ago

Hi @kevin-bates, I made some changes to https://github.com/buhl/enterprise_gateway/blob/master/etc/docker/enterprise-gateway/Dockerfile and https://github.com/buhl/enterprise_gateway/blob/master/etc/docker/enterprise-gateway/start-enterprise-gateway.sh so that the jovyan user is now added to the docker group. I have spent most of two evenings trying to get the enterprise gateway to build and run, and I am not quite there. I can now start an enterprise gateway with docker-compose up, but I can't get it to work (I get {"reason": "Not Found", "message": ""} on all requests). However, the jovyan user can talk to the docker daemon, as demonstrated below:

enterprise_gateway/etc/docker $ docker-compose exec enterprise-gateway  /bin/bash
root@ab7012645bbe:/usr/local/bin# su - jovyan
jovyan@ab7012645bbe:~$ id
uid=1000(jovyan) gid=100(users) groups=100(users),999(docker)
jovyan@ab7012645bbe:~$ ps wwwfaux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
[...]
root         1  0.0  0.0   4520   792 ?        Ss   20:42   0:00 tini -g -- /usr/local/bin/start-enterprise-gateway.sh
root         6  0.0  0.0  11588  3132 ?        S    20:42   0:00 /bin/bash /usr/local/bin/start-enterprise-gateway.sh
root        18  0.0  0.0  50940  3452 ?        S    20:42   0:00  \_ su --preserve-environment jovyan -c /opt/conda/bin/jupyter enterprisegateway .--log-level=DEBUG .--MappingKernelManager.cull_idle_timeout=3600 .--MappingKernelManager.cull_interval=60 .--MappingKernelManager.cull_connected=False
jovyan      19  0.1  0.2  73988 48756 ?        Ss   20:42   0:01      \_ /opt/conda/bin/python /opt/conda/bin/jupyter-enterprisegateway --log-level=DEBUG --MappingKernelManager.cull_idle_timeout=3600 --MappingKernelManager.cull_interval=60 --MappingKernelManager.cull_connected=False
jovyan@ab7012645bbe:~$ curl --unix-socket /var/run/docker.sock http://localhost/containers/json
[{"Id":"ab7012645bbee25a3dfa432050b4b96e1d7e0db80619b6cee7715e7909abd596","Names":["/docker_enterprise-gateway_1"],"Image":"elyra/enterprise-gateway:dev","ImageID":"sha256:4d5551596ca09de98fa6372f613a0ed30b9201f1b2c42f3bd8f8de1c348aa8af","Command":"tini -g -- /usr/local/bin/start-enterprise-gateway.sh","Created":1581972148,"Ports":[{"IP":"0.0.0.0","PrivatePort":8888,"PublicPort":8888,"Type":"tcp"}],"Labels":{"app":"enterprise-gateway","com.docker.compose.config-hash":"82055d9f2565b2df47f89c910fefe89c84321473a215b6b755b92b4ed638108f","com.docker.compose.container-number":"1","com.docker.compose.oneoff":"False","com.docker.compose.project":"docker","com.docker.compose.service":"enterprise-gateway","com.docker.compose.version":"1.24.1","component":"enterprise-gateway","maintainer":"Jupyter Project <jupyter@googlegroups.com>"},"State":"running","Status":"Up 16 minutes","HostConfig":{"NetworkMode":"docker_enterprise-gateway"},"NetworkSettings":{"Networks":{"docker_enterprise-gateway":{"IPAMConfig":null,"Links":null,"Aliases":null,"NetworkID":"3b4760bd7fd203154baeca0fcfdd71ea048f1b6a9c5fe870a31842ac09188e67","EndpointID":"f6fad93642d0bf5ff39812d90a3234f3693c70df5be4ae6b52d575746bff0620","Gateway":"172.20.0.1","IPAddress":"172.20.0.2","IPPrefixLen":16,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"MacAddress":"02:42:ac:14:00:02","DriverOpts":null}}},"Mounts":[{"Type":"bind","Source":"/var/run/docker.sock","Destination":"/var/run/docker.sock","Mode":"rw","RW":true,"Propagation":"rprivate"}]}]
jovyan@ab7012645bbe:~$

I feel I have to say that running the service as a non-root user in the container, while giving it access to docker.sock, is effectively like giving the jovyan user root on the host machine :) So if this exercise is about dropping root for no other reason than dropping root, it might not be super important.

I don't really know what to do next.

kevin-bates commented 4 years ago

Hi @buhl, sorry for the frustration. I agree, EG has a non-trivial build.

You do not need to worry about building the demo-base or enterprise-gateway-demo images. Those are purely for demo and YARN integration tests.

The make targets you'll need to invoke are: clean dist enterprise-gateway - the last of which builds the EG image. Target kernel-images builds the various kernel-related images, but those shouldn't have to change for this.

I agree with what you say about root and docker.sock. The requirement for operating in docker environments is that EG be able to start images, query running containers based on labels, etc. (discovery) and stop containers via the docker API. My understanding is that "docker in docker" requires docker.sock and mounting docker.sock requires root. So this may just need to be the way things are. Perhaps we just change the FIXME to a nasty WARNING message. :smile:

At any rate, I'm hoping we can get your build working so you're free to check things out and make contributions.

Regarding runtime experiences... What command are you issuing to produce {"reason": "Not Found", "message": ""}? I usually hit /api/kernelspecs as my litmus test that EG is able to service requests. You should get the JSON for each of the found kernelspecs returned.
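As a rough sketch of that litmus test, the helper below probes `/api/kernelspecs` with `curl`. The function names are hypothetical; the default host and port (`localhost`, `8888`) are assumptions taken from the published port in the container listing earlier in this thread:

```shell
#!/bin/sh
# Sketch only: check whether Enterprise Gateway answers on /api/kernelspecs.
# Host and port defaults are assumptions; pass your own as arguments.
kernelspecs_url() {
    printf 'http://%s:%s/api/kernelspecs\n' "${1:-localhost}" "${2:-8888}"
}

# Return 0 if the endpoint responds with HTTP 200, non-zero otherwise.
check_eg() {
    code=$(curl -s -o /dev/null -w '%{http_code}' "$(kernelspecs_url "$@")") || return 1
    [ "$code" = "200" ]
}

# Usage: check_eg localhost 8888 && echo "EG is serving requests"
```

To see the kernelspec JSON itself rather than just the status, run `curl -s "$(kernelspecs_url)"` directly.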

If you're going through Notebook, then things could be tied up with incorrect socket in your --gateway-url value or something like that. Does the EG log show anything on each request attempt?

buhl commented 4 years ago

Hi @kevin-bates, great, thanks! I will try the make targets later this week. I actually got enterprise-gateway to build and run. The /api/kernelspecs endpoint you suggested also seems to work, but I have yet to try starting a notebook. I will try to clean up my branch, revert the unnecessary changes, and attempt to submit a pull request.

I had a problem I didn't know how to solve, so I had to remove --KernelSpecManager.whitelist=${EG_KERNEL_WHITELIST} from the jupyter enterprisegateway initialization because I got this error: traitlets.traitlets.TraitError: The 'whitelist' trait of a KernelSpecManager instance must be a set, but a value of class 'str' (i.e. '[r_docker,python_docker,python_tf_docker,scala_docker,spark_r_docker,spark_python_docker,spark_scala_docker]') was specified.

kevin-bates commented 4 years ago

OK, yeah: it can be difficult to get the values of set-based traitlets configured correctly. Looking at the appropriate files, and comparing them to other systems, I believe each of the items must be single-quoted, with the whole list enclosed in square brackets. Here are a couple of examples that should work:

https://github.com/jupyter/enterprise_gateway/blob/master/etc/kubernetes/enterprise-gateway.yaml#L141
https://github.com/jupyter/enterprise_gateway/blob/master/etc/docker/enterprise-gateway/start-enterprise-gateway.sh#L28
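To illustrate the quoting described above, here is a small sketch that turns a plain comma-separated list of kernel names into a bracketed, single-quoted list. The helper name `quote_kernel_list` is hypothetical, and this is not necessarily how the linked files build the value:

```shell
#!/bin/sh
# Sketch only: convert "a,b,c" into "['a','b','c']".
quote_kernel_list() {
    printf '%s\n' "$1" | awk -F, '{
        out = ""
        for (i = 1; i <= NF; i++) {
            gsub(/^[ \t]+|[ \t]+$/, "", $i)                   # trim whitespace around each name
            out = out (i > 1 ? "," : "") "\047" $i "\047"     # \047 is a single quote
        }
        print "[" out "]"
    }'
}

# Usage (the option name is the one from the error message in this thread):
#   --KernelSpecManager.whitelist="$(quote_kernel_list "r_docker,python_docker")"
```

The likely reason the unquoted form fails is that `[r_docker,python_docker]` is not a valid list of string literals, so the value is kept as a plain `str`, which matches the TraitError reported above.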

Were you trying to set up EG_KERNEL_WHITELIST with your own set of values? Or are things getting modified before their actual use?