jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

User: <my_user_name> is not allowed to impersonate root #467

Closed ridwanoabdulazeez closed 5 years ago

ridwanoabdulazeez commented 6 years ago

[root@b4b72a9381aa .jupyter]# jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0
[I 2018-10-12 14:49:47.147 EnterpriseGatewayApp] Jupyter Enterprise Gateway 1.1.0 is available at http://0.0.0.0:8888
[W 2018-10-12 14:49:47.147 EnterpriseGatewayApp] Impersonation is enabled and gateway user 'root' is NOT specified in the set of unauthorized users! Kernels may execute as that user with elevated privileges.
[I 181012 14:50:38 web:2106] 200 GET /api/kernelspecs (127.0.0.1) 197.98ms

Starting IPython kernel for Spark in Yarn Cluster mode as user root

ridwanoabdulazeez commented 6 years ago

@kevin-bates @lresende could you help with this error?

kevin-bates commented 6 years ago

Make sure that the user Enterprise Gateway is running as is set up as a proxy user.

In addition, it appears that your KERNEL_USERNAME is root, based on this entry: --proxy-user root. That is strongly discouraged and, in fact, disabled by default per the unauthorized-users default value. So I would strongly advise that you set KERNEL_USERNAME to a regular user as part of your Notebook launch (or client code).

ridwanoabdulazeez commented 6 years ago

Thanks for your prompt response @kevin-bates. Where can I set the KERNEL_USERNAME value? I tried entering my user name in the run.sh file but it doesn't work. Please note that I am running EG in a docker container where root and admin are the only accounts in the container.

kevin-bates commented 6 years ago

KERNEL_USERNAME, like all other environment variables prefixed with KERNEL_, is automatically sent by NB2KG to the gateway when creating a kernel. So if you're using a Notebook front end, you would set KERNEL_USERNAME in the environment of your Notebook process prior to launching. If you have your own client to manage kernels against EG, you would include KERNEL_USERNAME in the env section of your JSON payload when submitting the creation request. You can find such an example in our gateway_client code.
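For the custom-client case, a minimal sketch of what such a creation request might look like is below. It only builds the request and does not send it. The gateway URL, the kernelspec name `spark_python_yarn_cluster`, the user `alice`, and the helper `build_kernel_request` are all hypothetical placeholders, not values from this thread.

```python
import json
import urllib.request


def build_kernel_request(gateway_url, kernelspec_name, kernel_username):
    """Build (but do not send) a kernel-creation request for the gateway.

    Hypothetical helper for illustration; not part of Enterprise Gateway.
    """
    payload = {
        "name": kernelspec_name,
        # KERNEL_-prefixed variables go in the "env" section of the JSON body.
        "env": {"KERNEL_USERNAME": kernel_username},
    }
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/api/kernels",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed values: adjust the URL, kernelspec name and user to your setup.
request = build_kernel_request(
    "http://localhost:8888", "spark_python_yarn_cluster", "alice"
)
body = json.loads(request.data)
print(request.get_method(), request.full_url)
print(body)
```

Passing the request to `urllib.request.urlopen` would submit it. With a Notebook front end none of this is needed, since exporting KERNEL_USERNAME before launching the Notebook process has the same effect.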

Now, with respect to running in a container, I believe this boils down to UID mappings of your host. I would focus on running your container as admin where admin maps to a non-zero UID of your host. This is pushing the boundary of my knowledge of docker and user mappings.

In addition, it looks like the hadoop configuration requirements (referenced in the link I included previously) are based solely on the name. So, assuming you used admin, admin would need to be configured in the core-site.xml file - similar to what they've listed for user oozie. (Note that if admin is a local user defined only in your container, then I think you'd configure whatever user maps to that UID in your core-site.xml. I'm not completely sure about that, however.)
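As a rough illustration of the core-site.xml entries in question, the sketch below prints the two Hadoop proxy-user properties (`hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups`) for a given user. The helper `proxyuser_properties` is hypothetical, and the `*` wildcards are an assumption chosen for brevity; a real cluster would normally restrict hosts and groups.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(user, hosts="*", groups="*"):
    """Return <property> elements that let `user` act as a Hadoop proxy user.

    Hypothetical helper; the wildcard defaults are deliberately permissive
    and are an assumption, not a recommendation.
    """
    settings = {
        f"hadoop.proxyuser.{user}.hosts": hosts,
        f"hadoop.proxyuser.{user}.groups": groups,
    }
    elements = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        elements.append(prop)
    return elements


# Fragment to place inside <configuration> in core-site.xml.
fragment = "\n".join(
    ET.tostring(prop, encoding="unicode") for prop in proxyuser_properties("admin")
)
print(fragment)
```

These entries are edited on the cluster side, so they typically need whoever administers the Hadoop configuration to apply them.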

As noted in that link, if you're operating in "Secure Mode" then the user associated with the admin UID would need to be a valid kerberos user as well - or at least that's my understanding.

@lresende - does this sound correct to you?

lresende commented 6 years ago

If I understood it correctly, you have Notebook -> docker[enterprise gateway] -> YARN Cluster with kerberos security enabled.

First, you will need to properly configure the image where EG is running to be part of the KDC domain. You will also need an authenticated service user with a valid Kerberos token starting the EG process. Impersonated users also need to be valid users, registered as part of the KDC.

I would recommend running EG on one of the edge machines of your YARN cluster.

More details can be seen at [security features](https://jupyter-enterprise-gateway.readthedocs.io/en/latest/getting-started-security.html#user-impersonation)

ridwanoabdulazeez commented 6 years ago

@kevin-bates this is what I get after mapping the user to the container:
[I 181012 18:30:11 web:2106] 200 GET /api/kernelspecs (127.0.0.1) 2.57ms

Starting IPython kernel for Spark in Yarn Cluster mode on behalf of user

ridwanoabdulazeez commented 6 years ago

@lresende this is the flow: docker[Notebook] (where I kinit with my Kerberos credentials) -> docker[Enterprise Gateway] (where I also kinit with my Kerberos credentials) -> YARN cluster with Kerberos security enabled. I have EG in the same container as the notebook because I don't have access to the YARN cluster.

lresende commented 6 years ago

@ridwanoabdulazeez You need to have a chat with the YARN admin. You can't use your regular user: for impersonation to work, the EG user needs to be a service user that can impersonate other users (e.g. user notebook is running EG, and it impersonates userA when userA starts a notebook and userB when userB starts a notebook).

kevin-bates commented 6 years ago

One thing that doesn't look right is localhost. I suspect that's coming from your EG_YARN_ENDPOINT value and you probably need to set that to the IP of your host. I believe EG is considering localhost to be itself, when it should be the host - or something like that. 172.17.0.2 is the docker network and YARN workers will need to be able to respond on the response address socket as well.

You may need to use --net host when starting your container so it uses the host's network.
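A quick way to spot the localhost problem is to check whether the configured endpoint resolves to a loopback address from the container's point of view. The sketch below is only an illustration: the helper `endpoint_is_loopback` is hypothetical, and the example URLs (port 8088 and the `/ws/v1/cluster` path) are assumptions about what an EG_YARN_ENDPOINT value might look like.

```python
import ipaddress
from urllib.parse import urlparse


def endpoint_is_loopback(endpoint):
    """Return True if the endpoint's host is localhost or a loopback IP.

    Hypothetical helper; it inspects the URL only and makes no network calls.
    """
    host = urlparse(endpoint).hostname
    if host is None:
        raise ValueError(f"no host found in endpoint: {endpoint!r}")
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so treat it as a regular hostname.
        return False


# Assumed example values; substitute your own EG_YARN_ENDPOINT.
for candidate in (
    "http://localhost:8088/ws/v1/cluster",
    "http://172.17.0.2:8088/ws/v1/cluster",
):
    print(candidate, "->", endpoint_is_loopback(candidate))
```

A loopback result inside a container points at the container itself rather than the host, which matches the symptom described above.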

kevin-bates commented 5 years ago

@ridwanoabdulazeez - any update on this?

ridwanoabdulazeez commented 5 years ago

@kevin-bates I will close this issue now. I think we need to have EG installed on the cluster, and our admin does not give us the rights to do so.

kevin-bates commented 5 years ago

That's unfortunate. Thanks for the update. Closing issue.