User: <my_user_name> is not allowed to impersonate root

ridwanoabdulazeez commented 6 years ago

[root@b4b72a9381aa .jupyter]# jupyter enterprisegateway --ip=0.0.0.0 --port_retries=0
[I 2018-10-12 14:49:47.147 EnterpriseGatewayApp] Jupyter Enterprise Gateway 1.1.0 is available at http://0.0.0.0:8888
[W 2018-10-12 14:49:47.147 EnterpriseGatewayApp] Impersonation is enabled and gateway user 'root' is NOT specified in the set of unauthorized users! Kernels may execute as that user with elevated privileges.
[I 181012 14:50:38 web:2106] 200 GET /api/kernelspecs (127.0.0.1) 197.98ms

Starting IPython kernel for Spark in Yarn Cluster mode as user root

eval exec /usr/hdp/current/spark2-client/bin/spark-submit '--master yarn --deploy-mode cluster --queue P_NO_SLA --name ${KERNEL_ID:-ERRORNOKERNEL_ID} --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/home/yarn/.local --conf spark.yarn.appMasterEnv.PYTHONPATH=${HOME}/.local/lib/python2.7/site-packages:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/anaconda/bin:$PATH' '--proxy-user root' /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py '' /root/.local/share/jupyter/runtime/kernel-14dfeee9-eed6-4356-a4de-c5b3203d1e0a.json --RemoteProcessProxy.response-address 172.17.0.2:58298 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
++ exec /usr/hdp/current/spark2-client/bin/spark-submit --master yarn --deploy-mode cluster --queue P_NO_SLA --name 14dfeee9-eed6-4356-a4de-c5b3203d1e0a --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/home/yarn/.local --conf spark.yarn.appMasterEnv.PYTHONPATH=/.local/lib/python2.7/site-packages:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip --conf spark.yarn.appMasterEnv.PATH=/bin --proxy-user root /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py /root/.local/share/jupyter/runtime/kernel-14dfeee9-eed6-4356-a4de-c5b3203d1e0a.json --RemoteProcessProxy.response-address 172.17.0.2:58298 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
Warning: Could not find the WD Fusion Client jars
18/10/12 14:50:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/12 14:50:40 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
18/10/12 14:50:41 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/10/12 14:50:41 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From b4b72a9381aa/172.17.0.2 to 2 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getClusterMetrics over rm2 after 1 failover attempts. Trying to failover after sleeping for 16126ms.
18/10/12 14:50:57 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
18/10/12 14:50:57 INFO retry.RetryInvocationHandler: org.apache.hadoop.security.authorize.AuthorizationException: User: is not allowed to impersonate root, while invoking ApplicationClientProtocolPBClientImpl.getClusterMetrics over rm1 after 2 failover attempts. Trying to failover after sleeping for 19224ms.
[E 2018-10-12 14:51:09.338 EnterpriseGatewayApp] KernelID: '14dfeee9-eed6-4356-a4de-c5b3203d1e0a' launch timeout due to: Application ID is None. Failed to submit a new application to YARN within 30.0 seconds. Check Enterprise Gateway log for more information.
[E 181012 14:51:09 web:2106] 500 POST /api/kernels (127.0.0.1) 31032.26ms
^C[I 2018-10-12 14:51:23.301 EnterpriseGatewayApp] Interrupted...

ridwanoabdulazeez commented 6 years ago

@kevin-bates @lresende could you help with this error?

kevin-bates commented 6 years ago

Make sure that the user Enterprise Gateway is running as is set up as a proxy user.

In addition, it appears that your KERNEL_USERNAME is root, based on this entry: --proxy-user root. That is strongly discouraged and, in fact, disabled by default per the unauthorized-users default value. So I would strongly advise that you set KERNEL_USERNAME to a regular user as part of your Notebook launch (or client code).

ridwanoabdulazeez commented 6 years ago

Thanks for your prompt response @kevin-bates. Where can I set the KERNEL_USERNAME value? I tried entering my user name in the run.sh file but it doesn't work. Please note that I am running EG in a Docker container where root and admin are the only accounts in the container.

kevin-bates commented 6 years ago

KERNEL_USERNAME, like all other environment variables prefixed with KERNEL_, is automatically sent by NB2KG to the gateway when creating a kernel. So if you're using a Notebook front end, you would set KERNEL_USERNAME in the environment of your Notebook process prior to launching. If you have your own client to manage kernels against EG, you would include KERNEL_USERNAME in the env section of your JSON payload when submitting the creation request. You can find such an example in our gateway_client code.
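For the custom-client case, a minimal sketch of what such a creation payload could look like is shown below. It only builds the JSON body and sends nothing. The `name` and `env` field layout and the `/api/kernels` route are assumptions based on the `POST /api/kernels` entry in the log above, the helper `build_kernel_request` and the user `alice` are hypothetical, and the gateway_client code remains the reference.

```python
import json


def build_kernel_request(kernel_name, username):
    """Build the JSON body for a kernel-creation request (POST /api/kernels).

    Hypothetical helper: the {"name": ..., "env": {...}} layout is an
    assumption, not copied from the Enterprise Gateway sources.
    """
    if username == "root":
        # Mirrors the advice in this thread: do not request kernels as root.
        raise ValueError("KERNEL_USERNAME should be a regular user, not root")
    return {
        "name": kernel_name,
        # Variables prefixed with KERNEL_ travel in the env section.
        "env": {"KERNEL_USERNAME": username},
    }


# "alice" is a placeholder user; the kernel name is the one seen in the log.
body = build_kernel_request("spark_python_yarn_cluster", "alice")
print(json.dumps(body, indent=2))
```

With a Notebook front end none of this is needed: exporting KERNEL_USERNAME in the shell that launches the Notebook process has the same effect.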

Now, with respect to running in a container, I believe this boils down to UID mappings of your host. I would focus on running your container as admin where admin maps to a non-zero UID of your host. This is pushing the boundary of my knowledge of docker and user mappings.

In addition, it looks like the Hadoop configuration requirements (referenced in the link I included previously) are based solely on the name. So, assuming you used admin, admin would need to be configured in the core-site.xml file, similar to what they've listed for user oozie. (Note that if admin is a local user defined only in your container, then I think you'd configure whatever user maps to that UID in your core-site.xml. Not completely sure about that, however.)
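To make the core-site.xml requirement concrete, here is a small sketch that prints the two proxy-user properties for a given gateway user. The property names follow Hadoop's `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups` convention; the helper `proxyuser_properties` is hypothetical, and the `*` wildcards are an assumption chosen for brevity that a cluster admin would normally narrow to specific hosts and groups.

```python
from xml.sax.saxutils import escape


def proxyuser_properties(user, hosts="*", groups="*"):
    """Render core-site.xml <property> entries allowing `user` to impersonate.

    Hypothetical helper; "*" is deliberately permissive and only illustrative.
    """
    settings = {
        f"hadoop.proxyuser.{user}.hosts": hosts,    # hosts the proxy user may connect from
        f"hadoop.proxyuser.{user}.groups": groups,  # groups whose members may be impersonated
    }
    lines = []
    for name, value in settings.items():
        lines.append("<property>")
        lines.append(f"  <name>{escape(name)}</name>")
        lines.append(f"  <value>{escape(value)}</value>")
        lines.append("</property>")
    return "\n".join(lines)


# "admin" is the container account discussed in this thread.
print(proxyuser_properties("admin"))
```

These entries belong in the cluster's own core-site.xml, so they have to be applied by whoever administers the Hadoop cluster rather than inside the container.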

As noted in that link, if you're operating in "Secure Mode" then the user associated with the admin UID would need to be a valid Kerberos user as well, or at least that's my understanding.

@lresende - does this sound correct to you?

lresende commented 6 years ago

If I understood it correctly, you have Notebook -> docker[enterprise gateway] -> YARN Cluster with Kerberos security enabled.

First, you will need to properly configure the image where EG is running to be part of the KDC domain. You will also need to have an authenticated service user with a valid Kerberos token starting the EG process. Impersonated users also need to be valid users, registered as part of the KDC.

I would recommend just running EG on one of the edge machines of your YARN cluster.

More details can be found at [security features](https://jupyter-enterprise-gateway.readthedocs.io/en/latest/getting-started-security.html#user-impersonation).

ridwanoabdulazeez commented 6 years ago

@kevin-bates this is what I get after mapping the user to the container:
[I 181012 18:30:11 web:2106] 200 GET /api/kernelspecs (127.0.0.1) 2.57ms

Starting IPython kernel for Spark in Yarn Cluster mode on behalf of user

eval exec /usr/hdp/current/spark2-client/bin/spark-submit '--master yarn --deploy-mode cluster --queue P_NO_SLA --name ${KERNEL_ID:-ERRORNOKERNEL_ID} --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/home/yarn/.local --conf spark.yarn.appMasterEnv.PYTHONPATH=${HOME}/.local/lib/python2.7/site-packages:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/anaconda/bin:$PATH' '' /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py '' /home//.local/share/jupyter/runtime/kernel-d55b9518-8512-4481-8b54-a18494d9c328.json --RemoteProcessProxy.response-address 172.17.0.2:32842 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
++ exec /usr/hdp/current/spark2-client/bin/spark-submit --master yarn --deploy-mode cluster --queue P_NO_SLA --name d55b9518-8512-4481-8b54-a18494d9c328 --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/home/yarn/.local --conf spark.yarn.appMasterEnv.PYTHONPATH=/.local/lib/python2.7/site-packages:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/anaconda/bin:/app/anaconda2/bin:/app/anaconda2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/etc/alternatives/jre_1.8.0_openjdk/bin:/etc/alternatives/jre_1.8.0_openjdk/jre/bin:/home//.local/bin:/home//bin /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py /home//.local/share/jupyter/runtime/kernel-d55b9518-8512-4481-8b54-a18494d9c328.json --RemoteProcessProxy.response-address 172.17.0.2:32842 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.spark-context-initialization-mode lazy
Warning: Could not find the WD Fusion Client jars
[W 2018-10-12 18:30:12.182 EnterpriseGatewayApp] Query for kernel ID 'd55b9518-8512-4481-8b54-a18494d9c328' failed with exception: <class 'requests.exceptions.ConnectionError'> - 'HTTPConnectionPool(host='localhost', port=8088): Max retries exceeded with url: /ws/v1/cluster/apps?startedTimeBegin=1539369011000 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8144a4bdd0>: Failed to establish a new connection: [Errno 111] Connection refused',))'. Continuing...
[W 2018-10-12 18:30:12.687 EnterpriseGatewayApp] Query for kernel ID 'd55b9518-8512-4481-8b54-a18494d9c328' failed with exception: <class 'requests.exceptions.ConnectionError'> - 'HTTPConnectionPool(host='localhost', port=8088): Max retries exceeded with url: /ws/v1/cluster/apps?startedTimeBegin=1539369011000 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8144a4b790>: Failed to establish a new connection: [Errno 111] Connection refused',))'. Continuing...
18/10/12 18:30:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[W 2018-10-12 18:30:13.190 EnterpriseGatewayApp] Query for kernel ID 'd55b9518-8512-4481-8b54-a18494d9c328' failed with exception: <class 'requests.exceptions.ConnectionError'> - 'HTTPConnectionPool(host='localhost', port=8088): Max retries exceeded with url: /ws/v1/cluster/apps?startedTimeBegin=1539369011000 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8144a4b710>: Failed to establish a new connection: [Errno 111] Connection refused',))'. Continuing...
[W 2018-10-12 18:30:13.694 EnterpriseGatewayApp] Query for kernel ID 'd55b9518-8512-4481-8b54-a18494d9c328' failed with exception: <class 'requests.exceptions.ConnectionError'> - 'HTTPConnectionPool(host='localhost', port=8088): Max retries exceeded with url: /ws/v1/cluster/apps?startedTimeBegin=1539369011000 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8144a4b750>: Failed to establish a new connection: [Errno 111] Connection refused',))'. Continuing...
18/10/12 18:30:14 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
[W 2018-10-12 18:30:14.197 EnterpriseGatewayApp] Query for kernel ID 'd55b9518-8512-4481-8b54-a18494d9c328' failed with exception: <class 'requests.exceptions.ConnectionError'> - 'HTTPConnectionPool(host='localhost', port=8088): Max retries exceeded with url: /ws/v1/cluster/apps?startedTimeBegin=1539369011000 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8144a4b650>: Failed to establish a new connection: [Errno 111] Connection refused',))'. Continuing...
18/10/12 18:30:14 INFO yarn.Client: Requesting a new application from cluster with 58 NodeManagers
18/10/12 18:30:14 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (65536 MB per container)
18/10/12 18:30:14 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
18/10/12 18:30:14 INFO yarn.Client: Setting up container launch context for our AM
18/10/12 18:30:14 INFO yarn.Client: Setting up the launch environment for our AM container
18/10/12 18:30:14 INFO yarn.Client: Preparing resources for our AM container

ridwanoabdulazeez commented 6 years ago

@lresende this is the flow: docker[Notebook] -> docker[enterprise gateway] -> YARN Cluster with Kerberos security enabled. I have EG in the same container as the Notebook because I don't have access to the YARN cluster, and I kinit with my Kerberos credentials inside that container.

lresende commented 6 years ago

@ridwanoabdulazeez You need to have a chat with the YARN admin. You can't use your regular user: for impersonation to work, the EG user needs to be a service user that can impersonate other users (e.g. user notebook is running EG, and it impersonates userA when userA starts a notebook and userB when userB starts a notebook).

kevin-bates commented 6 years ago

One thing that doesn't look right is localhost. I suspect that's coming from your EG_YARN_ENDPOINT value and you probably need to set that to the IP of your host. I believe EG is considering localhost to be itself (the container), when it should be the host, or something like that. 172.17.0.2 is an address on the Docker network, and YARN workers will need to be able to respond on the response address socket as well.

You may need to use --net host when starting your container so it uses the host's network.
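As a quick sanity check for the localhost problem, the sketch below flags an EG_YARN_ENDPOINT value that points back at the container itself. The `http://<host>:8088/ws/v1/cluster` shape is an assumption inferred from the URL in the warnings above, `endpoint_is_loopback` is a hypothetical helper, and `rm1.example.com` is a placeholder host name.

```python
import ipaddress
from urllib.parse import urlparse


def endpoint_is_loopback(endpoint):
    """Return True if the endpoint's host is localhost or a loopback IP."""
    host = urlparse(endpoint).hostname
    if host is None:
        raise ValueError(f"no host found in endpoint: {endpoint!r}")
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (a DNS name); this check does not resolve names.
        return False


# Placeholder values: the first mirrors the failing log, the second does not.
for value in ("http://localhost:8088/ws/v1/cluster",
              "http://rm1.example.com:8088/ws/v1/cluster"):
    status = "points at this container" if endpoint_is_loopback(value) else "looks external"
    print(f"{value}: {status}")
```

Inside a container on the default bridge network, a loopback endpoint refers to the container, not the machine running the YARN ResourceManager, which matches the "Connection refused" entries on port 8088.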

kevin-bates commented 5 years ago

@ridwanoabdulazeez - any update on this?

ridwanoabdulazeez commented 5 years ago

@kevin-bates I will close this issue now. I think we have to have EG installed on the cluster, and our admin does not give us the rights to do so.

kevin-bates commented 5 years ago

That's unfortunate. Thanks for the update. Closing issue.

jupyter-server / enterprise_gateway

User: <my_user_name> is not allowed to impersonate root #467