jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Jupyterhub-Sparkmagic-Livy-Kerberos #527

Closed: aleevangelista closed this issue 5 years ago

aleevangelista commented 5 years ago

Is there an example of configuring Sparkmagic to use Kerberos authentication with Livy? Setting the authentication type to "Kerberos" in config.json doesn't work: I get a 401 error. But if I open a terminal in the user's session in JupyterHub and run the klist command, I can see the TGT in the ccache. I'm using the default spawner. Should I use a specific custom spawner?
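For reference, the "Kerberos" setting mentioned above goes into the per-kernel credential sections of Sparkmagic's config.json (by default ~/.sparkmagic/config.json). The sketch below is a minimal helper, assuming the key names used in Sparkmagic's example_config.json (`kernel_python_credentials`, `kernel_scala_credentials`, `url`, `auth`); the function name `enable_kerberos` is hypothetical. It merges into an existing file instead of overwriting it, so other settings are preserved.

```python
import json
import os

# Per-kernel credential sections, following the key names in Sparkmagic's
# example_config.json. "auth" accepts "None", "Kerberos" or "Basic_Access".
KERNEL_SECTIONS = ("kernel_python_credentials", "kernel_scala_credentials")


def enable_kerberos(livy_url, path=None):
    """Set Kerberos auth and the Livy URL in config.json, keeping any other settings."""
    path = path or os.path.expanduser("~/.sparkmagic/config.json")
    config = {}
    if os.path.exists(path):
        with open(path) as fh:
            config = json.load(fh)
    for section in KERNEL_SECTIONS:
        creds = config.setdefault(section, {})
        creds.update({"url": livy_url, "auth": "Kerberos"})
        # Keep username/password present but empty, as in the example config.
        creds.setdefault("username", "")
        creds.setdefault("password", "")
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w") as fh:
        json.dump(config, fh, indent=2)
    return config
```

For example, `enable_kerberos("http://livy.example.com:8998")` (placeholder host; 8998 is Livy's default port). Note that this only selects the auth type: the kernel still needs a valid ticket in the credential cache it actually looks at, which is where the 401 in this thread came from.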

ashkan-leo commented 5 years ago

I want to know how to set up Sparkmagic with Kerberos as well. It would be great to see an example.

Tagar commented 5 years ago

1. See the required Livy config lines here: https://github.com/apache/incubator-livy/blob/master/conf/livy.conf.template#L152-L157

livy.server.auth.type = kerberos
livy.server.auth.kerberos.principal = HTTP/_HOST@YOUR.COMPANY.COM
livy.server.auth.kerberos.keytab = /some/path/http.keytab
livy.server.auth.kerberos.name-rules = DEFAULT

You would also need something like:

livy.server.launch.kerberos.keytab = /some/path/svc_livy.keytab
livy.server.launch.kerberos.principal = svc_livy@YOUR.COMPANY.COM

2. You would also need to update the Hadoop-side configs to allow Livy's service account (say, svc_livy) to impersonate other users, e.g.:

hadoop.proxyuser.svc_livy.groups
hadoop.proxyuser.svc_livy.hosts
hadoop.proxyuser.http.groups
hadoop.proxyuser.http.hosts
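As a rough pre-flight check of the two steps above, the sketch below parses livy.conf as simple `key = value` lines and reports which of the Kerberos keys listed above are missing, then builds core-site.xml `<property>` entries for the proxyuser settings. The function names are hypothetical, the parser ignores Java-properties subtleties (line continuations, `:` separators), and the `"*"` defaults for hosts and groups are placeholders that should be narrowed for a real cluster.

```python
import xml.etree.ElementTree as ET

# Keys taken from the livy.conf lines quoted above.
REQUIRED_LIVY_KEYS = (
    "livy.server.auth.type",
    "livy.server.auth.kerberos.principal",
    "livy.server.auth.kerberos.keytab",
    "livy.server.auth.kerberos.name-rules",
    "livy.server.launch.kerberos.principal",
    "livy.server.launch.kerberos.keytab",
)


def parse_livy_conf(text):
    """Parse simple 'key = value' lines, skipping blank lines and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def kerberos_problems(conf):
    """Return a list of human-readable problems with the Kerberos-related keys."""
    problems = [f"missing {key}" for key in REQUIRED_LIVY_KEYS if not conf.get(key)]
    auth = conf.get("livy.server.auth.type")
    if auth and auth.lower() != "kerberos":
        problems.append(f"livy.server.auth.type is {auth!r}, expected 'kerberos'")
    return problems


def proxyuser_properties(users, hosts="*", groups="*"):
    """Build <property> entries for hadoop.proxyuser.<user>.hosts/groups (placeholder values)."""
    root = ET.Element("configuration")
    for user in users:
        for suffix, value in (("hosts", hosts), ("groups", groups)):
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = f"hadoop.proxyuser.{user}.{suffix}"
            ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

The generated entries still have to be merged into the cluster's existing core-site.xml (and the affected Hadoop services refreshed or restarted); the sketch does not touch any live configuration.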

Tagar commented 5 years ago

Livy's configuration documentation is very often outdated or missing many items.

You might need to look at Livy source code to get some answers.

For example, livy.server.launch.kerberos... can only be found in the source code:

https://github.com/apache/incubator-livy/blob/47d3ee6b6555a63d0b871788a53aab022fff518a/server/src/main/scala/org/apache/livy/LivyConf.scala#L96

The dev and user mailing lists are helpful too: https://lists.apache.org/list.html?user@livy.apache.org https://lists.apache.org/list.html?dev@livy.apache.org

aleevangelista commented 5 years ago

After a deep analysis I found the reason for the 401 error. When a user logs on to the target environment using ssh, a credential cache file is created with a specific naming structure. With Kerberos, a credential cache is usually created using the following template:

In the target environment, the credential cache file name template is:

When a user logs into JupyterHub and their session starts, no KRB5CCNAME environment variable is defined. As a result, when the user opens a PySpark notebook and executes a command, for example "%%info", Sparkmagic tries to authenticate with Livy using the default credential cache file:

This file does not exist, so authentication fails and a 401 error is returned. I looked for a way to make it read the correct file and found a possible approach in https://github.com/jupyter-incubator/sparkmagic/issues/466.

So I wrote a custom spawner, extending LocalProcessSpawner, that retrieves the correct file name of the user's credential cache and sets the KRB5CCNAME environment variable for the user's session. I added the custom spawner to jupyterhub_config.py and restarted JupyterHub. Invoking Livy from JupyterHub no longer has the authentication issue.
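To make the failure mode concrete, here is a minimal sketch of how a Kerberos client typically picks its credential cache, assuming MIT Kerberos defaults: KRB5CCNAME wins if set; otherwise the name is FILE:/tmp/krb5cc_<uid>, unless krb5.conf overrides it with default_ccache_name. The function names are hypothetical, and the sketch does not model a krb5.conf override.

```python
import os

# Assumed MIT Kerberos default when krb5.conf sets no default_ccache_name.
DEFAULT_CCACHE_TEMPLATE = "FILE:/tmp/krb5cc_{uid}"


def effective_ccache(env, uid):
    """Return the cache name a client would use: KRB5CCNAME if set, else the default."""
    name = env.get("KRB5CCNAME")
    return name if name else DEFAULT_CCACHE_TEMPLATE.format(uid=uid)


def file_ccache_exists(ccache_name):
    """True/False for FILE: (or untyped) caches; None for other types such as KEYRING: or KCM:."""
    if ccache_name.startswith("FILE:"):
        return os.path.exists(ccache_name[len("FILE:"):])
    if ":" in ccache_name:
        return None  # not a plain file, so this check does not apply
    return os.path.exists(ccache_name)


if __name__ == "__main__":
    # Run this from the notebook server's environment to see what the kernel will use.
    name = effective_ccache(os.environ, os.getuid())
    print(name, file_ccache_exists(name))
```

If this reports a default path that does not exist while klist in an ssh session shows a ticket under a different file name, the situation matches the one described above, and exporting KRB5CCNAME for the notebook server is the fix.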

jbreitman commented 3 years ago

Are you able to share your custom spawner? The link you provided is interesting, but it did not work for me, so I assume you used something else.
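The original spawner was not posted in this thread. What follows is a hypothetical sketch of the approach described above, written as a jupyterhub_config.py fragment: a LocalProcessSpawner subclass whose get_env() looks up the user's credential cache and exports KRB5CCNAME. The cache-name patterns (CCACHE_PATTERNS), the helper find_user_ccache and the class name are assumptions; the second pattern merely illustrates a uid-plus-suffix naming scheme, so replace both with your site's actual ccache template.

```python
# jupyterhub_config.py (fragment). Hypothetical sketch, not the original poster's spawner.
import glob
import os
import pwd

from jupyterhub.spawner import LocalProcessSpawner

c = get_config()  # noqa: F821 (provided by JupyterHub when it loads this file)

# HYPOTHETICAL naming templates: replace with your site's real ccache template.
CCACHE_PATTERNS = ("/tmp/krb5cc_{uid}", "/tmp/krb5cc_{uid}_*")


def find_user_ccache(uid):
    """Return the most recently modified ccache file matching CCACHE_PATTERNS, or None."""
    candidates = []
    for pattern in CCACHE_PATTERNS:
        candidates.extend(glob.glob(pattern.format(uid=uid)))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


class KerberosLocalProcessSpawner(LocalProcessSpawner):
    """LocalProcessSpawner that points KRB5CCNAME at the user's existing ccache file."""

    def get_env(self):
        env = super().get_env()
        uid = pwd.getpwnam(self.user.name).pw_uid
        ccache = find_user_ccache(uid)
        if ccache:
            # The FILE: prefix marks this as a file-based credential cache.
            env["KRB5CCNAME"] = "FILE:" + ccache
        else:
            self.log.warning("No Kerberos credential cache found for %s", self.user.name)
        return env


c.JupyterHub.spawner_class = KerberosLocalProcessSpawner
```

Because get_env() runs only when the single-user server starts, a ticket obtained later (or a cache file created under a new name) is not picked up until that user's server is restarted.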