Closed aleevangelista closed 5 years ago
I want to know how to set up sparkmagic with Kerberos as well. It would be great to see an example.
1. See the required Livy config lines here: https://github.com/apache/incubator-livy/blob/master/conf/livy.conf.template#L152-L157
livy.server.auth.type = kerberos
livy.server.auth.kerberos.principal = HTTP/_HOST@YOUR.COMPANY.COM
livy.server.auth.kerberos.keytab = /some/path/http.keytab1
livy.server.auth.kerberos.name-rules = DEFAULT
You would also need something like:
livy.server.launch.kerberos.keytab = /some/path/svc_livy.keytab2
livy.server.launch.kerberos.principal = svc_livy@YOUR.COMPANY.COM
2. You would also need to update the Hadoop-side configs to trust Livy's service account (let's say it is svc_livy) to impersonate other users.
E.g.
hadoop.proxyuser.svc_livy.groups
hadoop.proxyuser.svc_livy.hosts
hadoop.proxyuser.http.groups
hadoop.proxyuser.http.hosts
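As a rough illustration of where those keys go: on the Hadoop side they are normally set as `<property>` entries in core-site.xml. The sketch below just renders such entries. The `*` values are placeholders I picked (they allow any group and any host, which is probably too permissive for production), and `proxyuser_properties` is a made-up helper name, not part of Hadoop or Livy.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(account, groups="*", hosts="*"):
    """Build core-site.xml <property> elements for one proxy user.

    ASSUMPTION: "*" is only a placeholder; restrict groups and hosts
    to what your cluster actually needs.
    """
    settings = {
        f"hadoop.proxyuser.{account}.groups": groups,
        f"hadoop.proxyuser.{account}.hosts": hosts,
    }
    elements = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        elements.append(prop)
    return elements


# Render the four entries listed above (svc_livy and http).
for account in ("svc_livy", "http"):
    for prop in proxyuser_properties(account):
        print(ET.tostring(prop, encoding="unicode"))
```

The Hadoop services usually need a restart or a configuration refresh before proxy user changes take effect.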
The Livy configuration documentation is very often outdated and/or missing many items.
You might need to look at the Livy source code to get some answers.
For example, the livy.server.launch.kerberos... settings can only be seen in the source code.
The dev and user mailing lists are helpful too: https://lists.apache.org/list.html?user@livy.apache.org https://lists.apache.org/list.html?dev@livy.apache.org
After a deep analysis I found the reason for the 401 error. When a user logs on to the target environment using ssh, a credential cache file is created with a specific name structure. With Kerberos, a credential cache is usually created using the following template:
In the target environment, the credential cache file name template is:
When a user logs into JupyterHub and their session starts, no KRB5CCNAME environment variable is defined. As a result, when the user opens a pyspark notebook and executes a command, for example "%%info", sparkmagic tries to authenticate to Livy by accessing the default credential cache file:
This file does not exist, so authentication is not possible and a 401 error is returned. I looked for a way to read the correct file and found a possible suggestion by reading https://github.com/jupyter-incubator/sparkmagic/issues/466. So I wrote a custom spawner, extending LocalProcessSpawner, that retrieves the correct file name of the user's credential cache and sets the KRB5CCNAME environment variable for the user session. I added the custom spawner to jupyterhub_config.py and restarted JupyterHub. Invoking Livy from JupyterHub no longer has the authentication issue.
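For anyone looking for a starting point, here is a minimal sketch of the kind of spawner described above. It is not the actual code from this comment. It assumes the credential caches live in /tmp and are named krb5cc_<uid> or krb5cc_<uid>_<suffix>, which may not match the template on your system, and `find_ccache` and `KerberosSpawner` are names I made up. `get_env` is the LocalProcessSpawner method that builds the environment of the user's server.

```python
import glob
import os

try:
    from jupyterhub.spawner import LocalProcessSpawner
except ImportError:
    # Fallback so find_ccache can be tried without JupyterHub installed.
    LocalProcessSpawner = object


def find_ccache(uid, directory="/tmp"):
    """Return the most recently modified credential cache for uid, or None.

    ASSUMPTION: caches are plain files named krb5cc_<uid> or
    krb5cc_<uid>_<suffix>; adjust the patterns to your environment.
    """
    patterns = (f"krb5cc_{uid}", f"krb5cc_{uid}_*")
    candidates = []
    for pattern in patterns:
        candidates.extend(glob.glob(os.path.join(directory, pattern)))
    candidates = [path for path in candidates if os.path.isfile(path)]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


class KerberosSpawner(LocalProcessSpawner):
    """Hypothetical spawner that exports KRB5CCNAME for the user session."""

    def get_env(self):
        import pwd  # Unix only, same as LocalProcessSpawner itself

        env = super().get_env()
        uid = pwd.getpwnam(self.user.name).pw_uid
        ccache = find_ccache(uid)
        if ccache is not None:
            env["KRB5CCNAME"] = f"FILE:{ccache}"
        return env
```

In jupyterhub_config.py this would be enabled with `c.JupyterHub.spawner_class = KerberosSpawner` (or the dotted import path of the class), followed by a JupyterHub restart. If no cache file is found, the environment is left untouched, so the user would still get the 401 until they obtain a ticket.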
Are you able to share your custom spawner? The link that you provided is interesting, but did not work for me so I assume you used something else.
Is there any example of configuring sparkmagic to use Kerberos authentication with Livy? Setting the authentication type to "Kerberos" in config.json doesn't work; I get a 401 error. But if I open a terminal in the user session in JupyterHub and run the klist command, I can see the TGT in the ccache. I'm using the default spawner. Should I use a specific custom spawner instead?
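For reference, this is roughly what the Kerberos setting in sparkmagic's config.json (normally ~/.sparkmagic/config.json) looks like, written as a small Python sketch that prints the JSON. The key names follow sparkmagic's example config as far as I know; the Livy URL is a placeholder and the function names are made up.

```python
import json


def kerberos_credentials(livy_url):
    """Credentials block using Kerberos auth, so no username or password."""
    return {"username": "", "password": "", "url": livy_url, "auth": "Kerberos"}


def sparkmagic_config(livy_url):
    """Minimal config fragment; a real config.json has more sections."""
    return {
        "kernel_python_credentials": kerberos_credentials(livy_url),
        "kernel_scala_credentials": kerberos_credentials(livy_url),
    }


# ASSUMPTION: placeholder Livy endpoint, replace with your own.
print(json.dumps(sparkmagic_config("http://livy.your.company.com:8998"), indent=2))
```

Note that this setting alone is not enough if the kernel process cannot locate the user's ticket cache, which is the 401 case discussed in this thread.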