jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

[BUG] unable to start spark session when trying to connect to kerberized EMR cluster #708

Closed: dsaid-xj closed this issue 3 years ago

dsaid-xj commented 3 years ago

Describe the bug: The Jupyter notebook is unable to set up a connection with the Kerberized EMR cluster, even though the kinit and klist commands work perfectly fine.

Error produced:

    'NoneType' object has no attribute '__dict__'.

    Some things to try:
    a) Make sure Spark has enough available resources for Jupyter to create a Spark context.
    b) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.
    c) Restart the kernel.

To Reproduce: I followed this guide exactly in ap-southeast-1 (https://aws.amazon.com/blogs/machine-learning/securing-data-analytics-with-an-amazon-sagemaker-notebook-instance-and-kerberized-amazon-emr-cluster/). The error occurred when running the Jupyter notebook cells, after kinit and klist had succeeded. I've checked the sparkmagic configuration but have been unable to determine what went wrong.

Expected behavior: The first code cell should start a Spark context so that the Jupyter notebook runs on the Kerberized EMR cluster, as described in the blog guide.

Versions:

dsaid-xj commented 3 years ago

Found the cause of the error.

The script that modifies the config.json file accidentally changes the "None" authentication type to "Kerberos", which results in a duplicate entry in the authentication types. It has to be changed back to "None" so that there are no conflicting entries in the config.
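A duplicate like this is easy to miss because Python's standard json module does not reject repeated keys: it silently keeps the last value. The minimal sketch below, assuming the config is plain JSON (for example the contents of ~/.sparkmagic/config.json, sparkmagic's usual location), uses the object_pairs_hook parameter of json.loads to list every key that appears more than once in the same object. The SAMPLE_CONFIG string is hypothetical, modelled loosely on sparkmagic's example config; treat its class paths as illustrative rather than as the exact file the blog's script produces.

```python
import json
from collections import Counter


def find_duplicate_keys(text):
    """Return every key that is repeated within a single JSON object.

    json.loads would normally keep only the last value for a repeated key,
    so this hook inspects the raw (key, value) pairs before they are merged.
    """
    found = []

    def record_duplicates(pairs):
        counts = Counter(key for key, _ in pairs)
        found.extend(key for key, n in counts.items() if n > 1)
        return dict(pairs)  # same last-one-wins result as plain json.loads

    json.loads(text, object_pairs_hook=record_duplicates)
    return found


# Hypothetical broken config: the "None" key was rewritten to "Kerberos".
SAMPLE_CONFIG = """
{
  "authenticators": {
    "Kerberos": "sparkmagic.auth.kerberos.Kerberos",
    "Kerberos": "sparkmagic.auth.customauth.Authenticator",
    "Basic_Access": "sparkmagic.auth.basic.Basic"
  }
}
"""

print(find_duplicate_keys(SAMPLE_CONFIG))  # ['Kerberos']
```

In this sample, json.loads(SAMPLE_CONFIG)["authenticators"]["Kerberos"] ends up pointing at the second (custom) class, which illustrates why a duplicate can change behaviour without any parse error.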

Side note: for Kerberos authentication to work, all nodes of the EMR cluster need to have the Linux users created with the same names (this is usually done by a bootstrap script).
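To confirm that requirement on a given node, a quick check can help. This is a minimal sketch, assuming Python 3 is available on the EMR nodes (run it on each node, e.g. over SSH); it uses the standard pwd module to report which of the given usernames have no local account. The usernames in the example are hypothetical, and the sketch only checks: creating the users is still the bootstrap script's job.

```python
import pwd


def missing_users(usernames):
    """Return the usernames that have no local Linux account on this node."""
    missing = []
    for name in usernames:
        try:
            pwd.getpwnam(name)  # raises KeyError if the account does not exist
        except KeyError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical notebook users; replace with the users you kinit as.
    print(missing_users(["root", "analyst1"]))
```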