hashicorp / nomad-spark

DEPRECATED: Apache Spark with native support for Nomad as a scheduler

Kerberos support in nomad mode #19

Open jorisdevrede opened 5 years ago

jorisdevrede commented 5 years ago

Does the Nomad mode support Kerberos?

When I try to run a Spark job that reads from a kerberized HDFS cluster, I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o31.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.149.251.90, executor 2fa1c572-4942-56ad-30a3-1b53dc30194a-1545137273210): java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "/"; destination host is: "":8020;

On https://spark.apache.org/docs/latest/security.html#kerberos I read that "Delegation token support is currently only supported in YARN and Mesos modes." Does this mean that Kerberos is not supported in Nomad mode?

cgbaker commented 5 years ago

I'm not sure about this, @jorisdevrede. I'll look into it.

jorisdevrede commented 5 years ago

Thank you, @cgbaker.

It would really help our transition to Nomad if we could connect to our current HDFS implementation.

amitthk commented 5 years ago

Kerberos is essential for our environments as well, to protect against malware such as Mirai, XBash, and DemonBot, which are known to affect non-Kerberized Hadoop clusters connected to the Internet.

cgbaker commented 5 years ago

Thanks for the feedback, @amitthk. This has come up in discussions over the past few days, so support for implementing it is building. I will update this ticket as work progresses.