karlam123 opened 4 years ago
Hmmm, this is a new one. I suspect your spark-submit configuration has some extra things specified that we're not picking up (the spark and hadoop CLIs allow specifying configuration options in environment variables that aren't picked up by the corresponding java libraries, which is annoying). A few things that would help debug:

- Does `yarn application -list` work?
- Do you have any `*_OPTS` environment variables set (like `HADOOP_OPTS`)? The java libraries that skein is built on only load configuration from files, not environment variables.
- Do you have any `spark.hadoop.*` variables set? These override values found in the standard `*-site.xml` files, and may explain this discrepancy. See https://spark.apache.org/docs/latest/configuration.html#custom-hadoophive-configuration.

There may be something else going on besides the above, but this is what I'd check first.
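To illustrate why `spark.hadoop.*` settings can make spark-submit behave differently from skein's Java driver, here's a toy sketch of the override mechanism in plain Java (this is a simplified model, not Spark's actual code, which lives in its `SparkHadoopUtil` helper; the property values below are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class HadoopOverrideSketch {
    public static void main(String[] args) {
        // Value as loaded from yarn-site.xml / core-site.xml (illustrative)
        Map<String, String> hadoopConf = new HashMap<>();
        hadoopConf.put("yarn.resourcemanager.principal", "rm/_HOST@EXAMPLE.COM");

        // Spark-side configuration, e.g. from spark-defaults.conf (illustrative)
        Map<String, String> sparkConf = new HashMap<>();
        sparkConf.put("spark.hadoop.yarn.resourcemanager.principal",
                      "rm/rm1.example.com@EXAMPLE.COM");

        // Any spark.hadoop.<key> entry replaces <key> in the Hadoop
        // configuration, so spark-submit can end up seeing different values
        // than a library that only reads the *-site.xml files.
        for (Map.Entry<String, String> e : sparkConf.entrySet()) {
            if (e.getKey().startsWith("spark.hadoop.")) {
                hadoopConf.put(e.getKey().substring("spark.hadoop.".length()),
                               e.getValue());
            }
        }
        System.out.println(hadoopConf.get("yarn.resourcemanager.principal"));
    }
}
```

The point being: if the working spark-submit path relies on such an override, skein (which reads only the config files) never sees it.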
Thanks for the help!

`/usr/hdp/current/spark2-client/conf/spark-defaults.conf`:

```
spark.hadoop.hive.llap.daemon.service.hosts
spark.hadoop.hive.zookeeper.quorum
spark.yarn.access.hadoopFileSystems hdfs://ProdHadoop:8020
spark.yarn.am.extraJavaOptions -Dhdp.version=3.1.4.0-315
spark.yarn.historyServer.address
```
Hmmm, ok. A few other places there might be things:

- Is `HADOOP_YARN_HOME` set? If not, if you set it appropriately do things work out? (It should be something like `/usr/lib/hadoop-yarn`.)
- Is `JAVA_HOME` set? Does it point to the correct java for yarn to use? If not set, is `which java` the correct java?
- Does `yarn classpath` include somewhere in there the directory with your hadoop configuration files (`yarn-site.xml`, `core-site.xml`, ...)?
- Is there a `yarn-env.sh` and/or `hadoop-env.sh` somewhere? (I'd use `find` to find this; it could be in a few places depending on your system.) I'd look for anything that looks like it's setting a configuration/home dir (e.g. `HADOOP_CONF_DIR`/`HADOOP_YARN_HOME`, ...), java options (`-D...`), or kerberos related things (environment variables with `KRB5` or kerberos in them).

The following links might also be relevant, particularly the solution in the stackoverflow one:
It's not clear to me why yarn/spark would work when our code doesn't - we should be taking the same login path as those tools.
Thanks and sorry for the late reply!

- `HADOOP_YARN_HOME` was not set, but I tried setting it the same way as in `/usr/hdp/3.1.4.0-315/hadoop/conf/yarn-env.sh`, to `/usr/hdp/3.1.4.0-315/hadoop-yarn`, with the same result.
- `JAVA_HOME` was not set; I changed it to `$(dirname $(dirname $(readlink -f $(which javac))))`, which is the same as in `yarn-env.sh`. Still the same error.
- `yarn classpath` has the directory with the hadoop configuration files (`/usr/hdp/3.1.4.0-315/hadoop/conf`).
- `yarn-env.sh`:

```
export YARN_RESOURCEMANAGER_OPTS="-Djava.security.auth.login.config=/etc/hadoop/3.1.4.0-315/0/yarn_jaas.conf"
export YARN_TIMELINESERVER_OPTS="-Djava.security.auth.login.config=/etc/hadoop/3.1.4.0-315/0/yarn_ats_jaas.conf"
export YARN_TIMELINEREADER_OPTS="-Djava.security.auth.login.config=/etc/hadoop/3.1.4.0-315/0/yarn_ats_jaas.conf"
export YARN_REGISTRYDNS_OPTS="-Djava.security.auth.login.config=/etc/hadoop/3.1.4.0-315/0/yarn_registry_dns_jaas.conf"
export YARN_NODEMANAGER_OPTS="-Djava.security.auth.login.config=/etc/hadoop/3.1.4.0-315/0/yarn_nm_jaas.conf -Dsun.security.krb5.rcache=none"
HADOOP_OPTS="$HADOOP_OPTS -Djavax.security.auth.useSubjectCredsOnly=false"
YARN_RESOURCEMANAGER_OPTS="-Dzookeeper.sasl.client=true -Dzookeeper.sasl.client.username=zookeeper -Djava.security.auth.login.config=/etc/hadoop/3.1.4.0-315/0/yarn_jaas.conf -Dzookeeper.sasl.clientconfig=Client $YARN_RESOURCEMANAGER_OPTS"
```
Thank you for the links, I'll look into them.
Try setting `SKEIN_DRIVER_JAVA_OPTIONS` as follows:

```
export SKEIN_DRIVER_JAVA_OPTIONS="$HADOOP_OPTS -Djavax.security.auth.useSubjectCredsOnly=false"
```

The `javax.security.auth.useSubjectCredsOnly` property deals with kerberos authentication and may be related to the failure we're seeing here.
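If it helps, a quick way to confirm what the driver JVM actually sees for this property is a one-off check like the following (a sketch; the class name is mine, and when the flag isn't passed the JGSS default is "true"):

```java
// Prints the effective value of javax.security.auth.useSubjectCredsOnly.
// Run with and without -Djavax.security.auth.useSubjectCredsOnly=false to
// verify the option is actually reaching the JVM.
public class CredsOnlyCheck {
    public static void main(String[] args) {
        String v = System.getProperty(
            "javax.security.auth.useSubjectCredsOnly", "true");
        System.out.println("useSubjectCredsOnly=" + v);
    }
}
```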
Still the same error unfortunately.
In my example above, did you already have a global skein driver running (had you run `skein driver start` previously)? The java options are only loaded on driver startup. To be clear, the test above would have been:

```
$ skein driver stop   # ensure there's no global driver
$ export SKEIN_DRIVER_JAVA_OPTIONS="$HADOOP_OPTS -Djavax.security.auth.useSubjectCredsOnly=false"
$ skein application list
```
If that's what you did and it didn't work, I'm at a loss here. It's likely a difference between the Java libraries we use (which read from configuration files) and the CLI tools you've successfully gotten working (which pick up additional options from environment variables, shell files, etc...). I'm not sure what else to check. If you happen to have a way to reproduce this in a failing environment (e.g. a docker image) this would make it easier for me to debug locally, but as is I'm not sure how else to help (sorry).
First of all, thanks for the help!
Yes, that is what I did. Currently, I don't have a way to reproduce this in a docker image, so I think I will try to familiarize myself with your java code and see if I can't figure out why the host is not picked up for our particular environment.
Small update:

I saw that when I replaced

```java
String tokenRenewer = conf.get(YarnConfiguration.RM_PRINCIPAL);
```

in `Driver.java` with a hardcoded value found in `core-site.xml`, I could submit applications.

Then I showed this to a colleague who has a lot of experience with kerberos, and he said that, with some exceptions, as long as the user is valid one can use that one instead.

So I put

```java
String tokenRenewer = ugi.getUserName();
```

and that also worked.
Interesting. What property did you take the hardcoded value from? Do you remember what the exceptions were?
I'm a bit hesitant to change our code here, our current implementation matches that recommended by the YARN docs and also that in other projects. I'd prefer (if possible) to find a way that works for you to get the RM principal as a renewer. That said, there have been plenty of times where the yarn docs have been flat out wrong, and we have many hacks around yarn bugs, so if your solution works we could use that too.
Hi!

I think I have found a way to get the RM principal as a renewer. The underlying issue seems to be that we have an HA setup for the RM, and `conf.get(YarnConfiguration.RM_PRINCIPAL)` doesn't do the `_HOST` replacement. In Hadoop 3 there is something that can do it: http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-client/apidocs/org/apache/hadoop/yarn/client/util/YarnClientUtils.html, so I copy-pasted the code for `public static String getRmPrincipal(String rmPrincipal, Configuration conf)` found here: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/com/org/apache/hadoop/yarn/client/util/YarnClientUtils.java and then it also worked. This seems to be how they solved it here as well: https://github.com/linkedin/TonY/blob/master/tony-core/src/main/java/com/linkedin/tony/TonyClient.java.
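For anyone else hitting this, the substitution that `YarnClientUtils.getRmPrincipal` performs can be sketched roughly like this (a simplified model: the real code resolves the actual ResourceManager address from the HA configuration via Hadoop's `SecurityUtil`, while this sketch just takes the hostname as an argument, and the principal/hostname values are illustrative):

```java
// Sketch of Kerberos principal _HOST substitution: a principal of the form
// "service/_HOST@REALM" has its _HOST component replaced with the resolved
// fully-qualified hostname of the ResourceManager.
public class RmPrincipalSketch {
    static String replacePattern(String principal, String hostname) {
        String[] components = principal.split("[/@]");
        if (components.length != 3 || !components[1].equals("_HOST")) {
            return principal; // nothing to substitute
        }
        return components[0] + "/" + hostname.toLowerCase()
             + "@" + components[2];
    }

    public static void main(String[] args) {
        System.out.println(
            replacePattern("rm/_HOST@EXAMPLE.COM", "rm1.example.com"));
    }
}
```

With an HA RM setup there is no single static hostname in the config, which is why the unexpanded `_HOST` leaks through when the principal is read directly from the configuration.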
Hi!

I'm trying to run `skein application submit hello_world.yaml` from the quickstart here: https://jcrist.github.io/skein/quickstart.html.

I'm on HDP 3.1.4.0-315, python 3.6, and skein 0.8.0. The following environment variables are set: `HADOOP_HOME`, `HADOOP_CONF_DIR`, and `HADOOP_HDFS_HOME`.

Logs:

It seems to me that `_HOST` in `rm@_HOST@COMPANY_HOST` is not found. I can submit jobs using spark with `spark-submit`, so I think the configuration on the cluster is OK.

Does anybody have an idea of what the problem could be or where I should look?