Issue: YARN is running, but when a REEF job is submitted the following error appears:
2014-07-14 11:17:40,940 FINE hadoop.util.Shell.checkHadoopHome main | Failed to detect a valid hadoop home directory java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:225)
[... stack trace omitted ...]
Solution: This issue appears because REEF needs to read the environment variable $HADOOP_HOME within an SSH session. You can run the following to check if it is set:
$ ssh localhost 'echo $HADOOP_HOME'
(Notice the single quotes. Don’t use double quotes, as this will expand $HADOOP_HOME in the local session, which is not what we want.) If running the command does not print your Hadoop home directory then you’ll need to do some extra configuration for SSH sessions.
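The quoting difference can be reproduced without SSH. The following is a minimal sketch in which a child shell stands in for the remote session; the paths /local/hadoop and /remote/hadoop are hypothetical values chosen only for illustration.

```shell
# Hypothetical paths, for illustration only.
HADOOP_HOME=/local/hadoop   # value seen by the "local" shell

# Single quotes: the text $HADOOP_HOME reaches the child shell unexpanded,
# so the child (standing in for the remote side) expands it itself.
single=$(HADOOP_HOME=/remote/hadoop sh -c 'echo $HADOOP_HOME')

# Double quotes: the local shell expands $HADOOP_HOME before the child starts.
double=$(HADOOP_HOME=/remote/hadoop sh -c "echo $HADOOP_HOME")

echo "single quotes: $single"   # prints the child's value, /remote/hadoop
echo "double quotes: $double"   # prints the local value, /local/hadoop
```

With ssh the same thing happens: only the single-quoted form reports what the remote session actually sees.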
Each OS handles this in its own way: Mac OS X reads from ~/.bashrc, while Ubuntu reads from ~/.ssh/environment. Make the necessary changes for your OS (we cover Ubuntu below) and make sure the above command works. Then restart YARN. This is necessary because REEF reads the Hadoop home directory from YARN, which reads the environment variable only once, at startup.
Extended Solution: Here we cover the setup for Ubuntu, which caused the most headaches for students.
On Ubuntu, the file /etc/ssh/sshd_config holds the configuration for sshd.
Ubuntu users need to add the line
PermitUserEnvironment yes
to the file (PermitUserEnvironment is not set by default), so that user-defined environment variables are picked up. The file is writable only by root, so you may need to put sudo in front of your text editor command to get permission to modify it. E.g.
$ sudo vi /etc/ssh/sshd_config
It doesn't matter where in the file you add the line; you can just add it at the end.
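To confirm that the edit was saved, a small check like the one below may help. It is a sketch, not part of the REEF tooling: permit_user_env_enabled is a hypothetical helper name. It looks at the first uncommented PermitUserEnvironment line, since sshd uses the first value it finds for a keyword.

```shell
# permit_user_env_enabled FILE (hypothetical helper)
# Succeeds if the first uncommented PermitUserEnvironment line in FILE says "yes".
permit_user_env_enabled() {
    awk 'tolower($1) == "permituserenvironment" && !seen { seen = 1; ok = ($2 == "yes") }
         END { exit ok ? 0 : 1 }' "$1"
}

# Usage (the config file is typically world-readable, so sudo is usually not needed):
#   permit_user_env_enabled /etc/ssh/sshd_config && echo "PermitUserEnvironment is enabled"
```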
Aside from .profile or .bashrc, Ubuntu users also need to set their environment variables in ~/.ssh/environment. This file is not present by default, so you may need to create it manually and then add $HADOOP_HOME, $YARN_HOME, $REEF_HOME, etc.
Example of ~/.ssh/environment:
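A minimal sketch of what such a file might contain is shown below. The install paths are hypothetical, so replace them with your own, and write_ssh_environment is an illustrative helper name. sshd expects plain NAME=value lines in this file, with no "export" keyword and no leading "$"; values are taken literally and are not expanded.

```shell
# write_ssh_environment FILE (hypothetical helper)
# Writes NAME=value lines to FILE; every path below is an assumed example location.
write_ssh_environment() {
    env_file="$1"
    mkdir -p "$(dirname "$env_file")"
    cat > "$env_file" <<'EOF'
HADOOP_HOME=/usr/local/hadoop
YARN_HOME=/usr/local/hadoop
REEF_HOME=/usr/local/reef
EOF
    # Keep the file writable only by its owner.
    chmod 600 "$env_file"
}

# Usage (overwrites any existing file):
#   write_ssh_environment "$HOME/.ssh/environment"
```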
Verify that the setup works by running the ssh command from above again. The change may not take effect until the ssh service is restarted, so if the environment variable still does not show up, restart the ssh service and repeat the check.
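On Ubuntu, restarting the service and repeating the check might look like the sketch below. It assumes the service is registered under the name "ssh" and that you have sudo rights; restart_ssh_and_check is a hypothetical wrapper name, used here only so that nothing runs until you call it.

```shell
# restart_ssh_and_check (hypothetical wrapper)
# Assumes an Ubuntu-style service named "ssh" and sudo access.
restart_ssh_and_check() {
    sudo service ssh restart
    # Single quotes so that the SSH session, not the local shell, expands the variable.
    ssh localhost 'echo $HADOOP_HOME'
}

# Usage:
#   restart_ssh_and_check
```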
Finally, don’t forget to restart YARN!