Closed. BenFradet closed this 8 years ago.
Haven't tested this out yet, but it looks good to me.
One concern I had about this @BenFradet is that Hadoop and Spark may have identically named executables for things like `start-master.sh` and so on, though I don't remember for sure. (I think those may be in `sbin`, not `bin`, actually, so perhaps this is not a problem.) Are we accidentally linking an ambiguous executable to `/usr/local/bin`?
Yes, those are in `sbin` since they are service scripts. The only ones in `bin` are:

So there shouldn't be any conflicts.
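For anyone who wants to double-check, a collision scan between the two `bin/` directories can be scripted. This is just a sketch: the temp directories and file names below are stand-ins for the real Spark and Hadoop install paths, which you'd substitute on an actual node.

```shell
set -eu
tmp=$(mktemp -d)
# Stand-ins for the real install locations (e.g. /usr/local/spark/bin).
mkdir -p "$tmp/spark/bin" "$tmp/hadoop/bin"
touch "$tmp/spark/bin/pyspark" "$tmp/spark/bin/spark-shell"
touch "$tmp/hadoop/bin/hdfs" "$tmp/hadoop/bin/hadoop"

ls "$tmp/spark/bin" | sort > "$tmp/spark_names"
ls "$tmp/hadoop/bin" | sort > "$tmp/hadoop_names"
# comm -12 prints only names present in both sorted lists.
conflicts=$(comm -12 "$tmp/spark_names" "$tmp/hadoop_names")

if [ -z "$conflicts" ]; then
  echo "no conflicts"
else
  echo "conflicts: $conflicts"
fi
rm -rf "$tmp"
```

Run against the real directories, an empty result confirms the symlinks won't shadow each other.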
Great. I'll take a closer look at this later tonight and merge in if all looks good.
Looks like there's an issue launching the PySpark shell from the home directory:

```
$ pyspark
/usr/local/bin/pyspark: line 24: /usr/local/bin/load-spark-env.sh: No such file or directory
/usr/local/bin/spark-class: line 24: /usr/local/bin/load-spark-env.sh: No such file or directory
ls: cannot access /usr/local/assembly/target/scala-: No such file or directory
Failed to find Spark assembly in /usr/local/assembly/target/scala-.
You need to build Spark before running this program.

$ ./spark/bin/pyspark
Python 2.7.10 (default, Dec 8 2015, 18:25:23)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Python version 2.7.10 (default, Dec 8 2015 18:25:23)
SparkContext available as sc, HiveContext available as sqlContext.
>>>
```
Same goes for `spark-shell`.

It looks like they are doing relative calls to `load-spark-env.sh` and not finding it in `/usr/local/bin/`. Perhaps we need to set `SPARK_HOME`?
I forgot that we had our own versions of those scripts which work with symlinks, I'll investigate.
FYI @BenFradet I am having trouble getting `hdfs dfs` to work from the user's home directory. Calling `hdfs` alone works but also throws an error.

```
$ hdfs dfs
/usr/local/bin/hdfs: line 35: /usr/local/bin/../libexec/hdfs-config.sh: No such file or directory
/usr/local/bin/hdfs: line 304: exec: : not found

$ hdfs
/usr/local/bin/hdfs: line 35: /usr/local/bin/../libexec/hdfs-config.sh: No such file or directory
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
       where COMMAND is one of:
...
```
Did this used to work or did we just miss this issue? Setting `HADOOP_HOME` doesn't seem to help, btw.
Seems like an oversight on my part, sorry.
From looking at the `hdfs` script, we might want to add a `HADOOP_LIBEXEC_DIR` env var. I'll try that out tomorrow.
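A sketch of that idea, assuming the `hdfs` script follows the usual Hadoop 2.x pattern of defaulting `HADOOP_LIBEXEC_DIR` to `$(dirname "$0")/../libexec` while honoring a pre-set value. The fake script and temp paths stand in for the real install.

```shell
set -eu
tmp=$(mktemp -d)
mkdir -p "$tmp/hadoop/bin" "$tmp/hadoop/libexec" "$tmp/usrbin"
echo 'echo "config loaded"' > "$tmp/hadoop/libexec/hdfs-config.sh"

# Stand-in for the real hdfs script's libexec resolution logic.
cat > "$tmp/hadoop/bin/hdfs" <<'EOF'
#!/bin/sh
bin=$(dirname "$0")
DEFAULT_LIBEXEC_DIR="$bin/../libexec"
HADOOP_LIBEXEC_DIR="${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}"
. "$HADOOP_LIBEXEC_DIR/hdfs-config.sh"
EOF
chmod +x "$tmp/hadoop/bin/hdfs"
ln -s "$tmp/hadoop/bin/hdfs" "$tmp/usrbin/hdfs"

# Via the symlink the default resolves to .../usrbin/../libexec and fails,
# but exporting HADOOP_LIBEXEC_DIR points it at the real location.
out=$(HADOOP_LIBEXEC_DIR="$tmp/hadoop/libexec" "$tmp/usrbin/hdfs")
echo "$out"
rm -rf "$tmp"
```

So exporting `HADOOP_LIBEXEC_DIR=/path/to/hadoop/libexec` in the user's environment should let the symlinked `hdfs` find `hdfs-config.sh`, matching the error in the log above.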
OK, no worries.
This definitely has not slipped my mind, but I haven't had much time to dedicate to it. However, I should be able to fix it over the weekend.
No worries at all @BenFradet. Thanks for following up.
This PR makes the following changes:
I launched a cluster through flintrock and checked that all was on the path.
Fixes #119.
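A quick smoke check along these lines can confirm the executables resolve on the `PATH` after launch; the command names are the ones discussed above, and a missing one is just reported rather than treated as fatal.

```shell
# Report where each expected command resolves, or flag it as missing.
out=$(for cmd in pyspark spark-shell hdfs; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: $(command -v "$cmd")"
  else
    echo "$cmd: MISSING"
  fi
done)
echo "$out"
```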