nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0

Automatically add Spark and HDFS executables to `$PATH` #122

Closed BenFradet closed 8 years ago

BenFradet commented 8 years ago

This PR makes the following changes:

I launched a cluster through flintrock and checked that everything was on the `$PATH`.

Fixes #119.

nchammas commented 8 years ago

Haven't tested this out yet, but it looks good to me.

One concern I had about this, @BenFradet, is that Hadoop and Spark may have identically named executables for things like `start-master.sh` and so on, though I don't remember for sure. (I think those may be in `sbin`, not `bin`, actually, so perhaps this is not a problem.)

Are we accidentally linking an ambiguous executable to `/usr/local/bin`?

BenFradet commented 8 years ago

Yes, those are in `sbin` since they are service scripts.

The only ones in `bin` are:

So there shouldn't be any conflicts.
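
For what it's worth, here's a quick way to double-check on a cluster node (a sketch; `~/spark` and `~/hadoop` are the assumed install paths and may differ):

$ comm -12 <(ls ~/spark/bin | sort) <(ls ~/hadoop/bin | sort)
$ # no output means no identically named executables in the two bin dirs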

nchammas commented 8 years ago

Great. I'll take a closer look at this later tonight and merge in if all looks good.

nchammas commented 8 years ago

Looks like there's an issue launching the PySpark shell from the home directory:

$ pyspark 
/usr/local/bin/pyspark: line 24: /usr/local/bin/load-spark-env.sh: No such file or directory
/usr/local/bin/spark-class: line 24: /usr/local/bin/load-spark-env.sh: No such file or directory
ls: cannot access /usr/local/assembly/target/scala-: No such file or directory
Failed to find Spark assembly in /usr/local/assembly/target/scala-.
You need to build Spark before running this program.
$ ./spark/bin/pyspark 
Python 2.7.10 (default, Dec  8 2015, 18:25:23) 
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Python version 2.7.10 (default, Dec  8 2015 18:25:23)
SparkContext available as sc, HiveContext available as sqlContext.
>>> 

Same goes for `spark-shell`.

It looks like they are resolving `load-spark-env.sh` relative to their own location and not finding it in `/usr/local/bin/`.
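
For reference, the launch scripts compute `SPARK_HOME` from their own location, roughly like this (paraphrasing the top of Spark 1.6's `bin/pyspark`, not the exact source):

# Paraphrase of the path resolution in bin/pyspark:
export SPARK_HOME="$(cd "$(dirname "$0")/.."; pwd)"
source "$SPARK_HOME/bin/load-spark-env.sh"
# Through the symlink, $0 is /usr/local/bin/pyspark, so SPARK_HOME
# resolves to /usr/local and the source above fails.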

nchammas commented 8 years ago

Perhaps we need to set `SPARK_HOME`?
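
Or maybe thin wrappers instead of bare symlinks would work. A minimal sketch, assuming Spark is installed at `~/spark` (the script may overwrite `SPARK_HOME` itself, so exec-ing the real path seems safer than just exporting the variable):

#!/usr/bin/env bash
# Hypothetical wrapper at /usr/local/bin/pyspark, replacing the symlink.
# exec-ing the real script makes its $0 point into the real bin/ dir,
# so its dirname-based SPARK_HOME detection works. ~/spark is assumed.
exec "$HOME/spark/bin/pyspark" "$@"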

BenFradet commented 8 years ago

I forgot that we had our own versions of those scripts that work with symlinks. I'll investigate.

nchammas commented 8 years ago

FYI @BenFradet, I am having trouble getting `hdfs dfs` to work from the user's home directory. Calling `hdfs` alone prints the usage message, but it also throws an error.

$ hdfs dfs
/usr/local/bin/hdfs: line 35: /usr/local/bin/../libexec/hdfs-config.sh: No such file or directory
/usr/local/bin/hdfs: line 304: exec: : not found
$ hdfs
/usr/local/bin/hdfs: line 35: /usr/local/bin/../libexec/hdfs-config.sh: No such file or directory
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
       where COMMAND is one of:
...

Did this used to work, or did we just miss this issue? Setting `HADOOP_HOME` doesn't seem to help, by the way.

BenFradet commented 8 years ago

Seems like an oversight on my part, sorry.

From looking at the `hdfs` script, we might want to set a `HADOOP_LIBEXEC_DIR` env var. I'll try that out tomorrow.
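
Judging by the error output, the script falls back to `$(dirname "$0")/../libexec` when `HADOOP_LIBEXEC_DIR` is unset, which resolves to `/usr/local/libexec` through the symlink. A minimal sketch of the fix, assuming Hadoop lives at `~/hadoop`:

# Sketch: point the script at the real libexec dir before calling it.
# ~/hadoop is an assumed install path and may differ on a real cluster.
export HADOOP_LIBEXEC_DIR="$HOME/hadoop/libexec"
hdfs dfs -ls /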

nchammas commented 8 years ago

OK, no worries.

BenFradet commented 8 years ago

This definitely hasn't slipped my mind, but I haven't had much time to dedicate to it. I should be able to fix it over the weekend, though.

nchammas commented 8 years ago

No worries at all @BenFradet. Thanks for following up.