thunder-project / thunder

scalable analysis of images and time series
http://thunder-project.org
Apache License 2.0

Anaconda install pssh error when using thunder-ec2 #231

Open sophie63 opened 9 years ago

sophie63 commented 9 years ago

I am blocked on accessing EC2 with thunder-ec2. I consistently get a pssh error at the Anaconda installation step (see below). I have restarted with --resume many times, redone the AWS credentials and key setup steps, and reinstalled with pip install thunder-python --upgrade, but I still get the error.

'''
[Generating cluster's SSH key on master] [success]
[Transferring cluster's SSH key to slaves] [success]
[Deploying files to master] [success]
[Installing Spark (may take several minutes)] [success]
[Downloading Anaconda] [success]
[Installing Anaconda] [SSH failure, returning error]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/thunder/utils/ec2.py", line 514, in <module>
    install_anaconda(master, opts)
  File "/usr/local/lib/python2.7/dist-packages/thunder/utils/ec2.py", line 172, in install_anaconda
    ssh(master, opts, "pssh -h /root/spark-ec2/slaves 'echo 'export "
  File "/usr/local/lib/python2.7/dist-packages/thunder/utils/ec2.py", line 295, in ssh
    raise Exception(stdout)
Exception: bash: pssh: command not found
'''

sophie63 commented 9 years ago

I am using Spark version 1.1.0.

npyoung commented 9 years ago

It looks like you're missing the pssh utility on the cluster master. Try to reproduce this error (just to get your cluster up), then run thunder-ec2 login <your-cluster> to log into the master. Once there, run yum install -y pssh. If that works, log out and try starting the cluster again with --resume.

The reason for this is a bug in thunder/utils/ec2.py: install_anaconda gets called before install_thunder, but yum install -y pssh only happens in install_thunder. That line should be moved up into install_anaconda. Sound about right, @freeman-lab?
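To make the ordering concrete, here is a rough sketch of what I mean. It is not the actual contents of ec2.py: the function bodies, the ensure_pssh helper, and the placeholder commands are all made up for illustration, and ssh is replaced by a stand-in that only records commands so the snippet runs without a cluster.

```python
# Hypothetical sketch: none of these bodies are copied from thunder/utils/ec2.py.
commands = []  # records what would be run on the master, in order


def ssh(master, opts, command):
    """Stand-in for the real ssh helper; it only records the command."""
    commands.append(command)


def ensure_pssh(master, opts):
    # Hypothetical helper: install pssh on the master before anything calls it.
    ssh(master, opts, "yum install -y pssh")


def install_anaconda(master, opts):
    ensure_pssh(master, opts)  # moved up from install_thunder
    # Placeholder for the real Anaconda steps, which use pssh to reach the slaves.
    ssh(master, opts, "pssh -h /root/spark-ec2/slaves 'echo anaconda-step'")


def install_thunder(master, opts):
    # Placeholder for the real Thunder installation step.
    ssh(master, opts, "echo thunder-step")


install_anaconda("master-host", None)
install_thunder("master-host", None)
```

With this ordering the yum install is the first command sent to the master, so the later pssh call should no longer hit "command not found".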

freeman-lab commented 9 years ago

I wonder if it's an issue of an older AMI getting used that doesn't have pssh, though I'm fairly certain we've been using pssh in the launch process since before 1.1.0.

sophie63 commented 9 years ago

I installed Spark 1.4.1 instead of 1.1.0 and I don't have this problem anymore. So it might be helpful to update the documentation to say that Spark 1.4 or higher is recommended for thunder-ec2. Thanks to you both!

sophie63 commented 9 years ago

Just FYI, I can reproduce the problem with Spark 1.2.1 but not with 1.3.0. So you could recommend Spark 1.3 or higher.
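If it is useful, a check along these lines could warn people before launch. This is only a sketch: the function names and the MIN_SPARK_VERSION constant are hypothetical, nothing like this exists in thunder-ec2 as far as I know, and the 1.3.0 cutoff is just what I observed above.

```python
# Hypothetical pre-launch check; not part of thunder-ec2.
MIN_SPARK_VERSION = "1.3.0"  # assumption based on the versions tried in this thread


def parse_version(version):
    """Turn '1.2.1' into (1, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def spark_version_ok(version, minimum=MIN_SPARK_VERSION):
    """Return True if the requested Spark version is at least the minimum."""
    return parse_version(version) >= parse_version(minimum)


for v in ["1.1.0", "1.2.1", "1.3.0", "1.4.1"]:
    print(v, spark_version_ok(v))
```

Comparing tuples of integers rather than raw strings avoids mistakes such as treating "1.10.0" as older than "1.3.0".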