jtriley / StarCluster

StarCluster is an open source cluster-computing toolkit for Amazon's Elastic Compute Cloud (EC2).
http://star.mit.edu/cluster
GNU Lesser General Public License v3.0

ssh as root is incorrect for AMI and can't mount volume in ubuntu #336

Open xiaol opened 10 years ago

xiaol commented 10 years ago

Amazon Linux AMI 2013.09.1 - ami-b1fe9bb0 (64-bit) uses ec2-user as its login user, not root. As a result, the default cluster setup waits forever for SSH to come up.

I then switched to an Ubuntu instance. SSH works fine there, but I got this:

2013-11-25 17:21:44,672 cli.py:301 - DEBUG - remote command 'source /etc/profile && mount /scratch' failed with status 32:
mount: you must specify the filesystem type
Traceback (most recent call last):
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/cli.py", line 274, in main
    sc.execute(args)
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/commands/restart.py", line 48, in execute
    self.cm.restart_cluster(arg, reboot_only=self.opts.reboot_only)
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/cluster.py", line 203, in restart_cluster
    cl.restart_cluster(reboot_only=reboot_only)
  File "<string>", line 2, in restart_cluster
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/utils.py", line 111, in wrap_f
    res = func(*arg, **kargs)
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/cluster.py", line 1379, in restart_cluster
    self.setup_cluster()
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/cluster.py", line 1517, in setup_cluster
    self._setup_cluster()
  File "<string>", line 2, in _setup_cluster
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/utils.py", line 111, in wrap_f
    res = func(*arg, **kargs)
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/cluster.py", line 1529, in _setup_cluster
    self.run_plugins()
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/cluster.py", line 1547, in run_plugins
    self.run_plugin(plug, method_name=method_name, node=node)
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/cluster.py", line 1572, in run_plugin
    func(*args)
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/clustersetup.py", line 378, in run
    self._setup_ebs_volumes()
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/clustersetup.py", line 330, in _setup_ebs_volumes
    master.mount_device(volume_partition, mount_path)
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/node.py", line 788, in mount_device
    self.ssh.execute('mount %s' % path)
  File "/Users/liuivan/Workspace/computingPlatform/awsEnv/lib/python2.7/site-packages/starcluster/sshutils/__init__.py", line 555, in execute
    msg, command, exit_status, out_str)
RemoteCommandFailed: remote command 'source /etc/profile && mount /scratch' failed with status 32:
mount: you must specify the filesystem type

Does anybody know how to resolve this?

Should I choose a different instance, or apply a patch?

jtriley commented 10 years ago

@xiaol Currently StarCluster requires root privileges on the AMI in order to configure things properly, which is why you had issues with the Amazon AMI: it requires you to log in as ec2-user and then use sudo wherever you need root privileges. Until StarCluster has proper sudo support, or the Amazon AMIs allow enabling root login via userdata somehow, I'm afraid StarCluster is incompatible with the Amazon AMIs. I'm guessing you were able to get around this using the official StarCluster images, but just to make sure: which AMI are you using?
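As a rough illustration of the root-login problem described above: some EC2 images answer an SSH login as root with a short hint naming the account to use instead. The sketch below (Python 3) only parses such a hint from captured SSH output. The exact wording of the hint and the helper name `suggested_login_user` are assumptions for illustration; nothing here is part of StarCluster.

```python
import re

# Assumed wording of the hint printed by images that refuse root logins.
# The real message may differ between AMIs; adjust the pattern as needed.
_LOGIN_HINT = re.compile(
    r'Please login as the user "(?P<user>[^"]+)" rather than the user "root"'
)


def suggested_login_user(ssh_output):
    """Return the account named in a root-login refusal hint, or None.

    ssh_output is whatever text an attempted `ssh root@host` session
    printed. None means no such hint was found in the text.
    """
    match = _LOGIN_HINT.search(ssh_output)
    return match.group("user") if match else None
```

If a helper like this returns a user name such as ec2-user, the image expects sudo from that account rather than a direct root login, which is the situation described above as incompatible with StarCluster.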

Looking at your traceback it seems that you're attaching an EBS volume to /scratch with a config similar to:

[vol myvol]
volume_id=vol-999999
mount_path=/scratch

[cluster mycluster]
...
volumes = myvol

Is that correct? If so, how did you go about creating the EBS volume? My guess is that you forgot to format the volume before using it. If so, then in the future you can have StarCluster format the volume for you automatically using the createvolume command:

http://star.mit.edu/cluster/docs/latest/manual/volumes.html#create-and-format-a-new-ebs-volume
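To test the guess that the volume was never formatted, one option is to run `file -s` against the attached device on the master node: for a device with no filesystem signature it typically reports just `data`. The following is a minimal Python 3 sketch of that check, not something StarCluster provides. The device path `/dev/xvdz` and both function names are hypothetical, and reading a raw device usually requires root.

```python
import subprocess


def looks_unformatted(file_s_output):
    """Return True if `file -s` output suggests no filesystem is present.

    `file -s <device>` prints "<device>: <description>". For a blank
    volume the description is usually just "data".
    """
    _, _, description = file_s_output.partition(":")
    return description.strip() == "data"


def device_looks_unformatted(device):
    """Run `file -s` on the given device and interpret its output."""
    output = subprocess.check_output(["file", "-s", device], text=True)
    return looks_unformatted(output)


if __name__ == "__main__":
    # Hypothetical device path; substitute the volume's actual device.
    print(device_looks_unformatted("/dev/xvdz"))
```

A result of True would be consistent with the `mount: you must specify the filesystem type` error in the traceback, since mount cannot detect a filesystem type on a blank volume.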

sirotenko commented 10 years ago

@jtriley There's one problem with the g2 GPU instance type: it doesn't work with any AMIs other than Amazon's. I've tried all of the StarCluster AMIs and confirmed that every one of them fails with the same error, saying that the current AMI cannot be used with a g2 instance. This means that one can't use the StarCluster AMIs, because they're not compatible with g2, and can't use the Amazon AMIs, since they don't allow logging in as root.