jtriley / StarCluster

StarCluster is an open source cluster-computing toolkit for Amazon's Elastic Compute Cloud (EC2).
http://star.mit.edu/cluster
GNU Lesser General Public License v3.0

Unable to launch cluster: SGE master install failures #541

Open wyim-pgl opened 9 years ago

wyim-pgl commented 9 years ago

Hello,

I used your x64 public AMI, and the installation gets stuck at "Installing Sun Grid Engine..."

Here is my log:

starcluster start physicscluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

Using default cluster template: smallcluster
Validating cluster template settings...
Cluster template settings are valid
Starting cluster...
Launching a 2-node cluster...
Creating security group @sc-physicscluster...
Opening tcp port range 21-21 for CIDR 0.0.0.0/0
Opening tcp port range 80-80 for CIDR 0.0.0.0/0
Opening tcp port range 8000-9000 for CIDR 0.0.0.0/0
Reservation:r-76cb28b4
Waiting for instances to propagate...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
Waiting for cluster to come up... (updating every 30s)
Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
Waiting for SSH to come up on all nodes...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
Waiting for cluster to come up took 1.541 mins
The master node is ec2-52-8-147-41.us-west-1.compute.amazonaws.com
Configuring cluster...
Attaching volume vol-bc63cd45 to master node on /dev/sdz ...
Waiting for vol-bc63cd45 to transition to: attached...
Running plugin starcluster.clustersetup.DefaultClusterSetup
Configuring hostnames...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
Mounting EBS volume vol-bc63cd45 on /home/ubuntu/scratch...
Creating cluster user: sgeadmin (uid: 1001, gid: 1001)
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
Configuring scratch space for user(s): sgeadmin
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
Configuring /etc/hosts on each node
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
Starting NFS server on master
Configuring NFS exports path(s): /home /home/ubuntu/scratch
Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
Setting up NFS took 0.035 mins
Configuring passwordless ssh for root
Configuring passwordless ssh for sgeadmin
Running plugin starcluster.plugins.sge.SGEPlugin
Configuring SGE...
Configuring NFS exports path(s): /opt/sge6
Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
Setting up NFS took 0.019 mins
Installing Sun Grid Engine...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
Creating SGE parallel environment 'orte'
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
Adding parallel environment 'orte' to queue 'all.q'
Running plugin sge
Configuring SGE...
Configuring NFS exports path(s): /opt/sge6
Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
Setting up NFS took 0.018 mins
Removing previous SGE installation...
Installing Sun Grid Engine...

!!! ERROR - Error occured while running plugin 'sge':
!!! ERROR - remote command 'source /etc/profile && cd /opt/sge6 &&
!!! ERROR - TERM=rxvt ./inst_sge_sc -m -x -noremote -auto
!!! ERROR - ./ec2_sge.conf' failed with status 1:
!!! ERROR - Reading configuration from file ./ec2_sge.conf
!!! ERROR - [H[2JInstall log can be found in: /opt/sge6/default/common/i
!!! ERROR - nstall_logs/qmaster_install_master_2015-07-07_07:19:24.log

======= /opt/sge6/default/common/install_logs/qmaster_install_master_2015-07-07_07:19:24.log
Using >20000-20100< as gid range.
Using >/opt/sge6/default/spool< as EXECD_SPOOL_DIR.
Using >none@none.edu< as ADMIN_MAIL.
Adding default parallel environments (PE)

starting sge_qmaster
Reached 5min timeout, while waiting for qmaster PID file.
sge_qmaster daemon didn't start. Please check your autoinstall configuration file!
Installation failed!

Is there any suggestion for getting it installed?

Thank you!

Won

cancan101 commented 9 years ago

See: https://github.com/jtriley/StarCluster/issues/481
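
One detail in the log from the original report: "Installing Sun Grid Engine..." appears under two plugin runs, first `starcluster.plugins.sge.SGEPlugin` and then `sge`, and the error is raised during the second. Whether that is the cause of the failure is not established in this thread. The sketch below is only a quick way to list which plugin runs in a captured `starcluster start` log reached the SGE install step. The function name `plugins_installing_sge` is hypothetical, and the sample text is abbreviated from the log above.

```python
import re


def plugins_installing_sge(log_text):
    """Return the plugin names whose run reached the SGE install step.

    Hypothetical helper, not part of StarCluster. It works on flattened
    logs too, because it does not rely on line breaks.
    """
    # With a capture group, re.split returns
    # [prefix, name1, body1, name2, body2, ...].
    parts = re.split(r"Running plugin (\S+)", log_text)
    names, bodies = parts[1::2], parts[2::2]
    return [
        name
        for name, body in zip(names, bodies)
        if "Installing Sun Grid Engine" in body
    ]


# Abbreviated from the log in the report above.
sample = (
    "Running plugin starcluster.clustersetup.DefaultClusterSetup "
    "Configuring hostnames... "
    "Running plugin starcluster.plugins.sge.SGEPlugin Configuring SGE... "
    "Installing Sun Grid Engine... "
    "Running plugin sge Configuring SGE... "
    "Removing previous SGE installation... Installing Sun Grid Engine..."
)

installers = plugins_installing_sge(sample)
print(installers)
if len(installers) > 1:
    print("SGE install step reached by more than one plugin run")
```

If more than one name is listed, it may be worth checking whether the cluster template's plugin list names an SGE plugin in addition to the one StarCluster runs on its own.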