oscar-cluster / oscar

OSCAR main source repository.
GNU General Public License v2.0

torque not configured on ubuntu #556

Closed dikim33 closed 7 years ago

dikim33 commented 7 years ago

Reported by naughtont on 6 Jul 2009 21:52 UTC. The current Torque package is not automatically configured on Ubuntu-based systems.

Ticket based on ubuntu pkg versions:

Testing environment:

dikim33 commented 7 years ago

Comment by naughtont on 22 Jul 2009 18:29 UTC. Since the scripts are not working on Debian/Ubuntu, I'm piecing things together manually. Here are some notes on those pieces.

'PBS_MOM'

The following are commands I ran on all nodes running a MOM...

#
# Configurations related to the MOM
#
echo " Create  dir: /var/spool/torque/mom_priv/jobs"
mkdir -p /var/spool/torque/mom_priv/jobs
chmod 751 /var/spool/torque/mom_priv/jobs

echo " Create  dir: /var/spool/torque/aux"
mkdir -p /var/spool/torque/aux
chmod 0755 /var/spool/torque/aux

echo " Create  dir: /var/spool/torque/spool"
mkdir -p /var/spool/torque/spool 
chmod 1777 /var/spool/torque/spool 

echo " Create file: /var/spool/torque/pbs_environment"
touch /var/spool/torque/pbs_environment
chmod 0644 /var/spool/torque/pbs_environment
echo "PATH=/bin:/usr/bin" >> /var/spool/torque/pbs_environment
echo "LANG=en_US.UTF-8"   >> /var/spool/torque/pbs_environment
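
A quick way to confirm that a node ended up with the layout above is to compare each path's mode against the values used in these notes. This is only a sketch: `check_mode`, `check_mom_layout` and the `TORQUE_HOME` variable are hypothetical names introduced here, and `stat -c` assumes GNU coreutils as shipped on Debian/Ubuntu.

```shell
#!/bin/sh
# Sketch: verify the MOM spool layout described in this ticket.
# TORQUE_HOME is a hypothetical override; the default matches the paths above.
TORQUE_HOME="${TORQUE_HOME:-/var/spool/torque}"

# check_mode PATH EXPECTED_OCTAL_MODE
# Prints one status line and returns non-zero if the path is missing
# or its mode differs from the expected one.
check_mode() {
    path=$1
    want=$2
    if [ ! -e "$path" ]; then
        echo "MISSING $path"
        return 1
    fi
    have=$(stat -c '%a' "$path")   # GNU stat: octal mode without leading zero
    if [ "$have" = "$want" ]; then
        echo "ok      $path ($have)"
    else
        echo "BAD     $path (have $have, want $want)"
        return 1
    fi
}

# Checks every path created in the notes above; returns 0 only if all match.
check_mom_layout() {
    rc=0
    check_mode "$TORQUE_HOME/mom_priv/jobs"   751  || rc=1
    check_mode "$TORQUE_HOME/aux"             755  || rc=1
    check_mode "$TORQUE_HOME/spool"           1777 || rc=1
    check_mode "$TORQUE_HOME/pbs_environment" 644  || rc=1
    return $rc
}
```

Calling `check_mom_layout` on each MOM node prints one line per path and returns non-zero if anything is missing or has a different mode.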

dikim33 commented 7 years ago

Comment by naughtont on 22 Jul 2009 22:13 UTC. More manual edits...

Running 'qstat' or 'qsub' as a standard user failed. This was apparently because the 'pbs_iff' command did not have setuid permissions.

Error message:

tjn@node0:$ qsub sleep-test.pbs 
pbs_iff: file not setuid root, likely misconfigured
pbs_iff: cannot connect to node0:15001 - fatal error, errno=13 (Permission denied)
No Permission.
qsub: cannot connect to server node0 (errno=15007)

Permissions before and after the fix:

tjn@node0:$ ls -l /usr/sbin/pbs_iff 
-rwxr-xr-x 1 root root 6604 2009-06-25 16:45 /usr/sbin/pbs_iff

tjn@node0:$ sudo chmod 4755 /usr/sbin/pbs_iff 

tjn@node0:$ ls -l /usr/sbin/pbs_iff 
-rwsr-xr-x 1 root root 6604 2009-06-25 16:45 /usr/sbin/pbs_iff
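
For scripting the same fix across nodes, something like the following could detect the missing setuid bit and apply the mode shown above. It is a sketch only: `is_setuid`, `fix_pbs_iff` and the `PBS_IFF` variable are names made up for this note, the default path is the one from this ticket, and file ownership is not changed.

```shell
#!/bin/sh
# Sketch: detect and repair a missing setuid bit on pbs_iff.
# PBS_IFF is a hypothetical override; the default is the path seen above.
PBS_IFF="${PBS_IFF:-/usr/sbin/pbs_iff}"

# Returns 0 if the given file has its setuid bit set.
is_setuid() {
    [ -u "$1" ]
}

# Applies mode 4755 (as in the manual fix above) only when needed.
fix_pbs_iff() {
    if is_setuid "$PBS_IFF"; then
        echo "$PBS_IFF: setuid bit already set"
    else
        echo "$PBS_IFF: setting mode 4755"
        chmod 4755 "$PBS_IFF"
    fi
}
```

`fix_pbs_iff` needs to run as root on each node where 'qsub' or 'qstat' is used, and it is safe to run more than once.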

dikim33 commented 7 years ago

Comment by naughtont on 22 Jul 2009 22:16 UTC. Additionally, for completeness, here are the queue setup commands (taken from "torque.setup" and other documentation) that were run to create, set up, and enable a default queue. Note that I ran the individual commands manually, but I assume running the script with "root" as the username would accomplish the same thing. Again, I'm not certain what the best practice is for running Torque, but the current test config is owned and run by "root".

#!/bin/sh
# torque.setup

# USAGE:  torque.setup <USERNAME>

if [ -z "$1" ] ; then
  echo "USAGE:  torque.setup <USERNAME>"
  exit 1
fi

# create default queue
# enable operator privileges

USER=$1@`hostname`

echo "initializing TORQUE (admin: $USER)"

pbs_server -t create

qmgr -c "set server scheduling=true"

qmgr -c "set server operators += $USER"
qmgr -c "set server managers += $USER"

qmgr -c 'create queue batch'
qmgr -c 'set queue batch queue_type = execution'
qmgr -c 'set queue batch started = true'
qmgr -c 'set queue batch enabled = true'
qmgr -c 'set queue batch resources_default.walltime = 1:00:00'
qmgr -c 'set queue batch resources_default.nodes = 1'

qmgr -c 'set server default_queue = batch'

dikim33 commented 7 years ago

Comment by naughtont on 22 Jul 2009 22:17 UTC. Lastly, the init.d scripts need to be added, and they should support startup/shutdown with a pidfile.
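
As a rough illustration of what pidfile-based startup/shutdown could look like, here is a minimal init.d-style sketch for the MOM. It is not the script that was eventually committed. The `DAEMON` and `PIDFILE` defaults are assumptions (in particular, that pbs_mom backgrounds itself and records its PID in `mom_priv/mom.lock`), and the function names are made up for this note.

```shell
#!/bin/sh
# Sketch of an init.d-style script for pbs_mom with pidfile handling.
# ASSUMPTIONS: DAEMON and PIDFILE defaults below; adjust to the packaged paths.
DAEMON="${DAEMON:-/usr/sbin/pbs_mom}"
PIDFILE="${PIDFILE:-/var/spool/torque/mom_priv/mom.lock}"

# Returns 0 if PIDFILE exists and names a live process.
is_running() {
    [ -r "$PIDFILE" ] || return 1
    pid=$(cat "$PIDFILE")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

do_start() {
    if is_running; then
        echo "already running"
        return 0
    fi
    # Assumes the daemon detaches on its own and writes PIDFILE itself.
    "$DAEMON"
}

do_stop() {
    if ! is_running; then
        echo "not running"
        return 0
    fi
    kill "$(cat "$PIDFILE")"
}

# Exit status 3 for "not running" follows the usual LSB convention.
do_status() {
    if is_running; then
        echo "running (pid $(cat "$PIDFILE"))"
    else
        echo "not running"
        return 3
    fi
}

# Dispatch only when an action is given, so the functions can also be sourced.
case "${1:-}" in
    start)   do_start ;;
    stop)    do_stop ;;
    restart) do_stop; sleep 1; do_start ;;
    status)  do_status ;;
    "")      : ;;
    *)       echo "Usage: $0 {start|stop|restart|status}" >&2; exit 1 ;;
esac
```

On Debian/Ubuntu, `start-stop-daemon --pidfile` would be the more conventional way to implement `do_start` and `do_stop`; plain `kill` is used here only to keep the sketch free of distribution-specific tools.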

dikim33 commented 7 years ago

Comment by valleegr on 5 Aug 2009 21:04 UTC. r8717 fixes most of the problems reported in ticket #556, but the init.d scripts are still missing. Once they are added, we will be able to close the ticket.