Closed nicholas-leonard closed 10 years ago
Port Forwarding
Some clusters don't allow compute nodes to connect to the outside, so to use our postgres db we must use some form of port forwarding. You don't need this when you use only the head node. You will need to change your .pgpass file so that the password is supplied automatically. Here are the steps to follow to set up and use the port forwarding:
On Monk/Angel/Saw cluster:
1) Use proxy.sharcnet.ca:5432 as the forward node. The sharcnet admin created it because it is more stable than the head node. For instance: psql -h proxy.sharcnet.ca -p 5432 -d _db. With jobman: jobman sql postgres://@proxy.sharcnet.ca:5432// ..
2) If for some reason 1) doesn't work, check the manual setup.
On Colosse:
1) Use 10.225.3.12:54321 as the forward node. The colosse admin created it because it is more stable than the head node. For instance: psql -h 10.225.3.12 -p 54321 -d _db. With jobman: jobman sql postgres://@10.225.3.12:54321// ..
2) If for some reason 1) doesn't work, check the manual setup.
Briaree/Hades cluster:
1) Compute nodes don't have access to the internet.
2) But there is an exception for opter, so jobman works correctly.
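The forward nodes listed for each cluster above can be collected in one small shell helper. This is only a sketch: the function name forward_node is hypothetical, and the Briaree/Hades entry assumes that the opter exception means connecting directly to opter.iro.umontreal.ca on port 5432, which the text does not state explicitly.

```shell
# Prints host:port of the forward node for a given cluster name.
# forward_node is a hypothetical helper, not part of jobman.
forward_node() {
    case "$1" in
        monk|angel|saw) echo "proxy.sharcnet.ca:5432" ;;
        colosse)        echo "10.225.3.12:54321" ;;
        # Assumption: compute nodes reach opter directly on these clusters.
        briaree|hades)  echo "opter.iro.umontreal.ca:5432" ;;
        *) echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

# Example use on a compute node (the _db name is copied as-is from the text):
#   node=$(forward_node colosse)
#   psql -h "${node%:*}" -p "${node##*:}" -d _db
```

Keeping the mapping in one place means a job script only needs the cluster name to build its psql or jobman connection string.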
Manual setup:
1) On the head node, check whether a port forward is already set up: ps aux|grep opter|grep ssh
2) If there isn't one, create it: ssh -v -f -o ServerAliveInterval=240 -N -L 5432:localhost:5432 opter.iro.umontreal.ca
   1) ??? Why do you need to enter your password ???
3) On the compute node, replace opter with the name of the node where the port forward was created. For instance: psql -h monk -p 5432 -d _db -U . With jobman: jobman sql postgres://@monk:5432// ..
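The check-then-create steps of the manual setup could be wrapped in a script for the head node, roughly as follows. This is a sketch under assumptions: the function names and the DRY_RUN variable are hypothetical, and the ssh options are simply the ones given in the steps above.

```shell
DB_HOST="opter.iro.umontreal.ca"
FORWARD_PORT=5432

# Succeeds if an ssh process mentioning opter is already running
# (same check as: ps aux|grep opter|grep ssh).
tunnel_exists() {
    ps aux | grep opter | grep -q ssh
}

# Opens the port forward. With DRY_RUN=1 (hypothetical switch) the
# command is only printed, which is handy for checking it first.
open_tunnel() {
    set -- ssh -v -f -o ServerAliveInterval=240 -N \
        -L "${FORWARD_PORT}:localhost:${FORWARD_PORT}" "$DB_HOST"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Creates the port forward only when none is found.
ensure_tunnel() {
    if tunnel_exists; then
        echo "port forward already running"
    else
        open_tunnel
    fi
}

# On the head node, call: ensure_tunnel
```

Because ssh runs with -f and -N, it goes to the background after authentication and only forwards the port, so ensure_tunnel returns once the tunnel is up.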
You must modify your ~/.pgpass so that it lists the host that does the port forwarding, exactly as it appears in the connection command: 10.225.3.12, monk, ang23, saw-login1, ...
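For reference, each line of ~/.pgpass has the form hostname:port:database:username:password, and the file must not be readable by group or others (mode 0600), otherwise it is ignored. A minimal sketch for adding an entry follows; the function names are hypothetical and every value in the usage comment is a placeholder.

```shell
# Prints one ~/.pgpass entry: hostname:port:database:username:password
pgpass_entry() {
    printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Appends an entry to the given password file and restricts its
# permissions so that only the owner can read or write it.
add_pgpass_entry() {
    file="$1"
    shift
    pgpass_entry "$@" >> "$file"
    chmod 600 "$file"
}

# Example (placeholder values):
#   add_pgpass_entry ~/.pgpass monk 5432 mydb myuser mypassword
```

The host field has to match the -h value used on the compute node, so a separate entry is needed for each forwarding host. A colon or backslash inside a field would have to be escaped with a backslash, which this sketch does not handle.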
I am looking for something like:
Calling it should generate -n=10 sample hyperconfigurations. The --queue=23 switch indicates that the samples should be queued to queue 23 in the database. However, the script should also be runnable locally via --local, so that it can be used with jobman to have each job sample its own set of --n=10 hyperconfigurations to train on.
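To make the requested interface concrete, here is a rough sketch of how such a script might parse its switches. Everything in it is hypothetical: the function names, the default of 10 samples, and the choice that --local takes precedence over --queue are assumptions, and the actual sampling and queueing are left out.

```shell
# Hypothetical option parser for the sampling script described above.
parse_options() {
    n=10          # number of hyperconfigurations to sample (assumed default)
    queue=""      # database queue id, empty when not queueing
    run_local=0   # set to 1 when --local is given
    for arg in "$@"; do
        case "$arg" in
            -n=*|--n=*) n="${arg#*=}" ;;
            --queue=*)  queue="${arg#*=}" ;;
            --local)    run_local=1 ;;
            *) echo "unknown option: $arg" >&2; return 2 ;;
        esac
    done
}

# Reports which mode the parsed options select.
describe_mode() {
    if [ "$run_local" -eq 1 ]; then
        echo "sample $n hyperconfigurations locally"
    elif [ -n "$queue" ]; then
        echo "queue $n hyperconfigurations to queue $queue"
    else
        echo "sample $n hyperconfigurations"
    fi
}
```

Both spellings -n=10 and --n=10 are accepted here only because the description uses both.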