Open 4bluegen2s opened 1 year ago
You did not mention how you configured your DRBL server. It could be Full DRBL mode, DRBL-SSI mode, etc. Also, does your issue occur when you log in from cn1/cn2 to your DRBL server, or somewhere else? I am confused.
Steven
It is set up in Full DRBL mode. You are correct, the issue is when I try to log in from cn1/cn2 to the DRBL server.
I cannot reproduce this issue on my Debian Bullseye server. I can log in to its DRBL client via the console, and from there ssh into the DRBL server. Make sure you have set your client to "remote-linux-gra" or "remote-linux-txt" when running "sudo dcs". BTW, which GNU/Linux distribution did you use for your DRBL server? It would help if you could run "drbl-bug-report" and share the generated file. In addition, if you can, please give the unstable DRBL release a try, i.e., drbl 5.2.9.
Steven
I have an MPI cluster I do some work on at home.
My DRBL server is hosted as a VM on my NAS, and is more powerful than my compute nodes. Ideally, I want it to be part of the cluster once it has served the clients their OS and file system.
I found that when I run
mpiexec -np 12 -hostfile /mirror/cluster_machines python /mirror/approx_pi.py
I get the following:
[proxy:0:0@cn1] HYDU_sock_connect (utils/sock/sock.c:145): unable to connect from "cn1" to "drbl" (connection refused)
[proxy:0:0@cn1] main (pm/pmiserv/pmip.c:183): unable to connect to server drbl at port 42597 (check for firewalls!)
[proxy:0:1@cn2] HYDU_sock_connect (utils/sock/sock.c:145): unable to connect from "cn2" to "drbl" (connection refused)
[proxy:0:1@cn2] main (pm/pmiserv/pmip.c:183): unable to connect to server drbl at port 42597 (check for firewalls!)
But when I run it on cn1 alone, using just cn1 and cn2 as workers, they talk to each other and run fine.
I also noticed I can ssh into cn1. I get the prompt:
node@cn1:~$
and from there I can ssh into cn2, and get the correct prompt:
node@cn2:~$
But if I try to ssh into drbl from a client, it seems to log in fine, yet the prompt stays the same as on the client I logged in from. Additionally, when I type "exit" to leave the ssh session, it says:
logout
Connection to drbl closed.
although it doesn't seem to have ever really logged in to the server.
Can someone help me with the firewall settings so I can add my DRBL server to my cluster?
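An unchanged prompt after "ssh drbl", together with "connection refused" from mpiexec, could mean that the name "drbl" resolves to the client's own loopback address rather than to the server. That is only a guess, not a confirmed diagnosis. A minimal Python sketch to check it from cn1 or cn2 follows. The helper names are made up for illustration, the hostname "drbl" is taken from the log above, and port 22 (the usual ssh port) is an assumed default:

```python
import ipaddress
import socket
import sys


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def is_loopback(address):
    """True if address is in 127.0.0.0/8 (or ::1), i.e. points back at this machine."""
    return ipaddress.ip_address(address).is_loopback


def can_connect(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed defaults: hostname "drbl" and port 22; override on the command line.
    host = sys.argv[1] if len(sys.argv) > 1 else "drbl"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 22
    for addr in resolve_ipv4(host):
        note = " (loopback: this points at the local machine)" if is_loopback(addr) else ""
        print(f"{host} -> {addr}{note}")
    print(f"TCP connect to {host}:{port}:", "ok" if can_connect(host, port) else "failed")
```

Run it on a client, for example as "python3 check_host.py drbl 22" (the file name is arbitrary). If the reported address is a loopback address, the name lookup on the client is the likely culprit rather than a firewall. If the address is the server's real one and the connection still fails, a firewall on the server is the more plausible cause.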