stevenshiau / drbl

Diskless Remote Boot in Linux.
GNU General Public License v2.0

Cannot SSH into DRBL server from a client machine #28

Open 4bluegen2s opened 1 year ago

4bluegen2s commented 1 year ago

I have an MPI cluster I do some work on at home.

My DRBL server is hosted as a VM on my NAS and is more powerful than my compute nodes. Ideally, I want it to be part of the cluster after it has served the clients their OS and file system.

I found that when I run

    mpiexec -np 12 -hostfile /mirror/cluster_machines python /mirror/approx_pi.py

I get the following:

    [proxy:0:0@cn1] HYDU_sock_connect (utils/sock/sock.c:145): unable to connect from "cn1" to "drbl" (connection refused)
    [proxy:0:0@cn1] main (pm/pmiserv/pmip.c:183): unable to connect to server drbl at port 42597 (check for firewalls!)
    [proxy:0:1@cn2] HYDU_sock_connect (utils/sock/sock.c:145): unable to connect from "cn2" to "drbl" (connection refused)
    [proxy:0:1@cn2] main (pm/pmiserv/pmip.c:183): unable to connect to server drbl at port 42597 (check for firewalls!)

But when I run it from cn1, using just cn1 and cn2 as workers, they talk to each other and the job runs fine.
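The "connection refused" lines above can be narrowed down with a small TCP probe run from a client. This is only a minimal diagnostic sketch, not part of DRBL or MPICH: the helper name `probe_tcp` is made up for illustration, and it assumes Python 3 is available on the client (the job above already runs a Python script). It reports whether a connection to a given host and port is accepted, actively refused, or silently dropped.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connection and classify the outcome.

    Returns "open" if the connection is accepted, "refused" if the peer
    actively rejects it, "timeout" if nothing answers in time, or
    "error: ..." for anything else (for example a name that does not resolve).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something answered at that address, but nothing is listening on the port
        # (or a firewall rule rejects the connection outright).
        return "refused"
    except socket.timeout:
        # No answer at all, which often points at a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        # Covers resolver failures, unreachable networks, and similar errors.
        return f"error: {exc}"
```

Calling `probe_tcp("drbl", 22)` on cn1 would show whether the SSH port on whatever address "drbl" resolves to is reachable. The port in the log (42597) probably differs between runs, so a fixed service port such as 22 is likely a more stable first check. A "refused" result does not by itself say whether the server or a firewall rejected the connection.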

I also noticed that I can ssh into cn1 and get the prompt node@cn1:~$, and from there I can ssh into cn2 and get the correct prompt, node@cn2:~$. But if I try to ssh into drbl from a client, it seems to log in fine, yet the prompt stays the same as on the client I logged in from. Additionally, when I type "exit" to leave the ssh session, it prints:

    logout
    Connection to drbl closed.

although it doesn't seem to have ever really logged in.
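One thing that could be checked here, purely as a guess, is which address the name "drbl" resolves to on a client. If it resolved to a loopback address, ssh would connect back to the client itself, which would match the unchanged prompt. The sketch below uses only the Python standard library. The function names `resolve_ipv4` and `is_loopback` are hypothetical helpers, and nothing in this thread confirms that name resolution is the actual cause.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def is_loopback(address):
    """True if the address is in 127.0.0.0/8, i.e. it points back at the local machine."""
    return ipaddress.ip_address(address).is_loopback


def describe(hostname):
    """Build one human-readable line per resolved address for hostname."""
    lines = []
    for address in resolve_ipv4(hostname):
        note = "loopback (this machine)" if is_loopback(address) else "remote address"
        lines.append(f"{hostname} -> {address}: {note}")
    return lines
```

Running `print("\n".join(describe("drbl")))` on cn1 and on the server would show whether both see the same non-loopback address. Running `hostname` inside the ssh session is another quick way to see which machine actually answered.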

Can someone help me with the firewall settings so I can add my DRBL server into my cluster?

stevenshiau commented 1 year ago

You did not mention how you configured your DRBL server; it could be in Full DRBL mode, DRBL-SSI mode, etc. In addition, does your issue occur when you log in from cn1/cn2 to your DRBL server, or is it something else? I am actually confused.

Steven

4bluegen2s commented 1 year ago

It is set up in Full DRBL mode. You are correct, the issue is when I try to log in from cn1/cn2 to the DRBL server.

stevenshiau commented 1 year ago

I cannot reproduce this issue on my Debian Bullseye server. I can log in on its DRBL client via the console and then ssh into the DRBL server. Make sure you have set your client to "remote-linux-gra" or "remote-linux-txt" when running "sudo dcs". BTW, which GNU/Linux distribution did you use for your DRBL server? It would be better if you could run "drbl-bug-report" and share the generated file. In addition, if you can, please give the unstable DRBL a try, i.e., drbl 5.2.9.

Steven