Gaopeng-Bai opened 1 month ago
Hi, could you try the following:
Connect to the accounting node (the slurmdbd Docker container, via docker-compose or docker exec) and try to ping slurmmaster. That should tell us whether the master is reachable and how its name is being resolved.
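Here is a minimal sketch of that check run from the WSL side. It assumes Docker Compose v2 (`docker compose`), a compose service named `slurmdbd`, and an image that ships `getent` and `ping`. The service name and the function name are hypothetical, so adjust them to the repository's docker-compose.yml. `getent hosts` shows which address the name maps to inside the container. `ping` shows whether that address answers.

```shell
#!/usr/bin/env bash
# Sketch: from inside the accounting (slurmdbd) container, check whether
# the controller name resolves and answers ping.
# Hypothetical names: compose service "slurmdbd", controller host "slurmmaster".
DBD_SERVICE="${DBD_SERVICE:-slurmdbd}"
MASTER_HOST="${MASTER_HOST:-slurmmaster}"

check_master_from_dbd() {
  # getent goes through the container's own resolver configuration
  # (normally /etc/hosts, then DNS) and prints the address it finds.
  if ! docker compose exec -T "$DBD_SERVICE" getent hosts "$MASTER_HOST"; then
    echo "cannot resolve $MASTER_HOST from $DBD_SERVICE" >&2
    return 1
  fi
  # -c 3: send three probes and stop; a non-zero status means unreachable.
  docker compose exec -T "$DBD_SERVICE" ping -c 3 "$MASTER_HOST"
}
```

Source the script from the directory that holds the compose file, then call `check_master_from_dbd`. If it fails at the `getent` step, the problem is name resolution. If it fails only at `ping`, the name resolves but the host is not reachable.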
Once you're in the slurmdbd shell, you can run the cluster registration command directly:
sacctmgr --immediate add cluster name=clusterlab
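The same registration can also be scripted from the WSL side without opening a shell in the container. Below is a minimal sketch, again assuming Compose v2 and a service named `slurmdbd`. Both the service name and the function names are hypothetical. It lists the registered clusters first and skips the `add` when `clusterlab` is already there, so re-running it is harmless.

```shell
#!/usr/bin/env bash
# Sketch: register the cluster in slurmdbd through docker compose exec.
# Hypothetical compose service name: "slurmdbd".
DBD_SERVICE="${DBD_SERVICE:-slurmdbd}"
CLUSTER_NAME="${CLUSTER_NAME:-clusterlab}"

# Run a command inside the accounting container without allocating a TTY.
in_dbd() {
  docker compose exec -T "$DBD_SERVICE" "$@"
}

register_cluster() {
  # --noheader and --parsable2 make sacctmgr print one bare name per line,
  # so grep -x can match the cluster name exactly.
  if in_dbd sacctmgr --noheader --parsable2 show cluster format=Cluster \
      | grep -qx "$CLUSTER_NAME"; then
    echo "cluster $CLUSTER_NAME already registered"
    return 0
  fi
  # --immediate skips the interactive confirmation prompt.
  in_dbd sacctmgr --immediate add cluster name="$CLUSTER_NAME"
}
```

Calling `register_cluster` a second time should only print the "already registered" message.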
If it still doesn't work, please share more details about your distributed test environment. Thanks!
I am testing on WSL with a single node that has a GPU, after adapting the slurm.conf and gres.conf configuration files. On the WSL system, I edited /etc/hosts following the format of the hosts file in the repository, then ran steps 1 to 4. Finally, running ./register_cluster.sh failed with this error:
"no configuration file provided: not found."
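`no configuration file provided: not found` is the message Docker Compose v2 prints when it cannot find a compose file. That usually means the script ran from a directory without the repository's compose file. This assumes `register_cluster.sh` calls `docker compose` without its own `-f` option. Below is a small sketch that checks one directory for Compose's default file names, with the canonical `compose.yaml` names first. `find_compose_file` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the first default compose file found
# in a directory. Only the given directory is checked, not its parents.
find_compose_file() {
  local dir="${1:-.}" f
  for f in compose.yaml compose.yml docker-compose.yaml docker-compose.yml; do
    if [ -f "$dir/$f" ]; then
      printf '%s\n' "$dir/$f"
      return 0
    fi
  done
  echo "no compose file in $dir; run from the repo root or set COMPOSE_FILE" >&2
  return 1
}
```

Run `find_compose_file .` in the directory where you start `./register_cluster.sh`. If nothing is found, `cd` into the repository checkout first, or export `COMPOSE_FILE` with the path to the compose file.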
I also checked the slurmmaster logs and found these errors:
`sudo: unable to resolve host slurmmaster: Name or service not known`
`sudo: unable to resolve host slurmmaster: Temporary failure in name resolution`
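`sudo` prints `unable to resolve host X` when it cannot resolve the machine's own hostname. Inside the slurmmaster container, the name `slurmmaster` is therefore probably not mapped in that container's resolver. Editing `/etc/hosts` on WSL does not normally change the `/etc/hosts` that Docker generates inside each container. Below is a minimal sketch for checking a hosts-format file for a name, which can be pointed at either file. `hosts_lookup` is a hypothetical helper, and the service name `slurmmaster` used below is an assumption.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the address mapped to a name in a
# hosts-format file ("IP name [aliases...]", "#" starts a comment).
# Pass "-" as the file to read from stdin. Returns non-zero if not found.
hosts_lookup() {
  local file="$1" name="$2"
  awk -v n="$name" '
    { sub(/#.*/, "") }                 # strip comments
    NF >= 2 {
      for (i = 2; i <= NF; i++)        # field 1 is the IP, the rest are names
        if ($i == n) { print $1; found = 1; exit }
    }
    END { exit !found }
  ' "$file"
}
```

On WSL, `hosts_lookup /etc/hosts slurmmaster` checks your edited file. For the container, use `docker compose exec -T slurmmaster cat /etc/hosts | hosts_lookup - slurmmaster`. If the container check finds nothing, that would be consistent with the `sudo` messages above.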
Could you help me get this test running successfully?