stackhpc / ansible-role-openhpc

Ansible role for OpenHPC
Apache License 2.0

CI fails on master #157

Closed sjpb closed 10 months ago

sjpb commented 1 year ago

CI fails on the idempotency step (before verify), apparently because the slurmd daemons cannot resolve the hostname of the slurmctld node:

May 12 09:24:56 testohpc-compute-0 slurmd[31519]: slurmd: error: get_addr_info: getaddrinfo() failed: Name or service not known
May 12 09:24:56 testohpc-compute-0 slurmd[31519]: slurmd: error: slurm_set_addr: Unable to resolve "testohpc-login-0"
May 12 09:24:56 testohpc-compute-0 slurmd[31519]: slurmd: error: Unable to establish control machine address
May 12 09:24:56 testohpc-compute-0 slurmd[31519]: slurmd: error: _fetch_child: failed to fetch remote configs
May 12 09:24:56 testohpc-compute-0 systemd[1]: slurmd.service: Main process exited, code=exited, status=1/FAILURE
May 12 09:24:56 testohpc-compute-0 slurmd[31519]: error: get_addr_info: getaddrinfo() failed: Name or service not known
May 12 09:24:56 testohpc-compute-0 systemd[1]: slurmd.service: Failed with result 'exit-code'.
May 12 09:24:56 testohpc-compute-0 slurmd[31519]: error: slurm_set_addr: Unable to resolve "testohpc-login-0"
May 12 09:24:56 testohpc-compute-0 slurmd[31519]: error: Unable to establish control machine address
May 12 09:24:56 testohpc-compute-0 slurmd[31519]: error: _fetch_child: failed to fetch remote configs
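
The getaddrinfo() failure in the log above can be checked independently of slurmd from a compute node. This is a minimal sketch, assuming Python 3 is available there; the helper name resolve is hypothetical and is not part of the role or of Slurm. It asks the system resolver for the same hostname and reports whether any address comes back.

```python
import socket


def resolve(hostname):
    """Return the sorted unique IP addresses for hostname, or [] if lookup fails.

    Hypothetical helper for illustration only; it wraps the same
    getaddrinfo() call that slurmd reports as failing.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Raised for "Name or service not known" and similar resolver errors.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "testohpc-login-0" is the control machine name taken from the log above.
    for name in ("localhost", "testohpc-login-0"):
        addrs = resolve(name)
        print(name, "->", addrs if addrs else "unresolvable")
```

If the login node name prints as unresolvable on a compute node, the problem lies in name resolution (for example /etc/hosts or DNS) rather than in slurmd itself.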

sjpb commented 10 months ago

Fixed by https://github.com/stackhpc/ansible-role-openhpc/pull/161