Building the containers from scratch produces the errors below, but if you run the demo as-is and then check the logs, you'll see the same kinds of issues. Going into the frontend container and starting services by hand makes some of the errors go away, and restarting the slurm container clears up a few more. In general, though, the demo seems pretty janky: the services don't come up cleanly on their own. I hope this is still being maintained; if not, please respond and let me know. I'm planning to use this demo to get buy-in from folks at my university, so I don't want to get off on the wrong foot.
```
coldfront | [2024-08-13 19:44:37 +0000] [39] [INFO] Starting gunicorn 20.1.0
coldfront | [2024-08-13 19:44:37 +0000] [39] [INFO] Listening at: unix:/srv/www/coldfront.sock (39)
coldfront | [2024-08-13 19:44:37 +0000] [39] [INFO] Using worker: sync
coldfront | [2024-08-13 19:44:37 +0000] [40] [INFO] Booting worker with pid: 40
coldfront | [2024-08-13 19:44:37 +0000] [41] [INFO] Booting worker with pid: 41
coldfront | [2024-08-13 19:44:37 +0000] [42] [INFO] Booting worker with pid: 42
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ldap | 66bbb7a9 conn=1003 fd=13 ACCEPT from IP=172.18.0.6:38824 (IP=0.0.0.0:636)
ondemand | -- Waiting for frontend ssh to become active ...
ldap | 66bbb7a9 conn=1003 fd=13 TLS established tls_ssf=256 ssf=256
ldap | 66bbb7a9 conn=1003 op=0 SRCH base="" scope=0 deref=0 filter="(objectClass=*)"
ldap | 66bbb7a9 conn=1003 op=0 SRCH attr=* altServer namingContexts supportedControl supportedExtension supportedFeatures supportedLDAPVersion supportedSASLMechanisms domainControllerFunctionality defaultNamingContext lastUSN highestCommittedUSN
ldap | 66bbb7a9 conn=1003 op=0 SEARCH RESULT tag=101 err=0 nentries=1 text=
ldap | 66bbb7a9 conn=1003 op=1 BIND dn="cn=admin,dc=example,dc=org" method=128
ldap | 66bbb7a9 slap_global_control: unrecognized control: 1.3.6.1.4.1.42.2.27.8.5.1
ldap | 66bbb7a9 conn=1003 op=1 BIND dn="cn=admin,dc=example,dc=org" mech=SIMPLE ssf=0
ldap | 66bbb7a9 conn=1003 op=1 RESULT tag=97 err=0 text=
ldap | 66bbb7a9 conn=1003 op=2 UNBIND
ldap | 66bbb7a9 conn=1003 fd=13 closed
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
```
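The log shows a chain of waits: `ondemand` polls port 22 on `frontend` with `nc`, while `frontend`, `cpn01`, and `cpn02` all wait on slurmctld. To see which link in that chain is actually down, a plain TCP probe of each service is one quick check. The sketch below is only an illustration, not part of the demo: the hostname `frontend` and port 22 come from the log above; the hostname `slurmctld` is an assumption about the Compose service name; and 6817 is Slurm's default `SlurmctldPort`, which this setup may override.

```python
import socket

# Hypothetical service map. "frontend" and port 22 come from the log above;
# the "slurmctld" hostname is assumed, and 6817 is Slurm's default SlurmctldPort.
SERVICES: dict[str, tuple[str, int]] = {
    "frontend sshd": ("frontend", 22),
    "slurmctld": ("slurmctld", 6817),
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and DNS resolution failures.
        return False


def report(services: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Probe every (host, port) pair and map each service name to up/down."""
    return {name: port_is_open(host, port) for name, (host, port) in services.items()}
```

Run it from any container on the same Compose network that has Python 3 available, for example with `print(report(SERVICES))`. If `slurmctld` reports down while the other containers keep printing their wait messages, the problem is most likely in the slurmctld container itself and not in the services that depend on it.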