ubccr / hpc-toolset-tutorial

Tutorial for installing Open XDMoD, OnDemand, & ColdFront
GNU General Public License v3.0

Containers have issues from the get go #186

Open st1553 opened 3 months ago

st1553 commented 3 months ago

Building the containers from scratch shows these errors, and if you run the demo as is and then look at the logs you'll see the same types of issues. Going into the frontend container and starting services can make some errors go away, and restarting the slurm container can clear some others. But in general the demo looks pretty janky. Hopefully this is being maintained; if not, please respond and let me know. This demo is intended to get buy-in from folks at my university to perhaps use this, so I don't want to put a bad foot forward.

coldfront | [2024-08-13 19:44:37 +0000] [39] [INFO] Starting gunicorn 20.1.0
coldfront | [2024-08-13 19:44:37 +0000] [39] [INFO] Listening at: unix:/srv/www/coldfront.sock (39)
coldfront | [2024-08-13 19:44:37 +0000] [39] [INFO] Using worker: sync
coldfront | [2024-08-13 19:44:37 +0000] [40] [INFO] Booting worker with pid: 40
coldfront | [2024-08-13 19:44:37 +0000] [41] [INFO] Booting worker with pid: 41
coldfront | [2024-08-13 19:44:37 +0000] [42] [INFO] Booting worker with pid: 42
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ldap | 66bbb7a9 conn=1003 fd=13 ACCEPT from IP=172.18.0.6:38824 (IP=0.0.0.0:636)
ondemand | -- Waiting for frontend ssh to become active ...
ldap | 66bbb7a9 conn=1003 fd=13 TLS established tls_ssf=256 ssf=256
ldap | 66bbb7a9 conn=1003 op=0 SRCH base="" scope=0 deref=0 filter="(objectClass=)"
ldap | 66bbb7a9 conn=1003 op=0 SRCH attr= altServer namingContexts supportedControl supportedExtension supportedFeatures supportedLDAPVersion supportedSASLMechanisms domainControllerFunctionality defaultNamingContext lastUSN highestCommittedUSN
ldap | 66bbb7a9 conn=1003 op=0 SEARCH RESULT tag=101 err=0 nentries=1 text=
ldap | 66bbb7a9 conn=1003 op=1 BIND dn="cn=admin,dc=example,dc=org" method=128
ldap | 66bbb7a9 slap_global_control: unrecognized control: 1.3.6.1.4.1.42.2.27.8.5.1
ldap | 66bbb7a9 conn=1003 op=1 BIND dn="cn=admin,dc=example,dc=org" mech=SIMPLE ssf=0
ldap | 66bbb7a9 conn=1003 op=1 RESULT tag=97 err=0 text=
ldap | 66bbb7a9 conn=1003 op=2 UNBIND
ldap | 66bbb7a9 conn=1003 fd=13 closed
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
cpn02 | -- slurmctld is not available. Sleeping ...
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
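
Most of the output above is containers retrying while they wait on a dependency (ondemand on frontend ssh, frontend and the compute nodes on slurmctld). To see at a glance which containers are stuck retrying, a log like this can be summarized with a short script. This is only a rough sketch and not part of the tutorial: the function name `summarize_waits` and the list of phrases are assumptions taken from the messages in this excerpt, and other versions of the demo may print different text.

```python
import re
from collections import Counter

# Phrases copied from the log excerpt above (an assumption: other releases of
# the demo may word these differently). They mark retry/wait messages rather
# than hard failures.
WAIT_PATTERNS = (
    "Waiting for",
    "is not available. Sleeping",
    "Connection refused",
)

# Compose-style logs prefix every line with "<container> | <message>".
LINE_RE = re.compile(r"^\s*(?P<container>[\w.-]+)\s*\|\s?(?P<message>.*)$")


def summarize_waits(log_text):
    """Count wait/retry messages per container in compose-style log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a "<container> | <message>" line
        if any(pattern in match["message"] for pattern in WAIT_PATTERNS):
            counts[match["container"]] += 1
    return counts


if __name__ == "__main__":
    sample = """\
ondemand | nc: connect to frontend (172.18.0.9) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
frontend | -- Waiting for slurmctld to become active ...
cpn01 | -- slurmctld is not available. Sleeping ...
coldfront | [2024-08-13 19:44:37 +0000] [39] [INFO] Starting gunicorn 20.1.0
"""
    for container, n in summarize_waits(sample).most_common():
        print(f"{container}: {n} wait/retry messages")
```

If the counts stop growing after a few minutes, the services eventually came up and the messages were just startup noise; if they keep growing, the dependency named in the message (here slurmctld or frontend ssh) is the place to look.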