Closed aebruno closed 2 years ago
This looks similar to the slurmd issue. Something is being cached that Apache doesn't like, such as a stale PID file.
This is still not working for me. Steps to reproduce:
Start fresh then stop:
$ ./hpcts start
...
$ ./hpcts stop
Start containers again
$ ./hpcts start
ColdFront and XDMoD start fine. OnDemand and DEX fail to come back up; here are the logs:
ondemand | nc: connect to frontend (172.19.0.7) port 22 (tcp) failed: Connection refused
ondemand | -- Waiting for frontend ssh to become active ...
ondemand | Connection to frontend (172.19.0.7) 22 port [tcp/ssh] succeeded!
ondemand | ---> Cleaning NGINX ...
ondemand | can't find user for hpcadmin
ondemand | Run 'nginx_stage --help' to see a full list of available command line options.
@johrstrom I know it's getting down to the wire here, but it would be great if we could sort this out. I'm happy to rebuild the OOD containers again.
I tried this a few times and it seems like a race condition. Unless we sort this out, we'll just have to tell users who hit this to try again or run:
./hpcts destroy
./hpcts start
The above should always bring everything back up fresh without having to re-download the images.
:face_palm - I'm sorry that I thought this was settled. Yes, we likely need to run nginx_stage after we start SSSD. My sincere apologies.
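A minimal sketch of the ordering suggested above, assuming the "can't find user for hpcadmin" error means the account was not yet resolvable when nginx_stage ran. The function name `wait_for_user`, the retry count, and the delay are hypothetical and are not taken from the hpcts entrypoint. The nginx_stage call is left as a comment because the exact command the entrypoint runs is not shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: poll until a user can be resolved (for example
# through SSSD), giving up after a fixed number of attempts.
# Usage: wait_for_user NAME [RETRIES] [DELAY_SECONDS]
wait_for_user() {
    user="$1"
    retries="${2:-30}"
    delay="${3:-1}"
    i=0
    while [ "$i" -lt "$retries" ]; do
        # `id -u` goes through the normal name-service lookup, so it only
        # succeeds once the account is actually visible on this host.
        if id -u "$user" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Assumed usage inside the container start-up script, after SSSD is started:
#   wait_for_user hpcadmin 30 1 || { echo "hpcadmin never resolved" >&2; exit 1; }
#   ...then run the existing nginx_stage clean step...
```

Polling for the user rather than sleeping for a fixed time keeps start-up fast when SSSD is ready quickly and still fails loudly if it never becomes ready.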
> without having to re-download the images.
It's not about downloading the images; it's about starting a container that had previously been started. They don't need a new image, they need a fresh container rather than a restart of an older one.
I believe docker-compose down stops and removes containers, whereas docker-compose stop just stops existing containers so that they can be started again later.
It would be nice if users could stop the containers and restart them later without losing their state. For example, a user completes the first half of the tutorial, stops the containers, and goes to lunch. When they come back and start the containers again, they should be able to pick up where they left off. This flow currently works:
OnDemand restarts just fine. However, a docker-compose down stops and removes the containers (and any networks). This flow causes OnDemand to come back up in "offline mode":
@johrstrom Any thoughts? Seems like we should be able to support the stop/start of the containers.