perfsonar / perfsonar-testpoint-docker

Apache License 2.0

Supervisord-based Container Doesn't Function #22

Open mfeit-internet2 opened 3 years ago

mfeit-internet2 commented 3 years ago

Ignacio Peluaga Lozada writes:

I have been experiencing issues with the testpoint image lately. I created a container using the latest image (4.4.1) and always got "Run did not complete: Missed" on pscheduler's CLI, regardless of the task or target node. On the same host where I ran the Docker container, I installed the toolkit v4.4.1 and everything worked. I also tried some other Docker image versions:

- perfsonar/testpoint:v4.4.0: same issues as with v4.4.1.
- perfsonar/testpoint:v4.3.4: worked fine.
- perfsonar/testpoint:systemd: worked fine.

Therefore I believe the problem is with perfSONAR's supervisord-based v4.4.x testpoint images. Is anyone else experiencing this?

Internet2 saw this as well. The runner service fails to start.

DanielNeto commented 3 years ago

It seems to be a problem with the Docker version. With the latest version, 20.10.8, it doesn't work, but with versions 20.10.0 and 19.03.9 the tasks run correctly, even though the pscheduler processes keep restarting all the time. I still haven't figured out what changed between versions to cause this.

Here is a snippet of the container log with Docker 19.03.9:

2021-09-28 18:56:42,952 INFO spawned: 'pscheduler-scheduler' with pid 1430
2021-09-28 18:56:42,952 INFO success: pscheduler-runner entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-28 18:56:43,600 INFO exited: pscheduler-archiver (exit status 1; not expected)
2021-09-28 18:56:43,768 INFO spawned: 'pscheduler-archiver' with pid 1432
2021-09-28 18:56:44,300 INFO success: pscheduler-scheduler entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-28 18:56:44,344 INFO exited: pscheduler-ticker (exit status 1; not expected)
2021-09-28 18:56:44,900 INFO spawned: 'pscheduler-ticker' with pid 1435
2021-09-28 18:56:44,900 INFO success: pscheduler-archiver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-28 18:56:44,900 INFO exited: pscheduler-runner (exit status 1; not expected)
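
To quantify how often each service is restarting in a log like the one above, something along these lines could help. This is only a sketch: the function name `count_unexpected_exits` and the regular expression are assumptions built from the line format shown in this snippet, not part of perfSONAR or supervisord.

```python
import re
from collections import Counter

# Matches supervisord lines of the form seen in the snippet above, e.g.:
#   2021-09-28 18:56:43,600 INFO exited: pscheduler-archiver (exit status 1; not expected)
# The pattern is an assumption derived from those lines only.
EXIT_RE = re.compile(
    r"INFO exited: (?P<name>\S+) \(exit status (?P<status>\d+); not expected\)"
)


def count_unexpected_exits(log_text):
    """Return a Counter mapping program name to its number of unexpected exits."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


# A few lines copied from the log snippet above.
SAMPLE_LOG = """\
2021-09-28 18:56:42,952 INFO spawned: 'pscheduler-scheduler' with pid 1430
2021-09-28 18:56:43,600 INFO exited: pscheduler-archiver (exit status 1; not expected)
2021-09-28 18:56:43,768 INFO spawned: 'pscheduler-archiver' with pid 1432
2021-09-28 18:56:44,344 INFO exited: pscheduler-ticker (exit status 1; not expected)
2021-09-28 18:56:44,900 INFO exited: pscheduler-runner (exit status 1; not expected)
"""

if __name__ == "__main__":
    for name, count in sorted(count_unexpected_exits(SAMPLE_LOG).items()):
        print(f"{name}: {count}")
```

Running it against a longer capture of the container log should make it easier to see which pscheduler services are cycling most often.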

pmooo commented 2 years ago

Updating supervisor resolved this issue on my images using supervisord and Docker 20.10.1+.

It must be a version released after this merge to the API that supervisor uses:

https://github.com/docker/docker-py/commit/1757c974fa3a05b0e9b783af85242b18df09d05d

You may have to work around the yum repo by installing it with python3 pip in the Dockerfile.

yorickps commented 2 years ago

Experienced exactly the same issue. Using the systemd based image now.

MiddelkoopT commented 2 years ago

Fixes for this have been applied to the 5.0.0 branch. This lets supervisord manage the processes instead of using --daemon options. Look at /etc/supervisord.conf for the changes.
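
For anyone checking their own image, a rough way to spot programs that still pass a `--daemon` option in a supervisord configuration might look like the sketch below. The helper name `programs_using_daemon_flag`, the sample program names, and the command paths are all hypothetical; the actual contents of /etc/supervisord.conf in the testpoint image are not shown in this thread.

```python
import configparser


def programs_using_daemon_flag(conf_text):
    """Return names of [program:x] sections whose command includes --daemon."""
    # interpolation=None keeps configparser from choking on supervisord-style
    # %(...)s expansions that may appear in a real configuration file.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)

    flagged = []
    for section in parser.sections():
        if not section.startswith("program:"):
            continue
        command = parser.get(section, "command", fallback="")
        tokens = command.split()
        if any(t == "--daemon" or t.startswith("--daemon=") for t in tokens):
            flagged.append(section[len("program:"):])
    return flagged


# Hypothetical configuration for illustration only; these are not the
# real pscheduler commands or paths.
SAMPLE_CONF = """\
[supervisord]
nodaemon=true

[program:example-a]
command=/usr/bin/example-a --daemon --pid-file /var/run/example-a.pid

[program:example-b]
command=/usr/bin/example-b
"""

if __name__ == "__main__":
    print(programs_using_daemon_flag(SAMPLE_CONF))
```

To run it against a real file, read the file's contents into a string and pass that to the function. A program listed in the result daemonizes itself rather than staying in the foreground under supervisord, which is the pattern the fix above moves away from.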