perfsonar / perl-shared

Shared libraries used by perl packages and referenced as a submodule in many git repos
Apache License 2.0

Service status check should use service API #79

Open laeti-tia opened 5 years ago

laeti-tia commented 5 years ago

The Services list on the toolkit homepage displays the status of each service. This status comes from the /toolkit/services/host.cgi?method=get_services call. Internally, the CGI relies on the check_running sub of NPToolkit/Services/Base.pm, which looks through the list of running processes, matching either by process name or by the PID recorded by the OS startup scripts.

This approach is neither very robust (the process names used are very broad, e.g. python, and PID files might be unreadable by the CGI) nor very accurate (a process might be running while the corresponding service is still down or failing in some way).

We could improve this by making proper service calls to check that each listed service is actually running and ready to serve. pScheduler has a /status API endpoint dedicated to that. Some of the other services checked probably have one too.
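As a rough illustration of what a service call could look like compared with a process check, here is a minimal Python sketch (the toolkit itself is Perl, so this is only meant to show the idea). The function name `probe_service`, its `opener` parameter and the `(ok, detail)` return shape are hypothetical and not part of the toolkit. Certificate verification is disabled on the assumption that localhost serves a self-signed certificate.

```python
import json
import ssl
import urllib.request


def probe_service(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return (ok, detail) for a status endpoint.

    ok is True only when the endpoint answers with HTTP 200 and a JSON body;
    detail is then the parsed JSON, otherwise a short error string.
    `opener` is injectable so the function can be exercised without a network.
    """
    # Assumption: the local web server uses a self-signed certificate,
    # so hostname and certificate checks are switched off for this probe.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with opener(url, timeout=timeout, context=context) as response:
            if response.status != 200:
                return False, f"HTTP {response.status}"
            return True, json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers connection failures and HTTP errors raised by urllib;
        # ValueError covers a body that is not valid UTF-8 JSON.
        return False, str(exc)
```

Unlike a process-name or PID lookup, a probe like this only reports success when the service actually answers a request, for example `probe_service("https://localhost/pscheduler/status")`.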

laeti-tia commented 4 years ago

To check pScheduler status, we should call https://localhost/pscheduler/status and compare the last and expected timestamps of each service. If expected > last, the service is running:

```
curl -sk https://localhost/pscheduler/status
{
    "runs": {
        "last-finished": null,
        "last-scheduled": "2020-08-20T00:22:11Z"
    },
    "services": {
        "archiver": {
            "expected": "2020-08-21T15:57:49Z",
            "last": "2020-08-21T15:57:34Z",
            "next_time": "PT15S",
            "overdue": null,
            "updates": 14497,
            "uptime": "P1DT17H52M3S"
        },
        "runner": {
            "expected": "2020-08-21T15:58:08Z",
            "last": "2020-08-21T15:57:08Z",
            "next_time": "PT1M",
            "overdue": null,
            "updates": 3652,
            "uptime": "P1DT17H52M3S"
        },
        "scheduler": {
            "expected": "2020-08-21T15:57:47Z",
            "last": "2020-08-21T15:57:37Z",
            "next_time": "PT10S",
            "overdue": null,
            "updates": 10809,
            "uptime": "P1DT17H52M3S"
        },
        "ticker": {
            "expected": "2020-08-21T15:57:46Z",
            "last": "2020-08-21T15:57:31Z",
            "next_time": "PT15S",
            "overdue": null,
            "updates": 14414,
            "uptime": "P1DT17H52M3S"
        }
    },
    "time": "2020-08-21T15:57:40Z"
}
```
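The comparison described above could be sketched roughly as follows in Python (for illustration only, since the toolkit code is Perl). The function names `parse_ts` and `service_states` are hypothetical. The timestamp format is inferred from the sample output, and treating a service with a missing `last` or `expected` value as not running is an assumption.

```python
from datetime import datetime


def parse_ts(value):
    """Parse a timestamp shaped like the sample output, e.g. 2020-08-21T15:57:49Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def service_states(status):
    """Map each service name to True when its expected timestamp is after its last one."""
    states = {}
    for name, info in status.get("services", {}).items():
        last = info.get("last")
        expected = info.get("expected")
        if last is None or expected is None:
            # Assumption: a service without both timestamps is reported as not running.
            states[name] = False
            continue
        states[name] = parse_ts(expected) > parse_ts(last)
    return states


# Trimmed copy of two entries from the sample output.
sample = {
    "services": {
        "archiver": {"expected": "2020-08-21T15:57:49Z", "last": "2020-08-21T15:57:34Z"},
        "runner": {"expected": "2020-08-21T15:58:08Z", "last": "2020-08-21T15:57:08Z"},
    }
}
print(service_states(sample))  # {'archiver': True, 'runner': True}
```

Returning one boolean per service would let the Services list show which pScheduler component is failing instead of a single up/down flag.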