perfsonar / toolkit

perfSONAR Toolkit distribution environment scripts and GUI
Apache License 2.0

httpd broken after upgrade to 5.0.8 #459

Open rhclopes opened 4 months ago

rhclopes commented 4 months ago

After the upgrade to 5.0.8, several hosts in the UK stopped publishing.

https://psmad.opensciencegrid.org/maddash-webui/index.cgi?dashboard=UK%20Mesh%20Config

That includes Jisc Slough, which is running on Alma 9. Actually, all UK hosts on Alma 9 stopped publishing.

I've just noticed that restarting httpd fixes the problem.

The file attached shows the error, and status after restart.

Raul

PS: I notice that the docs on starting/stopping services leave out important services like httpd and postgresql; see https://docs.perfsonar.net/install_el.html#step-8-starting-your-services:~:text=for%20any%20restarts.-,Step%208%3A%20Starting%20your%20services,-%C2%B6

rhclopes commented 4 months ago

http-error.txt

I had failed to upload it

mfeit-internet2 commented 4 months ago

All I can discern from this is that PostgreSQL wasn't available before Apache was restarted. The pScheduler API doesn't connect until it has a request to service, so restarting Apache may not have actually fixed it.
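One way to separate the two possibilities is to check the database and the API independently before restarting anything. The following is only a minimal sketch, not part of perfSONAR: it assumes PostgreSQL listens on the default TCP port 5432 on loopback and that the pScheduler API answers at `https://localhost/pscheduler/`. Both are assumptions to adjust for the host in question, and the function names are made up for illustration.

```python
import http.client
import socket
import ssl
import urllib.request

# Assumed defaults; change them to match the host being checked.
PG_HOST, PG_PORT = "127.0.0.1", 5432
API_URL = "https://localhost/pscheduler/"


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def api_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200.

    Certificate verification is disabled because toolkit hosts often use
    self-signed certificates; that is acceptable only for a local check.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False


def diagnose(db_reachable: bool, api_healthy: bool) -> str:
    """Turn the two check results into a hint about where to look next."""
    if not db_reachable:
        return "PostgreSQL is not accepting connections; look there first"
    if not api_healthy:
        return "PostgreSQL is reachable but the API fails; look at httpd"
    return "PostgreSQL and the API both respond"


if __name__ == "__main__":
    print(diagnose(port_open(PG_HOST, PG_PORT), api_ok(API_URL)))
```

Running this before and after each restart would show whether the API only recovers once httpd is restarted, or whether it recovers as soon as PostgreSQL is reachable again.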

rhclopes commented 4 months ago

The timeline was:

(0) The host was healthy on 5.0.7.
(1) After the update to 5.0.8, the host was running tests but failed to show results on MaDDash. I assumed PostgreSQL was at fault.
(2) I restarted PostgreSQL and the situation remained.
(3) I stopped and started all services (including PostgreSQL, but NOT including httpd) and the situation remained the same: no results.
(4) Then I remembered httpd, checked its status, and saw the error shown in the attached file.
(5) I restarted httpd and things went back to healthy.

Two other hosts in the UK (also Alma 9) saw the same issue after upgrade to 5.0.8. They've rebooted to fix.
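For anyone hitting the same thing, the manual sequence from the timeline (PostgreSQL first, then httpd) can be scripted. This is a rough sketch rather than an official procedure: the unit names `postgresql` and `httpd` are taken from this thread and may differ on a given install, and the helper names are hypothetical. It defaults to a dry run that only lists the commands.

```python
import subprocess

# Order matters here: the database comes up before the web server.
# Unit names are assumptions taken from the discussion above.
SERVICES_IN_ORDER = ["postgresql", "httpd"]


def restart_commands(services):
    """Build one `systemctl restart` command per service, keeping the order."""
    return [["systemctl", "restart", name] for name in services]


def is_active(service: str) -> bool:
    """Return True if systemd reports the unit as active."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", service], check=False
    )
    return result.returncode == 0


def restart_all(services, dry_run: bool = True):
    """Restart services in order and return the commands as strings.

    With dry_run=True nothing is executed. With dry_run=False each command
    is run (root is required) and the unit is checked afterwards.
    """
    done = []
    for cmd in restart_commands(services):
        if not dry_run:
            subprocess.run(cmd, check=True)
            if not is_active(cmd[-1]):
                raise RuntimeError(f"{cmd[-1]} is not active after restart")
        done.append(" ".join(cmd))
    return done


if __name__ == "__main__":
    for line in restart_all(SERVICES_IN_ORDER, dry_run=True):
        print(line)
```

Calling `restart_all(SERVICES_IN_ORDER, dry_run=False)` as root would perform the restarts and stop at the first unit that does not come back.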