os-autoinst / openQA

openQA web-frontend, scheduler and tools.
http://openqa.opensuse.org/
GNU General Public License v2.0

Allow restarting `openqa-webui-daemon` without downtime #5820

Closed: Martchus closed this 1 month ago

Martchus commented 1 month ago

It works locally when installing the packages from the OBS check. (If you want to reproduce this, be sure to also install `openQA-common` because the `reuse=1` change is part of that subpackage.)

I tested this by hammering the F5 key in the web browser while reloading the service via `sleep 5 && sudo systemctl reload openqa-webui`. Without this change, there is a window of around a second in which no connection can be established; with the change, this doesn't happen.
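
A scripted version of this F5 test would make the outage window measurable instead of eyeballed. The following is only a sketch: `poll` and `outage_window` are hypothetical helpers (not part of openQA), and it assumes the web UI answers on `http://127.0.0.1:9526` as in the journal below. Any HTTP status counts as "served"; only refused, reset or timed-out requests count as failures.

```python
import http.client
import time
import urllib.error
import urllib.request


def poll(url, duration, interval=0.05, timeout=1.0):
    """Request `url` repeatedly for `duration` seconds.

    Returns (ok, failures): the number of requests some server process answered
    and a list of time.monotonic() timestamps of requests that got no answer.
    """
    ok = 0
    failures = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        started = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                response.read()
            ok += 1
        except urllib.error.HTTPError:
            # An error status still means a process accepted the connection.
            ok += 1
        except (OSError, http.client.HTTPException):
            # Connection refused/reset or timeout: record it as an outage sample.
            failures.append(started)
        time.sleep(interval)
    return ok, failures


def outage_window(failures):
    """Seconds between the first and last failed request (0.0 if none failed)."""
    return failures[-1] - failures[0] if failures else 0.0
```

For example, start `ok, failures = poll("http://127.0.0.1:9526", duration=15)` and run `sleep 5 && sudo systemctl reload openqa-webui` in a second terminal. With the change, `failures` would be expected to stay empty; without it, `outage_window(failures)` should roughly reflect the one-second gap described above.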

The journal also looks good, confirming that the old service is only stopped once the new one is up:

```
Aug 05 17:20:33 linux-9lzf systemd[1]: Reloaded The openQA web UI.
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: [info] Listening at "http://127.0.0.1:9526?reuse=1"
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: Web application available at http://127.0.0.1:9526
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: [info] Listening at "http://[::1]:9526?reuse=1"
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: Web application available at http://[::1]:9526
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: [info] Manager 52867 started
…
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52908]: [info] Worker 52908 started
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: [info] Creating process id file "/var/lib/openqa/webui/prefork-1.pid"
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52909]: [info] Worker 52909 started
Aug 05 17:20:36 linux-9lzf openqa-webui-daemon[52799]: [warn] Stopping worker 52833 immediately
Aug 05 17:20:36 linux-9lzf openqa-webui-daemon[52799]: [warn] Stopping worker 52831 immediately
…
```
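
The overlap visible in the journal relies on the `reuse=1` listen parameter, which, as far as I understand the Mojolicious documentation, makes the daemon set the `SO_REUSEPORT` socket option so several servers can listen on the same port. The following Python sketch illustrates that socket-level handover in isolation; it is not openQA or Mojolicious code, `listen_reuseport` and `handover_demo` are hypothetical names, and Linux semantics for `SO_REUSEPORT` are assumed.

```python
from __future__ import annotations

import socket


def listen_reuseport(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Open a listening TCP socket with SO_REUSEPORT enabled (Linux semantics assumed)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


def handover_demo() -> tuple[bool, bytes]:
    """Simulate an old and a new server instance sharing one port during a reload."""
    old = listen_reuseport(0)  # "old" instance; port 0 lets the kernel pick a free port
    port = old.getsockname()[1]

    # A socket WITHOUT SO_REUSEPORT cannot bind the port while the old instance listens.
    plain = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        plain.bind(("127.0.0.1", port))
        plain_conflicts = False
    except OSError:
        plain_conflicts = True
    finally:
        plain.close()

    # The "new" instance binds the same port while the old one is still listening ...
    new = listen_reuseport(port)
    # ... and only then is the old instance stopped.
    old.close()

    # A client connecting now is accepted by the new instance instead of being refused.
    new.settimeout(1)
    with socket.create_connection(("127.0.0.1", port), timeout=1) as client:
        client.sendall(b"ping")
        conn, _ = new.accept()
        with conn:
            data = conn.recv(4)
    new.close()
    return plain_conflicts, data
```

The sketch closes the old listener before any client connects; a real handover also has to deal with connections already queued on the old socket, which is outside what this example shows.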

The output of `systemctl status` also looks good: all PIDs of the prefork processes are replaced after a reload and no processes are left over.
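
The "no leftover processes" check can also be done programmatically by comparing the unit's PIDs before and after a reload. This is a hypothetical helper, not something from the PR: it assumes the unified cgroup v2 hierarchy under `/sys/fs/cgroup` and that all prefork processes live directly in the unit's cgroup (as `systemctl status` lists them).

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def unit_pids(unit: str) -> set[int]:
    """PIDs in the unit's cgroup (assumes cgroup v2 mounted at /sys/fs/cgroup)."""
    cgroup = subprocess.run(
        ["systemctl", "show", "--property=ControlGroup", "--value", unit],
        check=True,
        capture_output=True,
        text=True,
    ).stdout.strip()
    procs = Path("/sys/fs/cgroup") / cgroup.lstrip("/") / "cgroup.procs"
    return {int(pid) for pid in procs.read_text().split()}


def compare_reload(before: set[int], after: set[int]) -> tuple[set[int], set[int]]:
    """Return (leftover, started): PIDs that survived the reload and PIDs that are new."""
    return before & after, after - before
```

Usage would be `before = unit_pids("openqa-webui")`, then `sudo systemctl reload openqa-webui`, a short wait, and `leftover, started = compare_reload(before, unit_pids("openqa-webui"))`; `leftover` should then be empty.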

I also already have a fix for the failing test.

I still need to ensure that the service is not restarted on updates via the rpm scripts, and that restarting other services doesn't trigger a restart of the web UI. I also need to add `reload: True` to our Salt states (see https://docs.saltproject.io/en/latest/ref/states/all/salt.states.service.html).
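
For illustration, a state using `reload: True` might look like the following. Salt states are usually written in YAML; to keep one language for the examples here, this sketch uses Salt's `py` renderer, whose `run()` function returns the same data structure (the YAML equivalent is shown in the comment). The state ID and the watched config file are hypothetical and not taken from the actual openQA Salt states.

```python
#!py
# Hypothetical Salt state using the py renderer. YAML equivalent:
#   openqa-webui:
#     service.running:
#       - enable: True
#       - reload: True
#       - watch:
#         - file: /etc/openqa/openqa.ini


def run():
    """Return Salt high data: keep the web UI running, reload it when watched states change."""
    return {
        "openqa-webui": {
            "service.running": [
                {"enable": True},
                # With reload: True, a triggered watch requisite reloads instead of restarts.
                {"reload": True},
                # Hypothetical requisite: assumes a file.managed state for this path exists.
                {"watch": [{"file": "/etc/openqa/openqa.ini"}]},
            ]
        }
    }
```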

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.50%. Comparing base (40fce5a) to head (a1f44e4). Report is 5 commits behind head on master.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master    #5820   +/-   ##
=======================================
  Coverage   98.50%   98.50%
=======================================
  Files         395      395
  Lines       38715    38715
=======================================
  Hits        38136    38136
  Misses        579      579
```


Martchus commented 1 month ago

I tested the package from the OBS checks locally and it works: reinstalling or updating the package now reloads the main service, while other services are still restarted.

I also didn't run into any limits regarding PostgreSQL connections. However, production might have other limits, so I'll check whether I can run two prefork instances (with all the usual settings) in parallel on OSD and o3.

EDIT: I can run `sudo -u geekotest /usr/share/openqa/script/openqa prefork -m production --proxy -i 100 -H 400 -w 45 -c 1 -G 800 -l 'http://[::]:8080'` on OSD/o3 and it works. So we have enough headroom for database connections.
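
The reasoning behind this check is that old and new prefork instances overlap during a reload, so the web UI's peak database usage temporarily roughly doubles. A back-of-the-envelope sketch of that arithmetic follows; `connection_headroom` is a hypothetical helper, and the model (each prefork worker holds at most `conns_per_worker` connections, everything else stays constant) is an assumption, not measured openQA behavior.

```python
def connection_headroom(max_connections, other_connections, workers,
                        instances=2, conns_per_worker=1):
    """Database connections left while `instances` prefork instances overlap.

    Hypothetical model: each prefork worker holds at most `conns_per_worker`
    connections and connections from other clients stay constant meanwhile.
    """
    peak_webui = instances * workers * conns_per_worker
    return max_connections - other_connections - peak_webui


# Illustrative numbers only (not OSD/o3 values): -w 45 as in the command above.
print(connection_headroom(400, 150, 45))
```

With these made-up inputs the result is 160; a negative value would mean the overlap could exhaust `max_connections`. Real inputs could come from `SHOW max_connections;` and from counting rows in `pg_stat_activity` that don't belong to the web UI.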