Today at approximately 11:15 UTC, the WPT server running in production stopped responding to HTTPS requests made to port 443. Because the server is implemented with a number of independent sub-processes, the production deployment continued to serve HTTP and WebSocket requests.
This project includes a simple recovery mechanism that automatically restarts the server in case of failure. That mechanism was never triggered because the parent process never halted. The server was eventually restarted, and HTTPS traffic restored, by an independent subsystem: the one which fetches the latest code from WPT and responds to changes by restarting the server.
Because a server that is not responding to HTTPS traffic is not capable of performing its role, this is an invalid state that should be avoided. I've submitted a simple fix that ensures the parent process halts in response to failure in any sub-process:
https://github.com/web-platform-tests/wpt/pull/12557
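To illustrate the general idea (this is not the code from the pull request), here is a minimal sketch of a parent that stops as soon as any one of its sub-processes exits, so that an external restart mechanism can take over. The helper names `start_child` and `supervise` are hypothetical and do not come from WPT, and the children are stand-in Python one-liners rather than real protocol servers.

```python
import subprocess
import sys
import time


def start_child(code):
    """Start a stand-in sub-process that runs the given Python snippet.

    Hypothetical helper: in a real deployment each child would be one
    protocol server (HTTP, HTTPS, WebSocket, ...).
    """
    return subprocess.Popen([sys.executable, "-c", code])


def supervise(children, poll_interval=0.05):
    """Block until any child exits, then stop the others.

    `children` maps a label to a subprocess.Popen object. Returns a
    (label, return code) pair for the first child found to have exited.
    """
    try:
        while True:
            for name, proc in children.items():
                code = proc.poll()  # None while the child is still running
                if code is not None:
                    return name, code
            time.sleep(poll_interval)
    finally:
        # Never leave a partially working server behind: stop every
        # remaining child before handing control back to the caller.
        for proc in children.values():
            if proc.poll() is None:
                proc.terminate()
            proc.wait()
```

A parent built this way would call `supervise(...)` and then exit with a non-zero status, which is the signal a restart mechanism of the kind described above needs in order to bring every sub-process back up together.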
The conditions which initially caused the HTTPS sub-process to fail are not yet known. The fix referenced above will allow us to recover from the problem more rapidly, but it will not address the underlying issue. We'll keep an eye out for more information.