Open acozine opened 4 months ago
I was able to see this live today.
Setting up nginx-plus (32-1~jammy) ...
{
"softwareVersion": "4.10.0",
"componentVersions": {
"wafEngineVersion": "11.48.0",
"wafNginxVersion": "5.48.0"
},
"error_message": "Bot Signature File update failed. Error: Failed to unpack /opt/app_protect/var/update_files/bot_signatures/bot_signatures.bin.tgz: 'tar (child): gzip: Cannot exec: No such file or directory\ntar (child): Error is not recoverable: exiting now\n/bin/tar: Child returned status 2\n/bin/tar: Error is not recoverable: exiting now\n'",
"completed_successfully": false,
"event": "configuration_load_failure"
}
nginx: configuration file /etc/nginx/nginx.conf test failed
invoke-rc.d: initscript nginx, action "upgrade" failed.
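The error_message above shows tar failing because it could not exec gzip. A minimal preflight sketch, assuming the cause is simply that gzip is not on PATH (not confirmed here; the environment in which the bot signature update runs tar may differ from your login shell). The helper name missing_tools is hypothetical:

```shell
#!/bin/sh
# Print each named command that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not part of any package.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# tar and gzip are the two programs named in the error message above.
missing=$(missing_tools tar gzip)
if [ -n "$missing" ]; then
    echo "missing before upgrade: $missing" >&2
else
    echo "tar and gzip are both on PATH"
fi
```

Running this on a dev/test load balancer before the package upgrade would at least rule the simple case in or out.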
Yet another failure: nginx dumping core.
● nginx.service - NGINX Plus - high performance web server
Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
Active: failed (Result: core-dump) since Wed 2024-08-28 16:27:31 UTC; 5 days ago
Docs: https://www.nginx.com/resources/
Main PID: 3889193 (code=dumped, signal=SEGV)
Aug 27 18:39:53 lib-adc1 nginx[3889184]: nginx: [warn] conflicting server name "cdh-test-d>
Aug 27 18:39:53 lib-adc1 nginx[3889184]: nginx: [warn] could not build optimal server_name>
Aug 27 18:39:53 lib-adc1 systemd[1]: Started NGINX Plus - high performance web server.
Aug 28 14:55:12 lib-adc1 systemd[1]: Reloading NGINX Plus - high performance web server.
Aug 28 14:55:12 lib-adc1 systemd[1]: Reloaded NGINX Plus - high performance web server.
Aug 28 16:27:17 lib-adc1 systemd[1]: Reloading NGINX Plus - high performance web server.
Aug 28 16:27:17 lib-adc1 systemd[1]: Reloaded NGINX Plus - high performance web server.
Aug 28 16:27:31 lib-adc1 systemd[1]: nginx.service: Main process exited, code=dumped, stat>
Aug 28 16:27:31 lib-adc1 systemd[1]: nginx.service: Failed with result 'core-dump'.
Aug 28 16:33:08 lib-adc1 systemd[1]: nginx.service: Unit cannot be reloaded because it is
Fixed by starting it with:
sudo systemctl start nginx
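Since the unit could not be reloaded after the core dump, a small check along these lines could flag when an explicit start is needed. This is a sketch only: needs_start is a hypothetical helper, and treating "failed" and "inactive" as the states that call for a start is an assumption based on the log above.

```shell
#!/bin/sh
# Decide whether a unit needs an explicit start, given the state string
# printed by `systemctl is-active`. (needs_start is a hypothetical helper.)
needs_start() {
    case "$1" in
        failed|inactive) return 0 ;;
        *) return 1 ;;
    esac
}

# `systemctl is-active` exits non-zero for units that are not active,
# hence the `|| true` so the assignment itself never fails.
state=$(systemctl is-active nginx 2>/dev/null || true)
if needs_start "$state"; then
    echo "nginx is $state; start it with: sudo systemctl start nginx"
else
    echo "nginx state: ${state:-unknown}"
fi
```

The script only reports; it deliberately leaves the actual start to the operator.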
In a recent incident, we brought our load balancers down by upgrading the apt package app_protect. See this incident doc.
We needed to get the production load balancers back up quickly, so we stopped using app_protect in production. We were only using the package on a few staging servers. We manually updated nginx.conf on the production LBs so they would not load app_protect, then we commented it out in the individual site configs in #4867.
Now that we have dev/test/staging load balancers, let's investigate what happened and how to get app_protect working again. Why did the upgrade break our existing configuration? What configuration changes would be needed to use app_protect successfully with the latest apt version?