Open bdumpp opened 8 months ago
When a graceful shutdown time is provided, it's taken into account for stopping both the server and plugins. This error is coming from the status plugin while it must be uploading a status update.
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.
When running without setting the --shutdown-wait-period
flag, the OPA will log the following when being shutdown.
{"level":"info","msg":"Status update sent successfully in response to decision log update.","plugin":"status","time":"2024-06-04T11:56:58Z"}
{"client_addr":"100.78.24.1:53480","level":"info","msg":"Received request.","req_id":9,"req_method":"GET","req_path":"/health","time":"2024-06-04T11:56:59Z"}
{"client_addr":"100.78.24.1:53480","level":"info","msg":"Sent response.","req_id":9,"req_method":"GET","req_path":"/health","resp_bytes":3,"resp_duration":0.431618,"resp_status":200,"time":"2024-06-04T11:56:59Z"}
{"client_addr":"100.78.24.1:53496","level":"info","msg":"Received request.","req_id":10,"req_method":"GET","req_path":"/health","time":"2024-06-04T11:57:01Z"}
{"client_addr":"100.78.24.1:53496","level":"info","msg":"Sent response.","req_id":10,"req_method":"GET","req_path":"/health","resp_bytes":3,"resp_duration":0.348637,"resp_status":200,"time":"2024-06-04T11:57:01Z"}
{"level":"info","msg":"Shutting down...","time":"2024-06-04T11:57:03Z"}
{"level":"info","msg":"Server shutdown.","time":"2024-06-04T11:57:03Z"}
{"level":"info","msg":"Stopping bundle loader.","name":"systems/b9ecfbbd532a470fb1fdb3fefab1f5f7","plugin":"bundle","time":"2024-06-04T11:57:03Z"}
{"level":"info","msg":"Stopping decision logger.","plugin":"decision_logs","time":"2024-06-04T11:57:03Z"}
{"level":"info","msg":"Flushing decision logs.","plugin":"decision_logs","time":"2024-06-04T11:57:03Z"}
{"level":"info","msg":"All decisions in buffer uploaded.","plugin":"decision_logs","time":"2024-06-04T11:57:03Z"}
{"level":"info","msg":"Stopping status reporter.","plugin":"status","time":"2024-06-04T11:57:03Z"}
{"level":"error","msg":"Status update failed: Post \"http://styrasystem-slp/v1/status\": context canceled.","plugin":"status","time":"2024-06-04T11:57:03Z"}
{"level":"error","msg":"Status update failed: Post \"http://styrasystem-slp/v1/status\": context canceled.","plugin":"status","time":"2024-06-04T11:57:03Z"}
Setting the flag to e.g. 10, it will log the same as above, but with the extra Waiting 10s before initiating shutdown...
line. This indicates that 10 seconds is not enough time to shutdown.
However, If I set the --shutdown-wait-period to something high like 50 seconds, the OPA will not log anything related to shutting down, but will just stop running:
{"client_addr":"100.78.24.1:59184","level":"info","msg":"Received request.","req_id":36,"req_method":"GET","req_path":"/health","time":"2024-06-04T12:28:09Z"}
{"client_addr":"100.78.24.1:59184","level":"info","msg":"Sent response.","req_id":36,"req_method":"GET","req_path":"/health","resp_bytes":3,"resp_duration":0.389676,"resp_status":200,"time":"2024-06-04T12:28:09Z"}
{"level":"info","msg":"Status update sent successfully in response to bundle update.","plugin":"status","time":"2024-06-04T12:28:09Z"}
{"level":"info","msg":"Waiting 50s before initiating shutdown...","time":"2024-06-04T12:28:10Z"}
{"client_addr":"100.78.24.1:59188","level":"info","msg":"Received request.","req_id":37,"req_method":"GET","req_path":"/health","time":"2024-06-04T12:28:11Z"}
{"client_addr":"100.78.24.1:59188","level":"info","msg":"Sent response.","req_id":37,"req_method":"GET","req_path":"/health","resp_bytes":3,"resp_duration":0.787576,"resp_status":200,"time":"2024-06-04T12:28:11Z"}
{"level":"info","msg":"Status update sent successfully in response to decision log update.","plugin":"status","time":"2024-06-04T12:28:12Z"}
{"client_addr":"100.78.24.1:59196","level":"info","msg":"Received request.","req_id":38,"req_method":"GET","req_path":"/health","time":"2024-06-04T12:28:13Z"}
....
{"client_addr":"100.78.24.1:57560","level":"info","msg":"Received request.","req_id":48,"req_method":"GET","req_path":"/health","time":"2024-06-04T12:28:33Z"}
{"client_addr":"100.78.24.1:57560","level":"info","msg":"Sent response.","req_id":48,"req_method":"GET","req_path":"/health","resp_bytes":3,"resp_duration":0.378612,"resp_status":200,"time":"2024-06-04T12:28:33Z"}
{"level":"info","msg":"Status update sent successfully in response to bundle update.","plugin":"status","time":"2024-06-04T12:28:34Z"}
{"client_addr":"100.78.24.1:39568","level":"info","msg":"Received request.","req_id":49,"req_method":"GET","req_path":"/health","time":"2024-06-04T12:28:35Z"}
{"client_addr":"100.78.24.1:39568","level":"info","msg":"Sent response.","req_id":49,"req_method":"GET","req_path":"/health","resp_bytes":3,"resp_duration":0.413382,"resp_status":200,"time":"2024-06-04T12:28:35Z"}
{"client_addr":"100.78.24.1:39570","level":"info","msg":"Received request.","req_id":50,"req_method":"GET","req_path":"/health","time":"2024-06-04T12:28:37Z"}
{"client_addr":"100.78.24.1:39570","level":"info","msg":"Sent response.","req_id":50,"req_method":"GET","req_path":"/health","resp_bytes":3,"resp_duration":1.012793,"resp_status":200,"time":"2024-06-04T12:28:37Z"}
{"client_addr":"100.78.24.1:39580","level":"info","msg":"Received request.","req_id":51,"req_method":"GET","req_path":"/health","time":"2024-06-04T12:28:39Z"}
{"client_addr":"100.78.24.1:39580","level":"info","msg":"Sent response.","req_id":51,"req_method":"GET","req_path":"/health","resp_bytes":3,"resp_duration":0.483066,"resp_status":200,"time":"2024-06-04T12:28:39Z"}
{"level":"info","msg":"Status update sent successfully in response to decision log update.","plugin":"status","time":"2024-06-04T12:28:39Z"}
Do you set any value for --shutdown-grace-period
?
No, I couldn't really see any difference when setting the flag. For example, if I set it to 50 and deleted the OPA pod, it would just close down as if the flag wasn't set.
The plugin manager will create a new context with the timeout value set to the Graceful Shutdown Period (--shutdown-grace-period
). A quick look at the code in the status plugin and it looks like it may be not be taking that timeout into account iiuc.
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.
We are experiencing OPA pods logging errors during shutdown, even when it should be a graceful shutdown because a new pod is being deployed in an old pod's place. The error message we consistenly get when restarting pods is
{"level":"error","msg":"Status update failed: Post \"[http://slp-name/v1/status\](http://slp/v1/status/)": context canceled.","plugin":"status","time":"2024-04-04T11:38:45+02:00"}
We have noticed this in versions 0.60.0 and 0.62.1.