Closed jpmckinney closed 5 months ago
This was due to a rabbitmq-server patch.
I closed https://github.com/open-contracting/kingfisher-collect/issues/1033, so I'll close this issue.
If there are any new RabbitMQ-related messages in Sentry, I can use this issue in future.
RabbitMQ restarts might still cause errors to be reported. If so, I think the solution is here: https://github.com/open-contracting/yapw/issues/2#issuecomment-1911046356
Kingfisher Collect has had issues with restarts, because it only publishes messages, and over a long period of time. The others only ack/nack/publish messages after consuming a message. Since RabbitMQ cancels consumers when restarting, there is maybe only a narrow window in which the consumer can attempt a method on a closing/closed connection.
The info messages in /var/log/rabbitmq/rabbit@ocp##.log don't seem relevant. Can search with
grep -v info
or
zgrep -v info
to find the other log levels (notice, warning, error).
The registry server (ocp13) on 2024-01-18 10:10:46 got "RabbitMQ is asked to stop...", and it had stopped by 2024-01-18 10:10:51. It then started again at 2024-01-18 10:10:54.
Looking in Prometheus, the only signals are that memory usage and swap dropped after the restart (not surprising), but memory usage was not high before the restart (40%, 175MB).
Looking at /var/log/syslog, I see messages relating to apt around the same time, so I assume RabbitMQ was upgraded and therefore restarted.
This generated messages in Kingfisher Collect, because it uses a blocking connection and not an async client (only the latter can handle connection close events). To resolve that, we need to close https://github.com/open-contracting/kingfisher-collect/issues/1033
I'll keep this issue open to investigate any other restarts. #238 explains another restart scenario.
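For illustration, here is a rough sketch of how a long-lived publisher on a blocking connection might survive a broker restart by reconnecting when a publish fails. It is not the fix discussed in kingfisher-collect#1033 or yapw#2. The names ConnectionLost, publish_with_reconnect, connect and state are all hypothetical, and a real client would raise its own connection-closed exceptions instead.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical stand-in for a client library's connection-closed errors."""


def publish_with_reconnect(connect, message, state, retries=3, delay=0.0):
    """Publish `message`, reopening the connection if the broker dropped it.

    `connect` is a hypothetical factory returning an object with a
    `publish(message)` method that raises ConnectionLost once the broker
    has closed the connection. `state` is a dict that caches the open
    connection under the "connection" key between calls.

    Returns the number of retries that were needed.
    """
    for attempt in range(retries + 1):
        try:
            if state.get("connection") is None:
                state["connection"] = connect()
            state["connection"].publish(message)
            return attempt
        except ConnectionLost:
            # Drop the dead connection; the next pass opens a fresh one.
            state["connection"] = None
            if attempt == retries:
                raise
            time.sleep(delay)
```

With a blocking connection, the closed connection is only noticed on the next publish, so the retry sits around the publish call itself. In practice, delay would need to be long enough to cover the few seconds the restart took in the log above.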