Closed chrism417 closed 2 years ago
The stack trace implies that it already retried and ran out of attempts. Unfortunately, Postal is no longer responsible for keeping itself running; that is down to Docker (or Kubernetes), so the onus is on your monitoring, I'm afraid.
I understand; however, if the onus is on us, then Postal should be monitoring the apps for the crash.
@willpower232 Also, the requeuer restarts the app when it loses its connection to RabbitMQ, but none of the other apps do. Can the same restart be applied to cron, worker, etc.?
Describe the bug
We're running rabbitmq-ha, and if any of our three pods crashes (OOMKilled) or is simply moved to a new node, every app connected to RabbitMQ freezes and doesn't restart. For example, the worker stops with the following logs but never restarts or reconnects:
W, [2022-07-11T20:21:51.437658 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Recovering from connection.close (CONNECTION_FORCED - broker forced connection closure with reason 'shutdown')
W, [2022-07-11T20:21:51.438122 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Will recover from a network failure (no retry limit)...
W, [2022-07-11T20:22:01.438561 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Retrying connection on next host in line: postal-rabbit.default:5672
W, [2022-07-11T20:22:16.449911 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: TCP connection failed, reconnecting in 5.0 seconds
W, [2022-07-11T20:22:16.450312 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Will recover from a network failure (no retry limit)...
W, [2022-07-11T20:22:26.450770 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Retrying connection on next host in line: postal-rabbit.default:5672
E, [2022-07-11T20:24:28.548876 #1] ERROR -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Got an exception when receiving data: IO timeout when reading 7 bytes (Timeout::Error)
W, [2022-07-11T20:24:28.549027 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Exception in the reader loop: Timeout::Error: IO timeout when reading 7 bytes
W, [2022-07-11T20:24:28.549077 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Backtrace:
W, [2022-07-11T20:24:28.549126 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: /usr/local/bundle/gems/bunny-2.14.4/lib/bunny/cruby/socket.rb:68:in `rescue in read_fully'
W, [2022-07-11T20:24:28.549164 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: /usr/local/bundle/gems/bunny-2.14.4/lib/bunny/cruby/socket.rb:56:in `read_fully'
W, [2022-07-11T20:24:28.549309 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: /usr/local/bundle/gems/bunny-2.14.4/lib/bunny/transport.rb:239:in `read_fully'
W, [2022-07-11T20:24:28.549332 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: /usr/local/bundle/gems/bunny-2.14.4/lib/bunny/transport.rb:261:in `read_next_frame'
W, [2022-07-11T20:24:28.549347 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: /usr/local/bundle/gems/bunny-2.14.4/lib/bunny/reader_loop.rb:74:in `run_once'
W, [2022-07-11T20:24:28.549361 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: /usr/local/bundle/gems/bunny-2.14.4/lib/bunny/reader_loop.rb:39:in `block in run_loop'
W, [2022-07-11T20:24:28.549375 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: /usr/local/bundle/gems/bunny-2.14.4/lib/bunny/reader_loop.rb:36:in `loop'
W, [2022-07-11T20:24:28.549390 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: /usr/local/bundle/gems/bunny-2.14.4/lib/bunny/reader_loop.rb:36:in `run_loop'
W, [2022-07-11T20:24:28.549412 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Will recover from a network failure (no retry limit)...
W, [2022-07-11T20:24:38.549794 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Retrying connection on next host in line: postal-rabbit.default:5672
E, [2022-07-11T20:26:38.561127 #1] ERROR -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Got an exception when receiving data: IO timeout when reading 7 bytes (Timeout::Error)
W, [2022-07-11T20:28:38.547883 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Recovering from connection.close (CONNECTION_FORCED - broker forced connection closure with reason 'shutdown')
W, [2022-07-11T20:28:38.548052 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Will recover from a network failure (no retry limit)...
W, [2022-07-11T20:28:48.548464 #1] WARN -- #<Bunny::Session:0x55cdcfb158f8 postal@postal-rabbit.default:5672, vhost=postal, addresses=[postal-rabbit.default:5672]>: Retrying connection on next host in line: postal-rabbit.default:5672
To Reproduce
1. Run postal cron or postal worker
2. Run rabbitmq-ha
3. Delete one rabbitmq pod
Expected behaviour
If any app loses its connection to RabbitMQ, it should either reconnect or restart.
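A minimal sketch of the "restart" half of this, in Ruby and not taken from Postal's code: a watchdog that polls the connection and exits the process once it has been down for too long, so that Kubernetes (or Docker) restarts the container. `ConnectionWatchdog`, `max_down_seconds`, `clock` and `on_dead` are hypothetical names introduced for illustration; the only method assumed on the connection object is `open?`, which `Bunny::Session` provides.

```ruby
# Hypothetical helper, not part of Postal or Bunny.
# Tracks how long a connection has been closed and calls on_dead
# once it has been down for at least max_down_seconds.
class ConnectionWatchdog
  def initialize(connection,
                 max_down_seconds: 60,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 on_dead: -> { exit(1) })
    @connection = connection            # anything responding to #open?
    @max_down_seconds = max_down_seconds
    @clock = clock                      # injectable so the logic can be tested
    @on_dead = on_dead                  # default: exit non-zero
    @down_since = nil
  end

  # Call this periodically. Returns :ok, :down or :dead.
  def check
    if @connection.open?
      @down_since = nil                 # connection is back, reset the timer
      return :ok
    end

    now = @clock.call
    @down_since ||= now                 # remember when the outage started

    if now - @down_since >= @max_down_seconds
      @on_dead.call
      :dead
    else
      :down
    end
  end
end
```

In a worker this could be driven from a background thread, for example `Thread.new { loop { watchdog.check; sleep 5 } }`. A non-zero exit only leads to a restart if the pod or container has a restart policy that allows it.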
Environment details
Deployed in k8s