postalserver / postal

📮 A fully featured open source mail delivery platform for incoming & outgoing e-mail
https://postalserver.io
MIT License
14.75k stars 1.04k forks source link

all pooled connections were in use #2631

Closed sandroshu closed 8 months ago

sandroshu commented 1 year ago

Describe the bug

Timeout error while processing unqueue

To Reproduce

  1. Send mail
  2. Open worker log
  3. See error

Expected behaviour

Process the queue

Environment details

Docker with 1 worker

Additional information/context

[9] [2023-10-02T08:26:55.397] INFO -- : [BLQPEM9OOP] Started processing UnqueueMessageJob job [9] [2023-10-02T08:26:55.391] WARN -- : [TF1EL6OITL] ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.001 seconds); all pooled connections were in use [9] [2023-10-02T08:26:55.398] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:210:inblock in wait_poll' [9] [2023-10-02T08:26:55.398] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:199:in loop' [9] [2023-10-02T08:26:55.398] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:199:inwait_poll' [9] [2023-10-02T08:26:55.398] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:160:in internal_poll' [9] [2023-10-02T08:26:55.399] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:286:ininternal_poll' [9] [2023-10-02T08:26:55.399] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:155:in block in poll' [9] [2023-10-02T08:26:55.399] WARN -- : [TF1EL6OITL] /usr/local/lib/ruby/3.2.0/monitor.rb:202:insynchronize' [9] [2023-10-02T08:26:55.399] WARN -- : [TF1EL6OITL] /usr/local/lib/ruby/3.2.0/monitor.rb:202:in mon_synchronize' [9] [2023-10-02T08:26:55.399] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:164:insynchronize' [9] [2023-10-02T08:26:55.399] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:155:in poll' [9] [2023-10-02T08:26:55.399] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:870:inacquire_connection' [9] [2023-10-02T08:26:55.400] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:588:in checkout' [9] [2023-10-02T08:26:55.400] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:428:inconnection' [9] [2023-10-02T08:26:55.400] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:1128:in retrieve_connection' [9] [2023-10-02T08:26:55.400] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_handling.rb:327:inretrieve_connection' [9] [2023-10-02T08:26:55.400] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/connection_handling.rb:283:in connection' [9] [2023-10-02T08:26:55.400] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/core.rb:490:incached_find_by_statement' [9] [2023-10-02T08:26:55.401] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/core.rb:386:in find_by' [9] [2023-10-02T08:26:55.401] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/dynamic_matchers.rb:66:infind_by_id' [9] [2023-10-02T08:26:55.401] WARN -- : [TF1EL6OITL] /opt/postal/app/app/jobs/unqueue_message_job.rb:4:in perform' [9] [2023-10-02T08:26:55.401] WARN -- : [TF1EL6OITL] /opt/postal/app/lib/postal/worker.rb:64:inreceive_job' [9] [2023-10-02T08:26:55.401] WARN -- : [TF1EL6OITL] /opt/postal/app/lib/postal/worker.rb:96:in block in join_queue' [9] [2023-10-02T08:26:55.401] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/bunny-2.20.3/lib/bunny/consumer.rb:56:incall' [9] [2023-10-02T08:26:55.402] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/bunny-2.20.3/lib/bunny/channel.rb:1842:in block in handle_frameset' [9] [2023-10-02T08:26:55.402] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/bunny-2.20.3/lib/bunny/consumer_work_pool.rb:108:inblock (2 levels) in run_loop' [9] [2023-10-02T08:26:55.402] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/bunny-2.20.3/lib/bunny/consumer_work_pool.rb:103:in loop' [9] [2023-10-02T08:26:55.402] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/bunny-2.20.3/lib/bunny/consumer_work_pool.rb:103:inblock in run_loop' [9] [2023-10-02T08:26:55.403] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/bunny-2.20.3/lib/bunny/consumer_work_pool.rb:102:in catch' [9] [2023-10-02T08:26:55.403] WARN -- : [TF1EL6OITL] /usr/local/bundle/gems/bunny-2.20.3/lib/bunny/consumer_work_pool.rb:102:inrun_loop' [9] [2023-10-02T08:26:55.403] INFO -- : [TF1EL6OITL] Finished processing UnqueueMessageJob job in 5.013552212s`

hugoperes commented 1 year ago

I'm having the same problem. Is it a general problem with the Postal Server?

hugoperes commented 1 year ago

Guys, problem solved. It was a problem with my hosting and not with the Postal Server.

willpower232 commented 1 year ago

@hugoperes are you able to elaborate on what the issue was in case it is relevant here?

hugoperes commented 1 year ago

@willpower232 Then the problem repeated itself again. A timeout simply occurs when trying to verify the postal server connection using a user credential. At first, I believe it is due to the hosting, but as was repeated today, there is a possibility that it is a problem with the postal server.

sandroshu commented 12 months ago

Currently all mails are pending and this is the last line in the workers log (restarting workers make it work for some time):

[9] [2023-10-03T17:32:56.950] INFO -- : [ZNNWRFIO] Net::WriteTimeout: Net::WriteTimeout with #<TCPSocket:fd 17, AF_INET, x.x.x.x, 42741>
[9] [2023-10-03T17:33:05.318] INFO -- : [OACMZNJV3P] [16::41106 1991297] Message requeued for trying later.
[9] [2023-10-03T17:33:05.329] INFO -- : [ZNNWRFIO] Finishing up
[9] [2023-10-03T17:33:05.406] INFO -- : [OACMZNJV3P] Finished processing UnqueueMessageJob job in 1988.416179855s
E, [2023-10-03T17:33:05.406787 #9] ERROR -- #<Bunny::Session:0xca58 postal@127.0.0.1:5672, vhost=postal, addresses=[127.0.0.1:5672]>: Uncaught exception from consumer #<Bunny::Consumer:778060 @channel_id=1 @queue=deliver-jobs-outgoing-6 @consumer_tag=bunny-1696328771000-460158659925>: #<Bunny::ChannelAlreadyClosed: cannot use a closed channel! Channel id: 1, closed due to a server-reported channel error: PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more> @ /usr/local/bundle/gems/bunny-2.20.3/lib/bunny/channel.rb:2075:in `raise_if_no_longer_open!'

Additional info from rabbitmq container:

client unexpectedly closed TCP connection
2023-10-05 09:48:00.091 [info] <0.778.0> Closing all channels from connection '172.17.0.1:43518 -> 172.17.0.3:5672' because it has been closed
2023-10-05 09:48:07.304 [info] <0.785.0> accepting AMQP connection <0.785.0> (172.17.0.1:53996 -> 172.17.0.3:5672)
2023-10-05 09:48:07.311 [info] <0.785.0> connection <0.785.0> (172.17.0.1:53996 -> 172.17.0.3:5672): user 'postal' authenticated and granted access to vhost 'postal'
2023-10-05 09:48:11.863 [warning] <0.701.0> closing AMQP connection <0.701.0> (172.17.0.1:43530 -> 172.17.0.3:5672, vhost: 'postal', user: 'postal'):
client unexpectedly closed TCP connection
2023-10-05 09:48:11.864 [info] <0.798.0> Closing all channels from connection '172.17.0.1:43530 -> 172.17.0.3:5672' because it has been closed
2023-10-05 09:48:18.543 [info] <0.805.0> accepting AMQP connection <0.805.0> (172.17.0.1:41426 -> 172.17.0.3:5672)
2023-10-05 09:48:18.549 [info] <0.805.0> connection <0.805.0> (172.17.0.1:41426 -> 172.17.0.3:5672): user 'postal' authenticated and granted access to vhost 'postal'
2023-10-05 09:48:23.650 [warning] <0.704.0> closing AMQP connection <0.704.0> (172.17.0.1:43536 -> 172.17.0.3:5672, vhost: 'postal', user: 'postal'):
client unexpectedly closed TCP connection
2023-10-05 09:48:23.650 [info] <0.819.0> Closing all channels from connection '172.17.0.1:43536 -> 172.17.0.3:5672' because it has been closed
2023-10-05 09:48:32.969 [info] <0.827.0> accepting AMQP connection <0.827.0> (172.17.0.1:40944 -> 172.17.0.3:5672)
2023-10-05 09:48:32.976 [info] <0.827.0> connection <0.827.0> (172.17.0.1:40944 -> 172.17.0.3:5672): user 'postal' authenticated and granted access to vhost 'postal'
2023-10-05 09:48:35.758 [warning] <0.695.0> closing AMQP connection <0.695.0> (172.17.0.1:43492 -> 172.17.0.3:5672, vhost: 'postal', user: 'postal'):
client unexpectedly closed TCP connection
2023-10-05 09:48:35.759 [info] <0.839.0> Closing all channels from connection '172.17.0.1:43492 -> 172.17.0.3:5672' because it has been closed
2023-10-05 09:48:43.837 [info] <0.847.0> accepting AMQP connection <0.847.0> (172.17.0.1:36440 -> 172.17.0.3:5672)
2023-10-05 09:48:43.842 [info] <0.847.0> connection <0.847.0> (172.17.0.1:36440 -> 172.17.0.3:5672): user 'postal' authenticated and granted access to vhost 'postal'
2023-10-05 10:12:22.574 [info] <0.1311.0> accepting AMQP connection <0.1311.0> (172.17.0.1:40504 -> 172.17.0.3:5672)
2023-10-05 10:12:22.580 [info] <0.1311.0> connection <0.1311.0> (172.17.0.1:40504 -> 172.17.0.3:5672): user 'postal' authenticated and granted access to vhost 'postal'
2023-10-05 10:25:50.653 [warning] <0.785.0> closing AMQP connection <0.785.0> (172.17.0.1:53996 -> 172.17.0.3:5672, vhost: 'postal', user: 'postal'):
client unexpectedly closed TCP connection
2023-10-05 10:25:50.655 [info] <0.1576.0> Closing all channels from connection '172.17.0.1:53996 -> 172.17.0.3:5672' because it has been closed
2023-10-05 10:25:59.724 [info] <0.1584.0> accepting AMQP connection <0.1584.0> (172.17.0.1:49424 -> 172.17.0.3:5672)
2023-10-05 10:25:59.732 [info] <0.1584.0> connection <0.1584.0> (172.17.0.1:49424 -> 172.17.0.3:5672): user 'postal' authenticated and granted access to vhost 'postal'
2023-10-05 10:26:04.089 [warning] <0.805.0> closing AMQP connection <0.805.0> (172.17.0.1:41426 -> 172.17.0.3:5672, vhost: 'postal', user: 'postal'):
client unexpectedly closed TCP connection
2023-10-05 10:26:04.090 [info] <0.1598.0> Closing all channels from connection '172.17.0.1:41426 -> 172.17.0.3:5672' because it has been closed
2023-10-05 10:26:15.024 [info] <0.1606.0> accepting AMQP connection <0.1606.0> (172.17.0.1:44906 -> 172.17.0.3:5672)
2023-10-05 10:26:15.034 [info] <0.1606.0> connection <0.1606.0> (172.17.0.1:44906 -> 172.17.0.3:5672): user 'postal' authenticated and granted access to vhost 'postal'
2023-10-05 10:26:19.402 [warning] <0.827.0> closing AMQP connection <0.827.0> (172.17.0.1:40944 -> 172.17.0.3:5672, vhost: 'postal', user: 'postal'):
client unexpectedly closed TCP connection
2023-10-05 10:26:19.403 [info] <0.1620.0> Closing all channels from connection '172.17.0.1:40944 -> 172.17.0.3:5672' because it has been closed
2023-10-05 10:26:29.287 [info] <0.1628.0> accepting AMQP connection <0.1628.0> (172.17.0.1:54838 -> 172.17.0.3:5672)
2023-10-05 10:26:29.296 [info] <0.1628.0> connection <0.1628.0> (172.17.0.1:54838 -> 172.17.0.3:5672): user 'postal' authenticated and granted access to vhost 'postal'
2023-10-05 10:26:33.372 [warning] <0.847.0> closing AMQP connection <0.847.0> (172.17.0.1:36440 -> 172.17.0.3:5672, vhost: 'postal', user: 'postal'):
client unexpectedly closed TCP connection
2023-10-05 10:26:33.372 [info] <0.1642.0> Closing all channels from connection '172.17.0.1:36440 -> 172.17.0.3:5672' because it has been closed
2023-10-05 10:26:42.774 [info] <0.1650.0> accepting AMQP connection <0.1650.0> (172.17.0.1:33952 -> 172.17.0.3:5672)
2023-10-05 10:26:42.780 [info] <0.1650.0> connection <0.1650.0> (172.17.0.1:33952 -> 172.17.0.3:5672): user 'postal' authenticated and granted access to vhost 'postal'

requeuer also crashing:

[9] [2023-10-05T10:37:22.530] INFO -- : Running message requeuer...
rake aborted!
AMQ::Protocol::EmptyResponseError: Empty response received from the server.
/usr/local/bundle/gems/amq-protocol-2.3.2/lib/amq/protocol/frame.rb:60:in `decode_header'
/usr/local/bundle/gems/bunny-2.20.3/lib/bunny/transport.rb:286:in `read_next_frame'
/usr/local/bundle/gems/bunny-2.20.3/lib/bunny/session.rb:1203:in `init_connection'
/usr/local/bundle/gems/bunny-2.20.3/lib/bunny/session.rb:328:in `start'
/opt/postal/app/lib/postal/rabbit_mq.rb:26:in `create_connection'
/opt/postal/app/lib/postal/rabbit_mq.rb:31:in `create_channel'
/opt/postal/app/lib/postal/worker.rb:201:in `job_channel'
/opt/postal/app/lib/postal/worker.rb:206:in `job_queue'
/opt/postal/app/lib/postal/job.rb:37:in `queue'
/opt/postal/app/app/models/queued_message.rb:47:in `queue'
/usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/relation/delegation.rb:88:in `each'
/usr/local/bundle/gems/activerecord-6.1.7.3/lib/active_record/relation/delegation.rb:88:in `each'
/opt/postal/app/app/models/queued_message.rb:121:in `requeue_all'
/opt/postal/app/lib/postal/message_requeuer.rb:11:in `block in run'
/opt/postal/app/lib/postal/message_requeuer.rb:9:in `loop'
/opt/postal/app/lib/postal/message_requeuer.rb:9:in `run'
/opt/postal/app/lib/tasks/postal.rake:20:in `block (2 levels) in <top (required)>'
/usr/local/bundle/gems/sentry-ruby-5.8.0/lib/sentry/rake.rb:24:in `execute'
/usr/local/bundle/gems/bundler-2.4.9/lib/bundler/cli/exec.rb:58:in `load'
/usr/local/bundle/gems/bundler-2.4.9/lib/bundler/cli/exec.rb:58:in `kernel_load'
/usr/local/bundle/gems/bundler-2.4.9/lib/bundler/cli/exec.rb:23:in `run'
/usr/local/bundle/gems/bundler-2.4.9/lib/bundler/cli.rb:492:in `exec'
/usr/local/bundle/gems/bundler-2.4.9/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
/usr/local/bundle/gems/bundler-2.4.9/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
/usr/local/bundle/gems/bundler-2.4.9/lib/bundler/vendor/thor/lib/thor.rb:392:in `dispatch'
/usr/local/bundle/gems/bundler-2.4.9/lib/bundler/cli.rb:34:in `dispatch'
/usr/local/bundle/gems/bundler-2.4.9/lib/bundler/vendor/thor/lib/thor/base.rb:485:in `start'
/usr/local/bundle/gems/bundler-2.4.9/lib/bundler/cli.rb:28:in `start'
/usr/local/bundle/gems/bundler-2.4.9/exe/bundle:45:in `block in <top (required)>'
/usr/local/bundle/gems/bundler-2.4.9/lib/bundler/friendly_errors.rb:117:in `with_friendly_errors'
/usr/local/bundle/gems/bundler-2.4.9/exe/bundle:33:in `<top (required)>'
/opt/postal/app/bin/bundle:3:in `load'
/opt/postal/app/bin/bundle:3:in `<main>'
Tasks: TOP => postal:requeuer
(See full trace by running task with --trace)
hugoperes commented 12 months ago

Well, I think I managed to solve it here. Emails are now being sent again at a good rate per minute. Basically I accessed the postcard database and did one: truncate table queued_messages;

Then restart the postal server.

Note: I am using 8 workers.

willpower232 commented 12 months ago

@hugoperes let us know if the problem re-occurs, it would be interesting to know whether there is an issue with the size of the table or whether there could have been slightly corrupt data in one of the rows which was making using the database server difficult

hugoperes commented 12 months ago

@willpower232 So, the problem was still persisting. One thing I was noticing was that the SMTP connection check was having a delay. Sometimes it even gave a timeout.

So I did something here that apparently solved it:

I edited docker-compose, placing the restart: unless-stopped parameter in the worker container.

I restarted the postcard with 8 workers.

When testing the SMTP connection now, it returns a 200 response very quickly. It's not giving a timeout.

I'm suspecting that the queued email history was crashing the workers and they weren't automatically restarting.

masihfathi commented 9 months ago

@hugoperes did you find reason for this issue? I got this issue too much these days

sandroshu commented 8 months ago

Still "ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use"

Is this a dead project? Should we start looking for other solution?

adamcooke commented 8 months ago

Is this a dead project? Should we start looking for other solution?

Feel free to explore other options if this isn't working for you.