knadh / listmonk

High performance, self-hosted, newsletter and mailing list manager with a modern dashboard. Single binary app.
https://listmonk.app
GNU Affero General Public License v3.0

Email Campaign performance settings - paused with "timed out waiting for free conn in pool" #2069

Open tanmay-predisai opened 1 month ago

tanmay-predisai commented 1 month ago

Version:

The campaign stopped abruptly after sending 120K emails, with the notification: Status: paused; Sent: 0 / 0; Reason: Too many errors.

When I checked the logs, there were >450 entries of "error sending message in campaign : subscriber <#>: timed out waiting for free conn in pool". I assume some threshold got hit and the campaign paused.

We have been successfully sending campaigns with >1M subscribers until now. However, this is the second time this week that a campaign has stopped. No settings have changed; only the number of subscribers was reduced in this campaign.

We are using AWS SES with enough limits (daily email sending limits >1.5M, per second email sending limit = 500)

Screenshots:

[Screenshot of the Performance Settings page, 2024-10-05 10:07:04 AM]

[Screenshot of the SMTP settings page, 2024-10-05 10:08:05 AM]

The settings have not been changed recently.

How do we debug this / understand what has gone wrong and fix it? Thank you for your kind help!

knadh commented 1 month ago

Concurrency = 100 means that up to 100 messages will be attempted concurrently. But you've set max messages to 2 per second, which makes this large number moot anyway.

But, your SMTP pool's max conns is set to 10, which means at max, you can only send 10 messages concurrently. The timeout is 5s, which means each connection can wait up to 5s. Any attempt to send more will wait on the pool and then just throw a timeout if the connections (SES) have slowed down for any reason.

You should reduce concurrency to something like 5-10 (given you've set max messages to 2) and increase SMTP max conns to match.

The exact combination of these varies greatly based on many factors (speed of SMTP, number of messages to be sent, the size of each message etc.), so you'll have to experiment and figure it out.
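The pool behaviour described above can be illustrated with a minimal sketch. It assumes a generic bounded pool with a wait timeout and is not listmonk's actual pool code; `ConnPool`, `PoolTimeout`, `borrow` and `give_back` are hypothetical names, and the tiny pool size and timeout are chosen only to keep the demo fast.

```python
import queue


class PoolTimeout(Exception):
    """Raised when no connection becomes free within the wait timeout."""


class ConnPool:
    """Hypothetical bounded pool: at most max_conns connections exist."""

    def __init__(self, max_conns, wait_timeout):
        self._free = queue.Queue(maxsize=max_conns)
        for i in range(max_conns):
            self._free.put(f"conn-{i}")  # stand-ins for real SMTP connections
        self._wait_timeout = wait_timeout

    def borrow(self):
        # Block until a connection is free, but give up after wait_timeout.
        try:
            return self._free.get(timeout=self._wait_timeout)
        except queue.Empty:
            raise PoolTimeout("timed out waiting for free conn in pool") from None

    def give_back(self, conn):
        self._free.put(conn)


pool = ConnPool(max_conns=2, wait_timeout=0.05)
a = pool.borrow()
b = pool.borrow()          # pool is now exhausted

try:
    pool.borrow()          # a third sender has to wait, then times out
    third = "ok"
except PoolTimeout as exc:
    third = str(exc)

pool.give_back(a)          # once a connection is returned...
fourth = pool.borrow()     # ...the next borrow succeeds immediately
```

With more senders than connections, every sender beyond the pool size depends on a connection being returned within the timeout, which is why slow sends surface as this error.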

tanmay-predisai commented 1 month ago

Hello - Thank you for your reply!

We tried with: Max. connections = 200, Max retries = 5, Concurrency = 100, Max messages = 2

The idea was to equate concurrency * max messages per second to the SMTP Max connections. The campaign stopped at around 360K sent emails. We saw the same "waiting for free conn" and also "messages exceeded (12000) for the window (1m0s since 05 Oct 24 08:26 +0000). Sleeping for 57s." errors.
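The "messages exceeded ... for the window ... Sleeping for 57s" log line suggests a per-window message cap. A minimal sketch of such a cap follows, assuming a simple counter that resets when the window expires; this is not listmonk's implementation, `WindowLimiter` and `record_send` are hypothetical names, and the limit of 3 is a toy value.

```python
class WindowLimiter:
    """Hypothetical windowed cap: at most `limit` messages per window."""

    def __init__(self, limit, window_secs):
        self.limit = limit
        self.window_secs = window_secs
        self.window_start = None
        self.sent = 0

    def record_send(self, now):
        """Count one message; return seconds to sleep (0.0 if none)."""
        if self.window_start is None or now - self.window_start >= self.window_secs:
            # First message, or the previous window has expired: start anew.
            self.window_start = now
            self.sent = 0
        self.sent += 1
        if self.sent >= self.limit:
            # Cap reached: sleep for whatever remains of the current window.
            return self.window_start + self.window_secs - now
        return 0.0


limiter = WindowLimiter(limit=3, window_secs=60)
# Three sends at t = 0s, 1s and 3s; the third one hits the cap.
waits = [limiter.record_send(t) for t in (0.0, 1.0, 3.0)]
# After sleeping until t = 60s, a new window begins.
after_sleep = limiter.record_send(60.0)
```

In this model, hitting the cap 3 seconds into a 1-minute window produces a 57-second sleep, the same shape as the log line quoted above.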

We will try again with: Max. connections = 20, Max retries = 2, Concurrency = 10, Max messages = 2

Strangely, the number of emails sent per min was hovering around 5.6K (the same as it used to be) even after I changed the settings. I am not sure what has changed since the same setup has been working fine for >4 months and we send one newsletter every week with >1M emails. Will inform once we try with the reduced settings.

Also, maybe I should open another bug for this, but when we resumed a paused campaign, it started re-sending the email to people who had already received it.

MaximilianKohler commented 1 month ago

This is confusing. It could definitely use some clarification in the docs.

Are you saying that the SMTP -> Max. connections value should be equal to the Performance -> Concurrency value?

> Strangely, the number of emails sent per min was hovering around 5.6K (the same as it used to be) even after I changed the settings.

Someone else also reported that the stated behaviour (with concurrency = 10 and message_rate = 10, up to 10x10 = 100 messages per second) doesn't seem to be correct.

Since you had 100x2, that's 200 per second, 12,000 per minute. So 5.6k per minute doesn't make sense based on the information we're presented in the UI.
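The arithmetic above, written out as a small sketch. The multiplication is the reading of the UI used in this comment, not a confirmed formula for how listmonk behaves; `nominal_rate` is a hypothetical helper.

```python
def nominal_rate(concurrency, message_rate):
    """Rate implied by reading the UI as concurrency x message rate."""
    per_second = concurrency * message_rate
    return per_second, per_second * 60


per_sec, per_min = nominal_rate(concurrency=100, message_rate=2)

observed_per_min = 5600                  # "around 5.6K" reported in this thread
fraction = observed_per_min / per_min    # share of the nominal rate achieved

print(per_sec, per_min, round(fraction, 2))
```

Under that reading, the observed rate is a little under half of the nominal one.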

tanmay-predisai commented 1 month ago

I was trying to maximize the sending rate (based on our AWS SES per second limits), hence the values in the given configuration.

> Since you had 100x2, that's 200 per second, 12,000 per minute. So 5.6k per minute doesn't make sense based on the information we're presented in the UI.

True - I had assumed that given my AWS SES per second limit (500 per sec / 30K per minute), we would be sending around 12k per minute. However, since the number did not increase, there seems to be something wrong.

Either my configuration is incorrect, there is some bug in listmonk, or AWS SES is not honoring its limits. I am not sure how to debug this. Any thoughts?

knadh commented 1 month ago

> True - I had assumed that given my AWS SES per second limit (500 per sec / 30K per minute), we would be sending around 12k per minute.

That's the max allowed limit, but that's not a guarantee that it's possible to send so many e-mails to SES. The actual sending throughput can depend on a multitude of factors. Eg:

MaximilianKohler commented 1 month ago

You should edit the title of this issue to include "performance settings" now that there is useful discussion about that here. It'll make the info easier to find.

I'm also using SES, and my emails-per-second limit with Amazon is around 300 I think. I have listmonk set to 10 Concurrency, 10 Message rate, 5000 batch size, no sliding window limit, Max. connections = 10.

It should be sending out 100 emails per second, 6,000 per minute, but it took 4 minutes to send a campaign of 3,000 subscribers, which is about 12.5 per second. Yesterday it took 2 minutes to send 1,500.
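The observed rates quoted above can be checked from campaign size and duration. A small sketch, where `observed_per_second` is a hypothetical helper and the durations are the rounded figures from this comment:

```python
def observed_per_second(messages_sent, minutes_taken):
    """Average send rate over a whole campaign."""
    return messages_sent / (minutes_taken * 60)


today = observed_per_second(3000, 4)       # 3,000 subscribers in 4 minutes
yesterday = observed_per_second(1500, 2)   # 1,500 subscribers in 2 minutes

expected = 10 * 10                          # concurrency x message rate, as read from the UI
shortfall = expected / today                # how far below the expected rate

print(today, yesterday, shortfall)
```

Both campaigns average the same rate, which is 8x lower than the 100 per second expected under the concurrency x message rate reading.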

There definitely seems to be something off about how this works. My server is under minimal load, and sending listmonk campaigns doesn't significantly increase the load. My database is not large.

tanmay-predisai commented 1 month ago

Quick update: we sent another campaign to a 660K user list with these settings: SMTP - Max. connections = 100, SMTP - Max retries = 2, Concurrency = 10, Max messages = 2, Batch Size = 6000

The email sending rate was ~950 emails per minute and it took >12 hours for the email to finish sending. I had tried to keep a large SMTP pool and target around 1200 emails per minute. I will try and experiment with 3-6x values in concurrency, max messages in the next campaign.

Would it be possible to share the ideal settings for this kind of scenario:

  1. Email size ~ 60KB
  2. AWS per second email sending limit = 500

What should be the ideal values for SMTP - Max. connections, SMTP - Max retries, Concurrency, Max messages, and Batch Size?

Thank you for your kind help!

MaximilianKohler commented 1 week ago

I never paid much attention to the messages-per-minute number, but I just looked, and it's maxing out at ~700/minute, which is ~12/s.

My settings are: SMTP - Max. connections = 10, Concurrency = 10, Message rate = 10, Batch Size = 5000, Email size = ~37 KB

I don't recall any information or guidance on what to set Max. connections to, so I've left it on the default. Since my 10x10 should be 6,000/minute it sounds like I need to increase the SMTP - Max. connections to 100. I'll put in a PR to clarify this.

knadh, what settings are you using to get 50k/minute?

knadh commented 1 week ago

It keeps getting tweaked based on mail size, number of mails to be sent, target SMTP servers load etc. Currently, I can see that it's set to:

Concurrency: 250, Message rate: 50, Batch size: 10,000, Sliding window limit: disabled

MaximilianKohler commented 1 week ago

What about your SMTP - Max. connections? Why do you have concurrency 5x as high as message rate?

EDIT: I tested: SMTP - Max. connections = 20 (up from 10), Concurrency = 6 (down from 10), Message rate = 6 (down from 10)

6x6 = 36/s, and 36 x 60 = 2,160 per minute. My actual rate dropped from 700/min (12/s) to 400/min (7/s).

I then changed it to: SMTP - Max. connections = 20, Concurrency = 20, Message rate = 10

And it was sending at 1400/min, 23/s.

Increasing to: SMTP - Max. connections = 20, Concurrency = 30, Message rate = 10

Resulted in 1700/min, 28/s. So it's definitely not multiplying concurrency with message rate.
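The four data points reported in this thread can be put side by side. The sketch below only rearranges those numbers; the per-worker figure is a derived quantity, and any interpretation of it is an inference from four approximate measurements, not a documented property of listmonk.

```python
# (max_conns, concurrency, message_rate, observed messages per minute)
runs = [
    (10, 10, 10, 700),
    (20, 6, 6, 400),
    (20, 20, 10, 1400),
    (20, 30, 10, 1700),
]

per_worker = []
for max_conns, concurrency, message_rate, per_min in runs:
    per_sec = per_min / 60
    product = concurrency * message_rate            # rate under the "multiply" reading
    per_worker.append(round(per_sec / concurrency, 2))
    print(f"concurrency={concurrency:>2} rate={message_rate:>2} "
          f"observed={per_sec:5.1f}/s product={product:>3}/s "
          f"per-worker={per_worker[-1]}/s")
```

In every run the observed rate is far below concurrency x message rate, and it works out to roughly one message per second per unit of concurrency. That pattern would be consistent with throughput being bounded by how long each send takes rather than by the configured rate, though four data points cannot confirm it.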

knadh commented 1 week ago

Concurrency=250 spawns 250 goroutines ("lightweight threads") that handle various aspects of sending a message, including template compilation (which is CPU bound and depends on the size and template complexity). It doesn't mean that the system will be able to send 250 messages per second. They'll just be prepared and if it's faster than the rate of messages that can be sent out (here, 50/sec max), they'll just wait in an in-memory queue.
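A toy model of that explanation: workers prepare messages into a queue, and a sender drains it at no more than the configured message rate. This is a simplified sketch, not listmonk's code; `simulate` is a hypothetical helper, the prepared-per-second figures are made-up inputs, and the model ignores any bound on the queue as well as SMTP latency.

```python
def simulate(prepared_per_sec, message_rate, seconds):
    """Return (messages sent, messages still queued) after `seconds`."""
    queued = 0
    sent = 0
    for _ in range(seconds):
        queued += prepared_per_sec           # workers finish preparing messages
        out = min(queued, message_rate)      # sender is capped by the message rate
        queued -= out
        sent += out
    return sent, queued


# Preparation outpaces the cap: output is pinned at 50/s and the queue grows.
fast_workers = simulate(prepared_per_sec=250, message_rate=50, seconds=10)

# Preparation is the bottleneck: the cap is never reached and nothing queues.
slow_workers = simulate(prepared_per_sec=30, message_rate=50, seconds=10)

print(fast_workers, slow_workers)
```

In this model the output rate is the smaller of the preparation rate and the message rate, never their product.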