parcelvoy / platform

Parcelvoy: Open source multi-channel marketing automation platform. Send data-driven emails, sms, push notifications and more!
https://parcelvoy.com
MIT License

Campaign stuck after sending 30% Users #533

Closed tanmay-predisai closed 1 week ago

tanmay-predisai commented 3 weeks ago

Hello, we were looking to send our weekly newsletter campaign with Parcelvoy. We set up a campaign to send to a list of 350K+ users.

The campaign sent 99K emails and is still stuck in the Running state after 14+ hours. We checked the AWS SES limits and everything seems to be in order. Until now, we have been using another tool to send newsletters, and it sent the same email to a similar-sized audience on the same day without problems.

  1. How can we debug this? (I tried checking the Docker worker logs, but since we also have journeys running in PV, it's difficult to find the relevant entries.)
  2. While the email was sent to only 99K users, the DB usage increased by more than 3GB. Here are the table sizes: [screenshot, 2024-11-02 12:42:54 PM]
  3. Here are some rows of the user_events table: dump.csv
  4. Our SES rate limit is 500 emails per second. The PV AWS SES integration is configured at 200 per second.

Thank you for your help!

pushchris commented 2 weeks ago

Need some more details:

tanmay-predisai commented 2 weeks ago

  1. I have used the normal Docker installation. How do I check which queue I am using?
  2. Here is the campaign_sends table: [screenshot, 2024-11-03 8:51:05 AM]

tanmay-predisai commented 2 weeks ago

I have a thought: the basic difference between this newsletter campaign and the other campaigns (which are working fine) is that this one was scheduled for a specific time rather than triggered by an event or run as part of a journey. Could that be why the remaining emails are not being sent, i.e. the (cron/batch?) job is not picking them up and adding them to the queue?

Another issue I noticed is that the campaign status still shows 99.8K sent in the UI, while the screenshot above shows 161K sent. The campaign status has also not been updated.

pushchris commented 2 weeks ago

That's quite a few emails stuck in pending for campaign 70 (which I'm assuming is the one with issues?). If you are using the default installation, then you should be using the Redis queue. The first thing I would check is its memory limits and whether they are being reached. If so, it could be that the job queue isn't functioning properly, so you are only getting a fraction of the sends enqueued every few minutes instead of all of them being in there. If they were all in the queue, I would expect to see a bunch of messages with a status of "throttled" or something similar.
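
As a rough illustration of the Redis memory check suggested here, the sketch below summarises memory pressure from a dictionary of Redis INFO fields. `used_memory`, `maxmemory` and `evicted_keys` are standard INFO fields; the function name `redis_memory_report`, the 90% threshold and the sample numbers are assumptions made for this example and are not part of Parcelvoy.

```python
def redis_memory_report(info: dict, threshold: float = 0.9) -> dict:
    """Summarise memory pressure from a dictionary of Redis INFO fields.

    The 0.9 threshold is an arbitrary illustrative cut-off, not a Redis
    or Parcelvoy default.
    """
    used = int(info.get("used_memory", 0))
    limit = int(info.get("maxmemory", 0))      # 0 means no limit is configured
    evicted = int(info.get("evicted_keys", 0))

    ratio = used / limit if limit > 0 else None
    return {
        "used_ratio": ratio,
        "near_limit": ratio is not None and ratio >= threshold,
        "evicting": evicted > 0,               # keys have been purged under memory pressure
    }


# Made-up sample values, only to show the shape of the input.
sample = {"used_memory": 950, "maxmemory": 1000, "evicted_keys": 12}
print(redis_memory_report(sample))
```

With the redis-py client, a dictionary of this shape can be obtained from `Redis.info()`; the host and port to connect to depend on the installation. A non-zero `evicted_keys` count alongside a high `used_ratio` would be consistent with queued jobs being purged.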

tanmay-predisai commented 2 weeks ago

Hello! Thank you for your reply.

This is what the stats look like - it seems Redis is not using more memory.

[Screenshot, 2024-11-04 9:39:35 AM]

At a server level, we have 50% RAM unused. Should I allocate more memory and check?

I also checked the campaign_sends stats now and they look the same as my last screenshot. It seems that the remaining emails from the campaign are not being picked up for sending.

Is there anything else I can check? Thank you for your help!
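
One further check on the campaign_sends table discussed above is to count rows per status for the affected campaign. The sketch below runs such a query against an in-memory SQLite table so that it is self-contained. The column names `campaign_id`, `user_id` and `state` are assumptions for illustration and may not match the real Parcelvoy schema, and the rows are made up.

```python
import sqlite3

# Assumed column names; verify them against the actual campaign_sends schema.
QUERY = """
SELECT state, COUNT(*) AS total
FROM campaign_sends
WHERE campaign_id = ?
GROUP BY state
ORDER BY state
"""


def tally_sends(conn: sqlite3.Connection, campaign_id: int) -> dict:
    """Return a {state: row count} mapping for one campaign."""
    return dict(conn.execute(QUERY, (campaign_id,)).fetchall())


# Stand-in table with made-up rows, only to make the example runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_sends (campaign_id INTEGER, user_id INTEGER, state TEXT)"
)
conn.executemany(
    "INSERT INTO campaign_sends VALUES (?, ?, ?)",
    [(70, 1, "sent"), (70, 2, "sent"), (70, 3, "pending"), (71, 4, "sent")],
)
print(tally_sends(conn, 70))
```

Running the same query a few minutes apart shows whether the pending count is shrinking at all. The `?` placeholder is SQLite syntax; other database drivers use a different placeholder style.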

pushchris commented 1 week ago

Were you ever able to get to the bottom of this? My gut still says it is probably related to either Redis consuming too much memory and items getting purged, or your job queue getting too backed up. You can see how the queue is performing under Settings -> Performance.

Also, depending on how many messages you are sending, you may want to run more than one worker to make sure you can actually hit your desired send rate. The default concurrency for a worker is 40 sends, and the number of requests per second each concurrent slot handles depends on the API response time. If the response time for a send is 200ms, then you would be getting only 200 messages per second with one worker (40 concurrent sends × 5 requests per second each). To increase send rates, you can just add additional workers to your Docker Compose file.
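
The arithmetic in this comment can be sketched as follows. The figures (40 concurrent sends, 200ms per send, a 500 per second SES limit, roughly 350K recipients) come from the thread; the function names are hypothetical, and the model assumes throughput scales linearly with worker count, which real deployments may not achieve.

```python
import math


def sends_per_second(concurrency: int, response_ms: float) -> float:
    """Throughput of one worker: each concurrent slot completes 1000 / response_ms sends per second."""
    return concurrency * 1000 / response_ms


def workers_needed(target_rate: float, concurrency: int, response_ms: float) -> int:
    """Workers required to reach target_rate, assuming linear scaling (an idealisation)."""
    return math.ceil(target_rate / sends_per_second(concurrency, response_ms))


def minutes_to_send(recipients: int, rate_per_second: float) -> float:
    """Time to deliver to all recipients at a steady rate, in minutes."""
    return recipients / rate_per_second / 60


per_worker = sends_per_second(40, 200)
print(per_worker)                        # sends per second for one worker
print(workers_needed(500, 40, 200))      # workers needed to reach the 500/s SES limit
print(minutes_to_send(350_000, 200))     # minutes for 350K recipients at the configured 200/s
```

Under these assumptions one worker tops out at the 200 per second configured in PV, three workers would be needed to approach the 500 per second SES limit, and 350K recipients at 200 per second works out to roughly half an hour. A campaign still running after 14+ hours is therefore far outside what the rate limit alone would explain.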

tanmay-predisai commented 1 week ago

Thank you for the comment and my apologies for not sharing updates here.

We did the following:

  1. Upgraded PV (the upgrade got stuck midway and I had to reset the migrations table for it to work).
  2. The campaign started sending again once the upgrade was done.
  3. The rest of the journeys resumed after the campaign finished (I think I had to restart PV a couple of times before the journeys started working again).
  4. I noticed you are working on improving send performance. I tried upgrading PV but don't see Settings -> Performance yet, so we decided to wait for the upgrade before we try sending the newsletter again.

Next steps:

  1. Will upgrade PV and also increase the worker count.
  2. Will send the newsletter and then get back with the status.

Closing this issue for now. Thank you for your help!