tendenci / tendenci

Tendenci - The Open Source Association Management System (AMS)
https://www.tendenci.com

Email Event Registrants - Website Crashes! #1127

Status: Open. CWalkden opened this issue 2 years ago.

CWalkden commented 2 years ago

We have a problem sending email to event registrants. We first found it with an event that had 250 registrants: when we tried to email them, the first 12 emails were received and then the website crashed. The remaining emails were never sent (we use Mailgun and have a log of all emails sent).

I've investigated with a test event, and was able to reproduce the problem (to a certain extent):

  1. Create an event and enable registration. Add a single pricing option that does not require payment.
  2. Sign up a number of people. The problem did not occur for us with a small number of test registrants, but it did happen with 14 people registered.
  3. Send an email to registrants via the Event menu ('E-mail Registrants').

From the point of view of the person sending the email, a 502 error message appeared after a short wait. The website started working again after a couple of page refreshes.

All 14 emails were sent and received successfully, and the sender received a summary log. Then another 14 emails, identical to the first batch, were sent.

So there are two problems here, slightly different to the ones we saw in the wild:

  1. The website crashed.
  2. Two copies of each email were sent instead of one.

@rob-hills found some related details in the error log; hopefully he'll post them here.

rob-hills commented 2 years ago

Here is what we found in the error log associated with the test @CWalkden mentioned above:

[2022-08-21 20:45:03 +1000] [2022179] [CRITICAL] WORKER TIMEOUT (pid:2022187)
[2022-08-21 20:45:03 +1000] [2022187] [INFO] Worker exiting (pid: 2022187)
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=17, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('112.213.34.123', 38110), raddr=('13.55.5.15', 443)>
[2022-08-21 20:45:04 +1000] [2022179] [WARNING] Worker with pid 2022187 was terminated due to signal 9
[2022-08-21 20:45:04 +1000] [2070065] [INFO] Booting worker with pid: 2070065
[2022-08-21 20:45:56 +1000] [2022179] [CRITICAL] WORKER TIMEOUT (pid:2070065)
[2022-08-21 20:45:56 +1000] [2070065] [INFO] Worker exiting (pid: 2070065)
[2022-08-21 20:45:57 +1000] [2070206] [INFO] Booting worker with pid: 2070206

jennyq commented 2 years ago

It looks like the issue is caused by a worker timeout when there is a large number of registrants. The functionality needs to be updated, but for now you can raise gunicorn's worker timeout (default 30 seconds) in your systemd unit file, for example with --timeout 300.
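A back-of-envelope sketch of how the timeout relates to the number of registrants. It rests on two assumptions that the thread does not confirm: that the emails are sent one after another inside the web request, and that the 12 emails sent in the original 250-registrant incident used up roughly the whole 30-second default. That works out to about 2.5 s per email. The function names are hypothetical helpers for this estimate, not Tendenci code.

```python
import math

# gunicorn's default worker timeout, in seconds.
DEFAULT_TIMEOUT = 30


def seconds_per_email(emails_sent: int, elapsed_seconds: float) -> float:
    """Average time per email observed before the worker was killed."""
    return elapsed_seconds / emails_sent


def required_timeout(recipients: int, per_email: float) -> int:
    """Whole seconds a synchronous send loop would need for every recipient."""
    return math.ceil(recipients * per_email)


def max_recipients(timeout: int, per_email: float) -> int:
    """Largest recipient count whose emails fit inside a given worker timeout."""
    return math.floor(timeout / per_email)


# Assumption: about 12 emails went out in the 30 s default before the crash.
per_email = seconds_per_email(12, DEFAULT_TIMEOUT)

print(per_email)                         # 2.5
print(required_timeout(250, per_email))  # 625
print(max_recipients(300, per_email))    # 120
```

Under these assumptions, --timeout 300 would cover roughly 120 recipients, so a 250-registrant event could still exceed it. That is consistent with treating the larger timeout as a stopgap until the sending code itself is changed.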

rob-hills commented 1 year ago

I've updated our systemd unit file, reloaded it and restarted Tendenci, so hopefully this will work around the problem until the bug is fixed.

This note is mainly a reminder to myself (@rob-hills) to revert this change to the systemd unit file once the bug is fixed!
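One option that may make the eventual revert easier is to keep the override in a gunicorn Python config file instead of on the command line. This sketch assumes the unit's ExecStart launches gunicorn directly; the actual Tendenci unit file is not shown in this thread, and the file name and path are hypothetical. gunicorn config files are ordinary Python in which each module-level name sets the matching setting, and the file is passed with -c. gunicorn gives command-line flags precedence over the config file, so a --timeout flag left on the ExecStart line would still take effect.

```python
# gunicorn.conf.py (hypothetical name and location)
# Load it with: gunicorn -c /path/to/gunicorn.conf.py ...
# Each module-level name in this file sets the gunicorn setting of the same name.

# Workaround for issue #1127: give the "E-mail Registrants" request time to
# finish before the arbiter kills the worker. gunicorn's default is 30 seconds.
# Revert this line (or delete it) once the bug is fixed.
timeout = 300
```

After editing either file, the usual systemd steps still apply: reload the unit definitions if the unit file changed, then restart the service.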