Open CWalkden opened 2 years ago
Here is what we found in the error log associated with the test @CWalkden mentioned above:
[2022-08-21 20:45:03 +1000] [2022179] [CRITICAL] WORKER TIMEOUT (pid:2022187)
[2022-08-21 20:45:03 +1000] [2022187] [INFO] Worker exiting (pid: 2022187)
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=17, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('112.213.34.123', 38110), raddr=('13.55.5.15', 443)>
[2022-08-21 20:45:04 +1000] [2022179] [WARNING] Worker with pid 2022187 was terminated due to signal 9
[2022-08-21 20:45:04 +1000] [2070065] [INFO] Booting worker with pid: 2070065
[2022-08-21 20:45:56 +1000] [2022179] [CRITICAL] WORKER TIMEOUT (pid:2070065)
[2022-08-21 20:45:56 +1000] [2070065] [INFO] Worker exiting (pid: 2070065)
[2022-08-21 20:45:57 +1000] [2070206] [INFO] Booting worker with pid: 2070206
Looks like the issue is caused by a gunicorn worker timeout triggered by the large number of registrants. The functionality needs to be updated. For now, you can work around it by raising the gunicorn timeout (default 30 seconds) in your systemd unit file, for example --timeout 300.
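If you would rather not edit the command line in the unit file, the same setting can probably go in a gunicorn config file instead. This is only a minimal sketch: the file name gunicorn.conf.py and its location are assumptions about your deployment, and 300 is just the example value from above, not a tuned number.

```python
# gunicorn.conf.py (hypothetical file name and location; adjust to your setup)

# Seconds a worker may stay silent before the master process kills and
# restarts it. Gunicorn's default is 30, which the log above suggests is
# too short for emailing a large number of registrants in a single request.
timeout = 300
```

Gunicorn can be pointed at a file like this with its -c (or --config) option. Treat it as a temporary workaround and revert it once the underlying bug is fixed.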
I've updated our systemd unit file, reloaded it and restarted Tendenci so hopefully this will work around the problem until it is fixed.
This note is mainly a reminder to myself (@rob-hills) to revert this change to the systemd unit file when the bug is fixed!
We have a problem with sending an email to event registrants. We originally found this problem with an event with 250 registrants. Trying to email them resulted in the first 12 emails being received, then the website crashed. The remaining emails were never sent (we use Mailgun and have a log of all emails sent).
I've investigated with a test event, and was able to reproduce the problem (to a certain extent):
From the user's point of view (the person sending the email), they received a 502 error message after a short wait. The website started working again after hitting refresh a couple of times.
All 14 emails were sent and received successfully. A summary log was received by the sender. Then another 14 emails were sent - identical to the first.
So there are two problems here, slightly different to the ones we saw in the wild:
@rob-hills has found some error log details for this, hopefully he'll post them here.