Open joshuacwnewton opened 1 year ago
I'm following up on this as part of my task list. I'm hesitant to suggest the manual option, since it adds workload that IMO should be automatable.
Is there a reason you're sending outbound emails directly from the droplet? In the past when I've set up Discourse, I used a bulk mail provider and this would get around the possibility of mail getting dropped.
Regardless, I can set up a weekly canary email so it doesn't get overwhelming via cronjob with sendmail, would this help?
> Is there a reason you're sending outbound emails directly from the droplet? In the past when I've set up Discourse, I used a bulk mail provider and this would get around the possibility of mail getting dropped.
cc'ing @kousu who I believe set up the mail server originally.
> Regardless, I can set up a weekly canary email so it doesn't get overwhelming via cronjob with sendmail, would this help?
That would be lovely! :)
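A rough sketch of what the weekly canary could look like, in case it helps. The recipient address, the script path in the crontab comment, and the `/usr/sbin/sendmail` location are all assumptions, not values from the actual server.

```shell
#!/bin/sh
# Sketch of a weekly canary email. The recipient and the script path below
# are hypothetical placeholders.

# Assumption: a sendmail-compatible binary lives here (cron's PATH is minimal).
SENDMAIL="${SENDMAIL:-/usr/sbin/sendmail}"

# Print a minimal message (headers, blank line, body) suitable for `sendmail -t`.
build_canary_message() {
    printf 'To: %s\n' "$1"
    printf 'Subject: [canary] forum mail check from %s\n' "$(uname -n)"
    printf '\n'
    printf 'Weekly canary sent at %s.\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')"
    printf 'If this stops arriving, outbound mail is probably broken.\n'
}

# Send the canary; -t tells sendmail to read the recipient from the To: header.
send_canary() {
    build_canary_message "$1" | "$SENDMAIL" -t
}

# Example crontab entry (Mondays at 09:00), assuming this file is saved as the
# hypothetical /usr/local/bin/mail-canary.sh and ends with a call such as
#   send_canary "admin@example.org"
#
#   0 9 * * 1 /usr/local/bin/mail-canary.sh
```

The idea is that a missing canary in the inbox is the alert, so someone still has to notice its absence each week.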
> Is there a reason you're sending outbound emails directly from the droplet?
We were using SendGrid (it was SendGrid, right?) and they started dropping mail -- I think we forgot to renew the credit card. I figured DigitalOcean allows outgoing SMTP and we're already paying for it, so I used that. It's one less point of failure. Plus then we're not rewarding protection rackets.
I watched the mail logs for a few weeks after and only a few mails were getting delayed or dropped and I was able to fix up each problem.
SendGrid doesn't guarantee delivery! Email can still get flagged as spam. The sending IP is just one signal into the spam filters. At this point, our IP reputation should be as good as SendGrid or any other SMTP hoster -- though I don't know what dashboard I could look at to confirm that.
> weekly canary email
that's a good idea!
here's another: a cronjob to grep the maillog for codes 250 and 5*, and make a histogram of them (`| sort | uniq -c | sort -n`). If the 500s spike or the 200s drop we investigate.
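The histogram idea could be sketched roughly like this. It assumes the log records each SMTP reply as `stat="NNN ..."`; the exact line format and the log path in the usage comment are assumptions that should be checked against the real log.

```shell
#!/bin/sh
# Sketch: count SMTP reply codes (250 and 5xx) found in a mail log.
# Assumption: each delivery line contains the reply as stat="NNN ...".

smtp_code_histogram() {
    # 1. pull out the stat="NNN prefix, 2. keep only the three digits,
    # 3. keep 250 and 5xx, 4. count occurrences, smallest count first.
    grep -oE 'stat="[0-9]{3}' "$1" \
        | grep -oE '[0-9]{3}$' \
        | grep -E '^(250|5[0-9]{2})$' \
        | sort | uniq -c | sort -n
}

# Usage (the log path is a hypothetical placeholder):
#   smtp_code_histogram /var/log/maillog
```

Each output line is a count followed by a code, so a jump in the 5xx rows or a drop in the 250 row stands out when comparing one report to the next.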
@kousu How would we access that histogram?
If it's in a cronjob run by root, and you have an email in `~root/.forward`, then the stdout of that cronjob will go to that email.
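Concretely, the setup might look something like the following sketch. The address and the report script path are hypothetical placeholders, and this relies on cron's usual behaviour of mailing a job's output to the crontab owner.

```shell
#!/bin/sh
# Sketch: point a .forward file at an external mailbox so that mail for the
# local user (including cron output) is forwarded there.

# Write the forwarding address ($1) into the given .forward file ($2).
set_forward() {
    printf '%s\n' "$1" > "$2"
}

# Usage, run as root (admin@example.org is a placeholder address):
#   set_forward admin@example.org ~root/.forward
#
# Then, in root's crontab (crontab -e as root), any job that prints to stdout
# gets mailed. The script path below is hypothetical:
#   0 8 * * 1 /usr/local/bin/mail-report.sh
```

One caveat with this route: the report itself travels over the same mail server it is reporting on, so a total outage would also silence the report.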
Emails have started failing once more...
The issue is SSL certificate related once more:
The steps that I used to fix this problem previously are:
- Run `sudo journalctl --vacuum-size=100M` and `/var/discourse/launcher cleanup`, answering `y` to both. `df -h` shows >7GB of space for the 25GB `/dev/vda1` drive.
- Run `sudo systemctl restart opensmtpd`.

Working on this now.
Still, it looks like we'll need to get to the bottom of why the SSL certificates aren't auto-renewing, when they absolutely should be.
Potentially related:
Email service has been restored. The steps above worked like a charm. :)
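Until the auto-renewal question is resolved, a check along these lines could warn before the certificate actually expires. This is only a sketch: the certificate path in the usage comment is a placeholder, since the thread doesn't say where the cert lives, and the 14-day threshold is arbitrary.

```shell
#!/bin/sh
# Sketch: warn when a certificate is close to expiry.

# Succeeds (exit 0) if the certificate file $1 is still valid $2 days from
# now. Fails if it expires sooner, or if the file cannot be read at all.
cert_valid_for_days() {
    openssl x509 -checkend "$(( $2 * 86400 ))" -noout -in "$1" >/dev/null 2>&1
}

# Usage (the path is a hypothetical placeholder):
#   cert_valid_for_days /path/to/mail-cert.pem 14 \
#       || echo "WARNING: mail certificate expires within 14 days (or is unreadable)"
```

Run from cron, the warning would go out while mail delivery still works, which is the point of checking ahead of the expiry date rather than on it.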
We use Discourse as our forum software, and set up our own custom mail server within the DigitalOcean VM.
Starting on March 19th, 2023, we experienced an outage of outbound emails that wasn't noticed until May 11th.
Thankfully, the outage was easy to fix, as it was caused by an expired SSL cert. But, the concern here is how long it took for the outage to be caught. Here are some of the factors involved in the outage:
`#sct_forum_updates`.

So, to better address this in the future, we would need to find some way to monitor emails and detect failures: