open-contracting / credere-backend

A tool that facilitates the participation of Micro, Small, and Medium businesses (MSMEs) in the Colombian public procurement market.
https://credere.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Serve email assets (images, etc.) from CDN (e.g. AWS) #310

jpmckinney closed 2 months ago

jpmckinney commented 2 months ago

Right now, images (referenced by the templates stored in this repository) are routed to port 8000, which proxies to credere-frontend.

Each email loads 8 images, and during mass-mailings, we can receive a lot of traffic at once. (Either lots of emails are being opened, or lots of email clients are pre-loading images.)

Offloading traffic from emails to a CDN might avoid some OOM errors that we've observed in Apache's error.log.

I think we went with linked images because all other options had worse downsides.

We can do this in a forward-looking way only. Existing emails can continue to send traffic to our server. The main concern is new traffic around mass-mailings. We don't expect old, re-opened emails to cause an issue.

jpmckinney commented 2 months ago

@yolile I think it would be prudent to resolve this issue before sending more emails with fetch-all-awards-from-period.

yolile commented 2 months ago

Sounds good. Do we have a CDN set up already that I can use?

jpmckinney commented 2 months ago

We probably need to set up AWS CloudFront, and then change the image URLs to CloudFront URLs.

Edit: Deploying d32q61blueh6u0.cloudfront.net (not yet available).

It's possible to use a custom domain like cdn.credere.open-contracting.org, but it takes some effort (search "letsencrypt custom cloudfront"). Some years ago, it was reported that Yahoo (and maybe others) wouldn't load images from CloudFront (not sure if that's still the case). Using a custom domain is suggested as a solution. AWS URLs might also be treated as a spam signal.
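A minimal sketch of the "change the image URLs" step, assuming the templates reference images with absolute URLs under /images/ on the origin host. The function name, the constants, and the URL pattern are illustrative assumptions, not taken from the repository; the CDN base is passed in so the same code works for either the cloudfront.net hostname or a custom domain.

```python
import re

# Assumed values for illustration: the origin host appears in this thread;
# the CDN base is whichever CloudFront or custom domain ends up being used.
ORIGIN_BASE = "https://credere.open-contracting.org"
CDN_BASE = "https://cdn.credere.open-contracting.org"

# Match src="<origin>/images/..." attributes only, so ordinary links
# (href="...") that must keep pointing at the application are untouched.
_IMG_SRC = re.compile(
    r"""(src=["'])""" + re.escape(ORIGIN_BASE) + r"""(/images/[^"']+)"""
)


def use_cdn_for_images(html: str, cdn_base: str = CDN_BASE) -> str:
    """Point image sources under /images/ at the CDN instead of the origin."""
    base = cdn_base.rstrip("/")
    return _IMG_SRC.sub(lambda m: m.group(1) + base + m.group(2), html)
```

Because only newly rendered emails would pass through a helper like this, emails that were already sent keep loading images from the origin, which matches the forward-looking approach described above.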

Edit2: CloudFront is available, but it complains:

502 ERROR

The request could not be satisfied.

CloudFront wasn't able to connect to the origin. We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner. If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.

Looking into it, but nothing obvious yet.

Edit3: Can confirm that:

curl -I https://d32q61blueh6u0.cloudfront.net

HTTP/2 502 
content-type: text/html
content-length: 951
server: CloudFront
date: Sat, 10 Aug 2024 00:11:45 GMT
x-cache: Error from cloudfront
via: 1.1 39bd4dd36d89ac693c6b532053af59d6.cloudfront.net (CloudFront)
x-amz-cf-pop: YUL62-P2
x-amz-cf-id: ZzMFVVPNjtQHjfklkt8PojaPwYeI3--CBlnKAWKAnhdlLFSEtnlqNA==

curl --http1.1 -I https://d32q61blueh6u0.cloudfront.net

HTTP/1.1 502 Bad Gateway
Content-Type: text/html
Content-Length: 951
Connection: keep-alive
Server: CloudFront
Date: Sat, 10 Aug 2024 00:11:37 GMT
X-Cache: Error from cloudfront
Via: 1.1 212f3832d7f59d71fd3926166fcc89ae.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: YUL62-P2
X-Amz-Cf-Id: AjRUhpslY5hdH5_ys0H6Kiek9X3o8UR9sUFkdtNAtmSEExif_WiPDg==

tail -f /var/log/apache2/access.log /var/log/apache2/error.log /var/log/apache2/other_vhosts_access.log doesn't show any incoming request.
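The two observations above (a 502 whose Server header is CloudFront, and no matching request in the Apache logs) suggest the error page was produced by CloudFront rather than by the origin. As a rough heuristic only, a check like the following could flag that pattern from `curl -I` output. The function names are hypothetical, and the rule is an assumption based on the headers shown in this thread, not a documented CloudFront guarantee.

```python
def parse_head_response(raw: str) -> tuple[int, dict[str, str]]:
    """Parse `curl -I` output into a status code and lower-cased header names."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    # The status line looks like "HTTP/2 502" or "HTTP/1.1 502 Bad Gateway".
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values like dates stay intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def looks_like_cloudfront_generated_error(status: int, headers: dict[str, str]) -> bool:
    """Heuristic: a 5xx response whose Server and X-Cache headers both name CloudFront."""
    return (
        status >= 500
        and headers.get("server", "").lower() == "cloudfront"
        and headers.get("x-cache", "").lower().startswith("error")
    )
```

If the check returns True and the origin logs stay empty, the distribution's origin settings are a more likely culprit than the application itself.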

jpmckinney commented 2 months ago

Origin request policy: AllViewer - maybe need to override Host header?

Okay, so setting "Origin request policy" to None fixes the issue.

CloudFront works, e.g.

https://credere.open-contracting.org/images/facebook.png
https://d32q61blueh6u0.cloudfront.net/images/facebook.png

Next is to configure the custom domain.

jpmckinney commented 2 months ago

And now the custom domain works, e.g. https://cdn.credere.open-contracting.org/images/facebook.png

As usual, please test sending an email on the dev server before committing to the CDN for thousands of emails :)
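One way to support that dev-server test is to scan the rendered email HTML and list any image that is still not served from the CDN host. This is a sketch using only the standard library; the function and class names are hypothetical, and how the rendered HTML is obtained is left to whatever the dev setup provides.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Custom domain mentioned in this thread; adjust if another host is used.
CDN_HOST = "cdn.credere.open-contracting.org"


class _ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def images_not_on_cdn(html: str, cdn_host: str = CDN_HOST) -> list[str]:
    """Return image URLs in the HTML whose host is not the CDN host."""
    collector = _ImageCollector()
    collector.feed(html)
    return [src for src in collector.sources if urlsplit(src).hostname != cdn_host]
```

An empty list for a rendered test email would indicate that every image in it points at the CDN.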

yolile commented 2 months ago

Thank you! It works!