Closed b-meson closed 6 years ago
Website is back up. Deployment notes need to go in before we can move to "done". I made some mistakes in the deployment process that we should have caught with careful deployments to staging. Therefore, this issue can't be closed until I write a full debrief. Sorry for the messy downtime.
Here is my writeup about what issues I ran into while deploying the new CSPs and why it turned into a mess.
When we switched from Digital Ocean to AWS (S3 + Cloudfront) we did not notice that our "score" from securityheaders.io dropped from an A to an F. In particular, we silently dropped HSTS support and CSPs from our headers because these headers don't really "exist" in Cloudfront.
The way to apply these headers is to use a custom Lambda@Edge, which is not the same thing as a Lambda. The @Edge lambdas can only be created in us-east-1 (N. Virginia) even though there are technically three "Edge" sites announced by Amazon as of today. Cloudfront invokes these lambdas as a response to a request from our website.
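As a rough illustration of what such a function looks like, here is a minimal sketch of a Lambda@Edge handler attached to a Cloudfront response trigger. The header values are placeholders (the HSTS max-age and the bare `default-src 'self'` policy are assumptions, not what we deployed), and the callback style is used because it runs on Node.js 6.10.

```javascript
// Placeholder header values; the real policy must match what the site actually loads.
const SECURITY_HEADERS = {
  'Strict-Transport-Security': 'max-age=31536000; includeSubDomains',
  'Content-Security-Policy': "default-src 'self'",
};

// Handler for a Cloudfront response trigger (origin-response or viewer-response).
function handler(event, context, callback) {
  const response = event.Records[0].cf.response;
  const headers = response.headers;

  Object.keys(SECURITY_HEADERS).forEach((name) => {
    // Cloudfront keys headers by lowercase name; each value is a list of {key, value}.
    headers[name.toLowerCase()] = [{ key: name, value: SECURITY_HEADERS[name] }];
  });

  // Hand the modified response back to Cloudfront.
  callback(null, response);
}

exports.handler = handler;
```

The handler only mutates the response object it is given, so it can be exercised locally with a hand-built event before it is attached to a distribution.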
In order to deploy a Lambda@Edge, you must be aware that the documentation for these functions is often wrong or misleading. Here are some things I discovered:
- The function must be created in us-east-1.
- The runtime must be Node.js 6.10.
- A Lambda@Edge IAM role is required (which is not the same as a Lambda IAM role). You can not add a @Edge IAM role to a custom IAM function despite documentation telling you that you can. It should also be noted that the Cloudfront messages will tempt you into looking at IAM roles and tell you to create a new role that includes both lambda and lambda@edge. No such thing exists AFAIK.
- The arn you invoke from Cloudfront must be a versioned number.
After deploying our new custom CSPs (invoking the lambda from Cloudfront), I tested them against securityheaders.io and saw the staging site was loading resources with an "A" grade. Convinced that was sufficient testing, I copied the lambda from staging to prod. I did not test them in a browser to check that resources were loading. In addition, since we tore down our old DigitalOcean servers, we did not have the old nginx config lying around to check against. I naïvely expected that src 'self'; would be sufficient because that's how I recalled them.
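Two of the constraints listed earlier (the us-east-1 region and the versioned arn) are easy to check mechanically before copying a lambda between environments. The sketch below is a hypothetical pre-deploy check, not something we ran; `checkEdgeArn` and the example account id are made up for illustration.

```javascript
// Matches a Lambda function ARN that ends in a numeric version,
// e.g. arn:aws:lambda:us-east-1:123456789012:function:csp-headers:3
const VERSIONED_ARN = /^arn:aws:lambda:([a-z0-9-]+):(\d{12}):function:([A-Za-z0-9_-]+):(\d+)$/;

// Returns a problem description, or null when the ARN looks usable from Cloudfront.
function checkEdgeArn(arn) {
  const match = VERSIONED_ARN.exec(arn);
  if (!match) {
    // Covers unversioned ARNs, $LATEST, and aliases.
    return 'not a Lambda function ARN with a numeric version';
  }
  if (match[1] !== 'us-east-1') {
    return 'function must live in us-east-1';
  }
  return null;
}
```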
After realizing my mistake (i.e. that the web browser was blocking resources from loading), I disabled traffic to the prod server and continued testing against staging. Around 1am PST, I was able to confirm that the following CSP default-src 'self'; img-src 'self'; script-src 'self' https://lucyparsonslabs.com 'unsafe-inline'; style-src 'unsafe-inline' https://lucyparsonslabs.com; object-src 'self' allowed all resources from our website to load properly. At that time, I updated the prod lambda and re-enabled traffic to production.
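A long single-line policy like the one above is easy to mistype. One option, sketched here, is to keep the directives in a map and generate the header value from it. `buildCsp` is a hypothetical helper; the directive values are copied from the policy above.

```javascript
// Directives from the working policy, keyed by directive name.
const directives = {
  'default-src': ["'self'"],
  'img-src': ["'self'"],
  'script-src': ["'self'", 'https://lucyparsonslabs.com', "'unsafe-inline'"],
  'style-src': ["'unsafe-inline'", 'https://lucyparsonslabs.com'],
  'object-src': ["'self'"],
};

// Joins each directive name with its sources, then joins the directives with "; ".
function buildCsp(map) {
  return Object.keys(map)
    .map((name) => [name].concat(map[name]).join(' '))
    .join('; ');
}

console.log(buildCsp(directives));
```

The generated string can then be used as the value of the Content-Security-Policy header in the lambda.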
We should have tested the security headers after moving to S3 (or maybe put that in monitoring somewhere). I didn't properly test the CSPs by checking that the browser was loading resources, and that's why I took down production while I tested on staging. AWS's documentation in general, and Lambda@Edge in particular, is awful.
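For the monitoring idea, a check along these lines might be enough to catch headers silently disappearing again. This is a sketch under assumptions: the list of required headers is my guess at what we care about, and `missingSecurityHeaders` and `checkSite` are hypothetical names.

```javascript
const https = require('https');

// Headers this sketch treats as required; adjust to whatever we decide to monitor.
const REQUIRED = ['strict-transport-security', 'content-security-policy'];

// Returns the required header names absent from a Node-style headers object
// (Node lowercases incoming header names).
function missingSecurityHeaders(headers) {
  return REQUIRED.filter((name) => !(name in headers));
}

// Requests the URL and reports missing headers through a Node-style callback.
function checkSite(url, callback) {
  https
    .get(url, (res) => {
      res.resume(); // discard the body; only the headers matter here
      callback(null, missingSecurityHeaders(res.headers));
    })
    .on('error', callback);
}
```

Calling `checkSite` with the staging or production URL yields an empty list when both headers are present. Note that this only confirms the headers exist; it does not replace loading the site in a browser to confirm the CSP is not blocking resources.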
Related to #172, we need to add a good CSP and fix the (abysmal) security headers rating we have right now.