riboseinc / terraform-aws-s3-cloudfront-website

Terraform module for creating a static S3 website with CloudFront and an SSL certificate (e.g., from ACM)
Apache License 2.0

Make S3 buckets support Transfer Acceleration #29

Closed ronaldtse closed 3 years ago

ronaldtse commented 3 years ago

S3 buckets support Transfer Acceleration, which basically allows uploading files to a CloudFront endpoint; AWS then uses its internal network to proxy the upload to S3.

S3 buckets that serve static websites do not support Transfer Acceleration, as per: https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration.html

Screenshot of the AWS documentation:

The name of the bucket used for Transfer Acceleration must be DNS-compliant and must not contain periods (".").

This means that S3 website buckets, which require setting the bucket name to host.mydomain.com, cannot use Transfer Acceleration. The major difference between a static-website S3 bucket and a normal S3 bucket is the ability to resolve paths like /foo/bar/ to /foo/bar/index.html.

We recently ran into slow deployment speeds for a larger website with 22,000 files, which takes a very long time to upload. Transfer Acceleration is therefore necessary. However, since this is a website, we cannot lose the index.html path-rewrite functionality.

The solution is:

  1. Name the S3 bucket with a non-domain name
  2. In the CloudFront distribution, point directly to the S3 bucket
  3. Use a Lambda@Edge function to perform the index URL rewrites

The Lambda function is described here: https://aws.amazon.com/blogs/compute/implementing-default-directory-indexes-in-amazon-s3-backed-amazon-cloudfront-origins-using-lambdaedge/

We actually wrote a better version in 2018, provided in https://github.com/riboseinc/terraform-aws-s3-cloudfront-website/issues/1#issuecomment-387272130 (pasted here for completeness):

'use strict';

const pointsToFile = uri => /\/[^/]+\.[^/]+$/.test(uri);
const hasTrailingSlash = uri => uri.endsWith('/');

exports.handler = (event, context, callback) => {
    // Extract the request from the CloudFront event that is sent to Lambda@Edge 
    const request = event.Records[0].cf.request;

    // Extract the URI from the request; the query string is not touched
    // here and is preserved automatically when only request.uri changes
    const olduri = request.uri;

    if (pointsToFile(olduri)) {
        callback(null, request);
        return;
    }

    // Append ".html" extension
    if (!hasTrailingSlash(olduri)) {
        request.uri = olduri + ".html";
    } else {
        // Append "index.html"
        request.uri = olduri + "index.html";
    }

    // Return to CloudFront
    return callback(null, request);
};
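
For reference, deploying this handler as Lambda@Edge could look roughly like the following sketch (the function and role names are hypothetical, and the IAM role, which must trust both lambda.amazonaws.com and edgelambda.amazonaws.com, is assumed to be defined elsewhere; Lambda@Edge functions must live in us-east-1 and be published as a version):

provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

# Zip the handler above, saved as index.js next to this configuration.
data "archive_file" "index_rewrite" {
  type        = "zip"
  source_file = "${path.module}/index.js"
  output_path = "${path.module}/index_rewrite.zip"
}

resource "aws_lambda_function" "index_rewrite" {
  provider         = aws.us_east_1
  function_name    = "s3-index-rewrite" # hypothetical name
  runtime          = "nodejs12.x"
  handler          = "index.handler"
  filename         = data.archive_file.index_rewrite.output_path
  source_code_hash = data.archive_file.index_rewrite.output_base64sha256
  role             = aws_iam_role.lambda_edge.arn # assumed to exist elsewhere
  publish          = true # Lambda@Edge requires a published version
}

The published version's qualified ARN (aws_lambda_function.index_rewrite.qualified_arn) is what a CloudFront lambda_function_association expects.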

In terms of implementation there are two limitations:

  1. Not everyone wants Transfer Acceleration (due to extra transfer costs and extra costs for Lambda@Edge rewrites)
  2. Transfer Acceleration is not supported in every region

Transfer Acceleration is currently not supported for buckets located in the following Regions:

  - Africa (Cape Town) (af-south-1)
  - Asia Pacific (Hong Kong) (ap-east-1)
  - Asia Pacific (Osaka) (ap-northeast-3)
  - Europe (Stockholm) (eu-north-1)
  - Europe (Milan) (eu-south-1)
  - Middle East (Bahrain) (me-south-1)

This means the module needs to (see the sketch after this list):

  1. In variables.tf, provide a boolean lambda_transfer_acceleration variable that defaults to false. This switches between the current (normal) behaviour and the "Transfer Acceleration with Lambda@Edge" behaviour. If the selected region is not supported, return an error without applying anything.
  2. In the code, create the new set of resources so that when var.lambda_transfer_acceleration is true, count = is used to enable the resources necessary for Transfer Acceleration.
  3. In output.tf, if Transfer Acceleration is turned on, provide the Transfer Accelerated endpoint bucketname.s3-accelerate.amazonaws.com.
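
A minimal sketch of those three pieces (all names are illustrative rather than the module's final interface; this assumes the v3-era AWS provider, where acceleration_status is an argument of aws_s3_bucket, and uses a crude pre-0.13 file() trick to force an error in unsupported regions):

variable "lambda_transfer_acceleration" {
  description = "Enable Transfer Acceleration plus Lambda@Edge index rewrites"
  type        = bool
  default     = false
}

locals {
  # Regions without Transfer Acceleration support, per the list above.
  ta_unsupported_regions = [
    "af-south-1", "ap-east-1", "ap-northeast-3",
    "eu-north-1", "eu-south-1", "me-south-1",
  ]
}

data "aws_region" "current" {}

resource "aws_s3_bucket" "accelerated" {
  count  = var.lambda_transfer_acceleration ? 1 : 0
  bucket = var.bucket_name # hypothetical variable; must not contain periods

  # file() on a non-existent path fails at plan time and surfaces the
  # "path" as the error message, aborting before anything is applied.
  acceleration_status = contains(local.ta_unsupported_regions, data.aws_region.current.name) ? file("ERROR: Transfer Acceleration is not supported in this region") : "Enabled"
}

output "s3_accelerate_endpoint" {
  value = var.lambda_transfer_acceleration ? "${var.bucket_name}.s3-accelerate.amazonaws.com" : null
}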

@phuonghuynh can you help implement this? Thanks.

ronaldtse commented 3 years ago

@phuonghuynh you can use this repo/website to test; you have full access: https://github.com/fontist/fontist.org => https://www.fontist.org

phuonghuynh commented 3 years ago

@ronaldtse ATM, we have already implemented a lambda-edge-enabled flag for the request & response events. I think the new for-loop feature in Terraform v12 can dynamically add the lambda-arn section.

phuonghuynh commented 3 years ago

Upgrading the test site and this module to v14 before using the new syntax.

strogonoff commented 3 years ago

Testing acceleration effects

I ran tests with S3 uploads to acceleration-enabled and normal S3 buckets, but I have no strong results yet. It’s unclear whether I’m missing something.

My setup:

- Uploading from an EC2 instance in Seoul
- To two S3 buckets in Tokyo, consecutively
- The control bucket has all default settings
- The accelerated bucket has transfer acceleration enabled in its properties

Command for control bucket:

time aws s3 sync iev-data s3://iev-upload-test-noaccel --region=ap-northeast-1 --delete

Command for accelerated bucket:

aws configure set default.s3.use_accelerate_endpoint true
aws configure set s3.addressing_style virtual
time aws s3 sync iev-data s3://iev-upload-test-accel --region=ap-northeast-1 --delete --endpoint-url http://s3-accelerate.amazonaws.com

With acceleration, iev-data (lots of small files, 85 MB) takes 3m0.778s; without it, 3m35.276s.

strogonoff commented 3 years ago

Speeding up uploads

phuonghuynh commented 3 years ago

@ronaldtse so do we need a new GH repository for the new lambda, like https://github.com/riboseinc/terraform-aws-lambda-edge-authentication?

I updated the lambda block in aws_cloudfront_distribution to be dynamically generated (v12 syntax; we will move to v14 later). So we can set up a list of lambda@edge functions per event like this:

  lambda_edges = [
    {
      event_type = "viewer-request"
      lambda_arn = "${module.test-lambda.arn}:${module.test-lambda.version}"
    },
    {
      event_type = "viewer-response"
      lambda_arn = "${module.test-lambda.arn}:${module.test-lambda.version}"
    }
  ]
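
On the distribution side, such a list would feed a v12 dynamic block inside the cache behaviour; a minimal sketch (the placement inside default_cache_behavior is assumed, not taken from the module source):

  dynamic "lambda_function_association" {
    for_each = var.lambda_edges
    content {
      event_type = lambda_function_association.value.event_type
      lambda_arn = lambda_function_association.value.lambda_arn
    }
  }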

Still trying to find a way to fix this error:

Error: Error putting S3 acceleration: InvalidRequest: S3 Transfer Acceleration is not supported for buckets with periods (.) in their names
    status code: 400, request id: 7J9EXM4P325ARH17, host id: JMYbAFgs9umGubrZ/8ZKpD48vMmBjGHIBoBJBHtauoFDDw0716XXcSgVTdo42E8i1uK2vfqqnoI=

ronaldtse commented 3 years ago

@strogonoff uploading from within the AWS infrastructure will not see great speed improvements because the uploads are already using the AWS internal network. The difference will be larger when the uploads are done “outside” the AWS network, which is the case with GHA.

Still, this represents a respectable speed-up.

ronaldtse commented 3 years ago

@phuonghuynh for this error:

Error: Error putting S3 acceleration: InvalidRequest: S3 Transfer Acceleration is not supported for buckets with periods (.) in their names

You have to create a new S3 bucket to test. For example, for fontist.org, use fontist-org as the S3 bucket name.

do we need a new GH repository for the new lambda, like https://github.com/riboseinc/terraform-aws-lambda-edge-authentication?

Yes please, the task is now located here: https://github.com/riboseinc/terraform-aws-lambda-redirect-http-index/issues/1

strogonoff commented 3 years ago

@strogonoff uploading from within the AWS infrastructure will not see great speed improvements because the uploads are already using the AWS internal network. The difference will be larger when the uploads are done “outside” the AWS network, which is the case with GHA.

Still, this represents a respectable speed-up.

Actually, I think there is less distance between GitHub Actions on Azure and AWS (both in datacenters near each other in the US) than between EC2 instances in different countries, as in my test (Seoul, Korea and Tokyo, Japan). Geography is geography, so the test might be representative enough. I think the lack of strong results is because transfer acceleration doesn't have a major effect on thousands of small files.

phuonghuynh commented 3 years ago

v1.0.4 has been tagged for this issue.

phuonghuynh commented 3 years ago

@ronaldtse confirmed this mode is working via chat

Yes, already using it