hackoregon / civic-devops

Master collection point for issues, procedures, and code to manage the HackOregon Civic platform
MIT License

Create a resource for hosting static assets (cf. CDN via S3 or CloudFront) #195

Closed MikeTheCanuck closed 6 years ago

MikeTheCanuck commented 6 years ago

At least one of the 2018 API projects - Housing Affordability - faced the prospect of hosting all of its raster files in the database, and mercifully found a way to retrieve those static assets from an S3 bucket instead.

The only bucket available to them at the time was hacko-data-archive, so the files are stored alongside the archive data and database backups for all projects: https://s3.console.aws.amazon.com/s3/buckets/hacko-data-archive/2018-housing-affordability/data/

The API reads these files with the boto project via S3:// URLs (rather than via S3 website hosting, which would be more straightforward). For example, the permit_data.run() method here does this, as do taxlot_data, policy_inventory, urbaninstitute_rentalcrisis, hud_homelessness and jchs_data_2017.
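As a rough illustration of the S3:// access pattern described above, here is a minimal sketch. It assumes boto3 (the thread only says "boto"), default AWS credentials, and a hypothetical object name; it is not the project's actual code.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/key' into a (bucket, key) tuple."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_s3_object(url):
    """Download one object's bytes through the S3 API (needs AWS credentials)."""
    import boto3  # third-party; imported here so the URL helper works without it

    bucket, key = split_s3_url(url)
    response = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


# Hypothetical file name under the prefix mentioned above:
# data = fetch_s3_object(
#     "s3://hacko-data-archive/2018-housing-affordability/data/example.csv"
# )
```

Every read here goes through the S3 API with credentials, which is the extra moving part that plain website hosting would avoid.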

The primary challenge here is that by mixing "active API content" with "archived data", we will be unable to apply a simple-to-manage data archival policy to the entire bucket. Since storing the same amount of data in two separate buckets costs no more, it makes sense to migrate the content to a separate bucket dedicated to hosting static content - or even consider using a CDN like CloudFront.
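To make the policy concern concrete: as far as I know, S3 lifecycle rules select objects by prefix or tag and have no "everything except" filter, so a mixed bucket needs one rule per archivable prefix, while the active prefix is simply left out. A minimal sketch of what that could look like with boto3 follows; the prefixes, the 90-day threshold and the GLACIER storage class are all assumptions for illustration.

```python
def build_lifecycle_config(archive_prefixes, days=90, storage_class="GLACIER"):
    """Build one transition rule per prefix that should be archived.

    Prefixes that are not listed (such as the active API content) get no rule
    and therefore stay in their current storage class.
    """
    rules = []
    for prefix in archive_prefixes:
        rules.append({
            "ID": "archive-" + prefix.strip("/").replace("/", "-"),
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [{"Days": days, "StorageClass": storage_class}],
        })
    return {"Rules": rules}


def apply_lifecycle_config(bucket, config):
    """Replace the bucket's lifecycle configuration (needs AWS credentials)."""
    import boto3  # third-party

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


# Hypothetical prefixes; note that every new archivable prefix needs a new rule.
# config = build_lifecycle_config(["2017-example-project/", "2018-example-project/backups/"])
# apply_lifecycle_config("hacko-data-archive", config)
```

With a dedicated static-content bucket, the archive bucket could instead carry a single rule with an empty prefix that covers everything.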

Options

  1. Leave the content in the hacko-data-archive S3 bucket, apply a policy that never archives that content, and leave the project using the "boto" S3 SDK for access.
  2. Create a new S3 bucket, publish the assets via S3 bucket website hosting, enforce a policy that never archives the content, and convert the Housing Affordability API to access these resources via http:// URLs instead.
  3. Use a service such as Amazon CloudFront to publish this content through a Content Delivery Network.
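For option 2, the change on the API side could be as small as building a website-endpoint URL and fetching it over plain HTTP with the standard library. This is a sketch only: the bucket name `hacko-static-assets`, the `us-west-2` region and the file name are hypothetical, and the endpoint format shown is the dash-style one (some regions use `s3-website.<region>` instead).

```python
from urllib.parse import quote
from urllib.request import urlopen


def website_url(bucket, key, region="us-west-2"):
    """Build an S3 static-website URL for an object (dash-style endpoint)."""
    return f"http://{bucket}.s3-website-{region}.amazonaws.com/{quote(key)}"


def fetch_static_asset(bucket, key, region="us-west-2", timeout=30):
    """Download a publicly hosted asset over HTTP; no AWS credentials involved."""
    with urlopen(website_url(bucket, key, region), timeout=timeout) as response:
        return response.read()


# Hypothetical bucket and file name:
# data = fetch_static_asset(
#     "hacko-static-assets", "2018-housing-affordability/data/example.tif"
# )
```

The trade-off is that the objects must be publicly readable, which seems acceptable for static assets but would not be for the backups currently sharing the archive bucket.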

Questions

  1. Are we comfortable with multi-layered content-retention policy on a single bucket?
  2. Are the network bandwidth download costs from S3 any cheaper than from ECS/EC2 (where the Swagger static files are stored)? If yes, is there a way for us to re-engineer the Django containers to use Swagger (OpenAPI) static files from an external location such as S3-based hosting?
  3. Is CloudFront measurably more expensive for storage and/or network bandwidth charges than S3 website hosting?

MikeTheCanuck commented 6 years ago

I've