At least one of the 2018 API projects - Housing Affordability - was faced with hosting all of its raster files in the database, and mercifully found a way to retrieve those static assets from an S3 bucket instead. The only bucket available to the team at the time was the hacko-data-archive bucket, so the files are stored alongside the archive data and database backups for all projects: https://s3.console.aws.amazon.com/s3/buckets/hacko-data-archive/2018-housing-affordability/data/
Data access in the API is performed with the boto library, fetching these files via s3:// URLs (rather than via S3 website hosting, which would be more straightforward). See, for example, the permit_data.run() method, but also taxlot_data, policy_inventory, urbaninstitute_rentalcrisis, hud_homelessness and jchs_data_2017.
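For context, moving these modules from boto access to plain HTTP access is mostly a mechanical URL rewrite. A minimal sketch, assuming the bucket is in us-west-2 and uses the standard S3 website endpoint format (the helper name and the example filename are ours, not the project's):

```python
from urllib.parse import urlparse

def s3_url_to_http(s3_url: str, region: str = "us-west-2") -> str:
    """Translate an s3://bucket/key URL into the equivalent S3 website
    endpoint URL. The region and endpoint format are assumptions; check
    the bucket's actual website-hosting configuration."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {s3_url}")
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    return f"http://{bucket}.s3-website-{region}.amazonaws.com/{key}"

# Hypothetical example:
# s3_url_to_http("s3://hacko-data-archive/2018-housing-affordability/data/example.tif")
# → "http://hacko-data-archive.s3-website-us-west-2.amazonaws.com/2018-housing-affordability/data/example.tif"
```

With a helper like this, each data module could swap its boto download call for an ordinary HTTP GET against the rewritten URL.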
The primary challenge is that by mixing "active API content" with "archived data" in one bucket, we cannot apply a simple, bucket-wide data-archival policy. Since storing the same amount of data in two separate buckets costs no more than storing it in one, it stands to reason to migrate the content to a separate bucket dedicated to hosting static content - or even to consider serving it through a CDN such as CloudFront.
Options
Leave the content in the hacko-data-archive S3 bucket, apply a policy that never archives that content, and leave the project using the "boto" S3 SDK for access.
Create a new S3 bucket, publish the assets via S3 bucket website hosting, enforce a policy that never archives the content, and convert the Housing Affordability API to access these resources via http:// URLs instead.
Use a service such as Amazon CloudFront to publish this content via a content delivery network (CDN).
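One wrinkle with the first option: S3 lifecycle rules can only include prefixes (or tags); there is no exclusion filter. Keeping everything in one bucket therefore means enumerating a rule per archived project prefix and simply leaving the active 2018-housing-affordability/data/ prefix out. A sketch of what that configuration might look like - the sibling prefix names and the 90-day threshold are purely illustrative:

```python
# Sketch of the per-prefix lifecycle configuration Option 1 would require.
# There is no "exclude this prefix" filter in S3 lifecycle rules, so each
# archived prefix gets its own rule and the active API prefix is omitted.
GLACIER_AFTER_DAYS = 90
archived_prefixes = ["2017-example-project/", "2018-another-project/"]  # hypothetical names

lifecycle = {
    "Rules": [
        {
            "ID": f"archive-{prefix.rstrip('/')}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [{"Days": GLACIER_AFTER_DAYS, "StorageClass": "GLACIER"}],
        }
        for prefix in archived_prefixes
    ]
}

# Applied with boto3, e.g.:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="hacko-data-archive", LifecycleConfiguration=lifecycle)
```

Every new project added to the bucket would mean another rule, which is the maintenance burden the "multi-layered policy" question below is getting at.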
Questions
Are we comfortable managing a multi-layered content-retention policy on a single bucket?
Are the network bandwidth download costs from S3 any cheaper than from ECS/EC2 (where the Swagger static files are stored)? If yes, is there a way for us to re-engineer the Django containers to use Swagger (OpenAPI) static files from an external location such as S3-based hosting?
Is CloudFront measurably more expensive for storage and/or network bandwidth charges than S3 website hosting?
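On the Swagger question: Django can already serve collected static files from an external origin by pointing STATIC_URL at it, so the re-engineering may be small. A minimal sketch, assuming a hypothetical bucket name and region, and assuming the collectstatic output is synced to the bucket by some separate deploy step:

```python
# settings.py fragment (sketch): serve collected static files, including
# the Swagger UI assets, from an S3 website endpoint instead of from the
# Django container on ECS/EC2. Bucket name and region are placeholders.
STATIC_URL = "http://example-static-bucket.s3-website-us-west-2.amazonaws.com/static/"

# The files still have to reach the bucket, e.g. after `collectstatic`:
#   aws s3 sync ./staticfiles s3://example-static-bucket/static/
```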