openlawlibrary / SAWS

Static Aws WebSites
Apache License 2.0

Static hosting Thoughts #1

Open dgreisen opened 6 years ago

dgreisen commented 6 years ago

Our shop uses Python, JavaScript and C#, so the tooling will need to be in one of those languages (in that order of preference) so we can maintain it.

Example HTML repository: https://github.com/DCCouncil/dc-law-html/ with submodule https://github.com/DCCouncil/dc-law-docs-laws; we may eventually want to pull css/ and js/ from a third repository.

Currently, dc-law-html consists of 800 MB of HTML in ~30k files. Most files change every time the repository is updated (due to recency information updating on ~20k of those files). We are considering ways to reduce the number of files that change on each update to several hundred.

There are a bunch of redirects in https://github.com/DCCouncil/dc-law-html/blob/master/redirects.json. There are a bunch of bulk Elasticsearch index updates in https://github.com/DCCouncil/dc-law-html/blob/master/index.bulk. There are a bunch of programmatic rewrites that need to happen, replacing ~ with : (see, e.g., https://github.com/DCCouncil/dc-law-html/blob/master/dc/council/code/sections/28~1-204.html, which becomes https://code.dccouncil.us/dc/council/code/sections/28:1-204.html).
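A minimal sketch of the ~ / : mapping, assuming it is a plain character substitution on the path (the function names are made up for illustration):

```python
def repo_path_to_public_path(repo_path: str) -> str:
    """Map a file path in the repository to its public URL path ('~' becomes ':')."""
    return repo_path.replace("~", ":")


def public_path_to_repo_path(url_path: str) -> str:
    """Inverse mapping: a requested URL path back to the stored file path."""
    return url_path.replace(":", "~")


# Example taken from the paths above.
print(repo_path_to_public_path("/dc/council/code/sections/28~1-204.html"))
```

This only round-trips cleanly if no stored path contains a literal : and no public path contains a literal ~, which is an assumption about the repository contents.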

One possibility is to copy all files into a new s3 directory and update the origin at build time. Example:

AWS CloudFront has two origins, one corresponding to each repository; each origin points to the directory for the current commit inside the corresponding repo. CloudFront terminates SSL. The AWS bucket has the following directory structure:

openlawlibrary/dc-law-html
  /71e2c192ed7213b6ededb6f2f268d0b20ccabba5
    /...
  /e038e33f96442738782f099e60bfa4e056585aec
    /...
  /...
openlawlibrary/dc-law-docs-laws
  /df0ddb837ad84c775ec1ab9988f45a0c7e23efe0
    /...
  /219704d51bb7f036b2d25fd38cad9ad62f081795
    /...
  /...

Every time a build occurs, the origins for the repositories are updated to point to the current hashes.

AWS Cloudfront calls a lambda function every time it has to hit the origin (not every time it gets a request). The lambda function does the following:

  1. rewrite all :s to ~s
  2. rewrite urls:
    • /dc/council/laws/docs/{remainder} to /dc-law-docs-laws/{current hash}/{remainder}
      • (remove /dc/council/laws/docs/ prefix, add /dc-law-docs-laws/{current hash}/ prefix)
      • get current hash prefix from an environment variable DC-LAW-DOCS-LAWS
    • /{remainder} to /dc-law-html/{current hash}/{remainder}
      • (add /dc-law-html/{current hash}/ prefix)
      • get current hash prefix from an environment variable DC-LAW-HTML
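The rewrite rules above could be sketched roughly as follows. This is only the path logic as a pure function: the function name is hypothetical, and the two hashes are passed in as arguments rather than read from the environment variables mentioned above, so that the sketch stays testable on its own.

```python
DOCS_PREFIX = "/dc/council/laws/docs/"


def rewrite_uri(uri: str, html_hash: str, docs_hash: str) -> str:
    """Translate a public request path into the path under the current commit."""
    # Step 1: public URLs use ':' where the stored files use '~'.
    uri = uri.replace(":", "~")

    # Step 2a: documents live in the dc-law-docs-laws repository.
    if uri.startswith(DOCS_PREFIX):
        remainder = uri[len(DOCS_PREFIX):]
        return f"/dc-law-docs-laws/{docs_hash}/{remainder}"

    # Step 2b: everything else is served from dc-law-html.
    return f"/dc-law-html/{html_hash}/{uri.lstrip('/')}"


# Hashes borrowed from the example directory listing.
HTML_HASH = "71e2c192ed7213b6ededb6f2f268d0b20ccabba5"
DOCS_HASH = "df0ddb837ad84c775ec1ab9988f45a0c7e23efe0"
print(rewrite_uri("/dc/council/code/sections/28:1-204.html", HTML_HASH, DOCS_HASH))
```

The docs prefix is checked first because the catch-all /{remainder} rule would otherwise match every request.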

When a new commit is pushed to dc-law-html HEAD:

  1. upload dc-law-html HEAD to openlawlibrary/dc-law-html/{hash}; rewriting urls as we upload
  2. add s3 redirect objects for all paths listed in redirects.json
  3. Update the cloudformation or terraform template to point to the new hash, and run cloudformation/terraform.
  4. update elasticsearch
  5. invalidate cache
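Steps 1 and 2 could be planned along these lines. This is a sketch under several assumptions: the bucket name is a placeholder, the helper names are hypothetical, and redirects.json is assumed here to be a flat mapping of old path to new path (its real format is not described above). The dictionaries are shaped as keyword arguments for boto3's `put_object`, whose `WebsiteRedirectLocation` parameter creates an S3 redirect object; nothing is actually uploaded.

```python
import json


def s3_key(commit_hash: str, repo_path: str,
           prefix: str = "openlawlibrary/dc-law-html") -> str:
    """Key for a repository file under the per-commit directory."""
    return f"{prefix}/{commit_hash}/{repo_path.lstrip('/')}"


def redirect_put_args(bucket: str, commit_hash: str, redirects: dict) -> list:
    """Build put_object keyword arguments for one empty redirect object per entry."""
    return [
        {
            "Bucket": bucket,
            "Key": s3_key(commit_hash, old_path),
            "Body": b"",
            # S3 requires this value to start with "/", "http://" or "https://".
            "WebsiteRedirectLocation": new_path,
        }
        for old_path, new_path in sorted(redirects.items())
    ]


# Assumed redirects.json shape: {"/old/path.html": "/new/path.html"}
sample = json.loads('{"/old/path.html": "/new/path.html"}')
for args in redirect_put_args("example-bucket", "71e2c192", sample):
    print(args["Key"], "->", args["WebsiteRedirectLocation"])
```

Each dictionary could then be passed as `s3.put_object(**args)` with a boto3 S3 client. Note that S3 only honors redirect objects when requests go through the bucket's website endpoint.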

We should be able to set an environment to track HEAD or a tag, or to update to any arbitrary commit. Notes:

dgreisen commented 6 years ago

Above is relatively fleshed out. Comments include more incoherent thoughts and research.