Most of the above was addressed by simply using the same Nginx config as we have in the private infrastructure.git repo for provisioning legacy codeorigin (copied from wordpress-header.conf.erb and wp/jquery.pp, respectively).
However, the last two points (charset, and unstable etag), require additional changes.
Nginx: charset
The default charset from Nginx differs between Debian and Alpine, possibly due to changes in some of the shared libraries. Fix this by explicitly setting charset utf-8. Note that this does (and should) only apply to files with a MIME type in charset_types list, which by default contains HTML, JS, and CSS.
More importantly, it does not (and should not) cause PNG files and other binary assets to be served with a charset, which would be wrong.
Nginx: E-Tag and Last-Modified
This was a tricky one. Git does not store file modification timestamps, which means the on-disk file mtimes are somewhat arbitrarily set to when the file was created by the local Git client. For most files, this means the time you cloned the repository, and for files added later, the time you ran git pull.
This was de-facto "stable" on the legacy server because it is a persistent server with a persistent git directory (we always use the same clone and just update it to fetch new files after a commit happens). The timestamps themselves don't matter
as long as they remain constant for any given file, and they were.
This is important because E-Tag is computed in Nginx based on file mtime and size and (unlike Apache) this behaviour is not configurable in Nginx (e.g. to make it compute a content hash instead). And Last-Modified naturally uses the
mtime as well.
As a workaround, I've added a touch command to assign all CDN files a fixed timestamp far in the past. This is safe for us because all files are expected to remain static. Even if we would change a file, it wouldn't propagate to the CDN without a manual CDN API purge (given the all-important high values for Cache-Control max-age and Expires), and after that would not propagate to any browser that has previously fetched it unless the user clears their own cache (given the all-important Cache-Control telling the browser to re-use the file blindly, which is critical for performance). As such, whether the timestamp remains constant or changes after the file is changed, would make no difference for how it propagates.
New state
After, with this patch (and verified by a simple test suite).
dockerize$ curl -i 'http://localhost:4000/jquery-3.0.0.js' | head -n20
Server: nginx
Date: Mon, 16 Aug 2021 23:49:43 GMT
Connection: keep-alive
Access-Control-Allow-Origin: *
Cache-Control: max-age=315360000
Cache-Control: public
Content-Length: 263268
Content-Type: application/javascript; charset=utf-8
ETag: "28feccc0-40464"
Expires: Thu, 31 Dec 2037 23:55:55 GMT
Last-Modified: Fri, 18 Oct 1991 12:00:00 GMT
Vary: Accept-Encoding
Accept-Ranges: bytes
Misc changes
Remove left-over Nginx config for purge.php.
Remove left-over php-fpm script.
Use our own document root at /var/www/cdn instead of mixing our files into the default /usr/share/nginx/html/ which contains files from the "docker-nginx" project that we may not want to serve blindly.
This mainly affects the odd index.html file that could otherwise be served.
Pin the Nginx base image version at 1.x. This is mostly a no-op since they've been on 1.x for 10 years, but if it does rollover
we wouldn't want an unmanaged upgrade to a new major version. We could pun this more narrowly at some point, but 1.x should be fine for now.
Invert the config file (let Dockerfile remove code instead of comment out) so that the most critical lines can be easily
linted.
Ensure all files listed are redirected correctly, as per https://github.com/jquery/infrastructure/issues/554 and verify this with an integration test. The expected values are populated based on running curl against localhost from a shell on the legacy codeorigin. I specifically did not compare against live code.jquery.com, since we care about matching what we feed to the CDN (although the above problems were surfacing through there as well). Some of the CDN behaviours can't and don't need to be mimicked from the origin. Prior to launch we will (also) compare the live code.jquery.com against the new staging CDN endpoint, but that's separate from the test for the container.
TLDR:
Differences in file headers
In summary:
Status quo:
Previously, without this patch:
Match existing Nginx config
Most of the above was addressed by simply using the same Nginx config as we have in the private infrastructure.git repo for provisioning legacy codeorigin (copied from wordpress-header.conf.erb and wp/jquery.pp, respectively).
However, the last two points (charset, and unstable etag), require additional changes.
Nginx: charset
The default charset from Nginx differs between Debian and Alpine, possibly due to changes in some of the shared libraries. Fix this by explicitly setting
charset utf-8
. Note that this does (and should) only apply to files with a MIME type incharset_types
list, which by default contains HTML, JS, and CSS.More importantly, it does not (and should not) cause PNG files and other binary assets to be served with a charset, which would be wrong.
Nginx: E-Tag and Last-Modified
This was a tricky one. Git does not store file modification timestamps, which means the on-disk file mtimes are somewhat arbitrarily set to when the file was created by the local Git client. For most files, this means the time you cloned the repository, and for files added later, the time you ran
git pull
.This was de-facto "stable" on the legacy server because it is a persistent server with a persistent git directory (we always use the same clone and just update it to fetch new files after a commit happens). The timestamps themselves don't matter as long as they remain constant for any given file, and they were.
This is important because E-Tag is computed in Nginx based on file mtime and size and (unlike Apache) this behaviour is not configurable in Nginx (e.g. to make it compute a content hash instead). And Last-Modified naturally uses the mtime as well.
As a workaround, I've added a
touch
command to assign all CDN files a fixed timestamp far in the past. This is safe for us because all files are expected to remain static. Even if we would change a file, it wouldn't propagate to the CDN without a manual CDN API purge (given the all-important high values for Cache-Control max-age and Expires), and after that would not propagate to any browser that has previously fetched it unless the user clears their own cache (given the all-important Cache-Control telling the browser to re-use the file blindly, which is critical for performance). As such, whether the timestamp remains constant or changes after the file is changed, would make no difference for how it propagates.New state
After, with this patch (and verified by a simple test suite).
Misc changes
Remove left-over Nginx config for purge.php.
Remove left-over php-fpm script.
Use our own document root at /var/www/cdn instead of mixing our files into the default /usr/share/nginx/html/ which contains files from the "docker-nginx" project that we may not want to serve blindly. This mainly affects the odd index.html file that could otherwise be served.
Pin the Nginx base image version at 1.x. This is mostly a no-op since they've been on 1.x for 10 years, but if it does rollover we wouldn't want an unmanaged upgrade to a new major version. We could pun this more narrowly at some point, but 1.x should be fine for now.
Invert the config file (let Dockerfile remove code instead of comment out) so that the most critical lines can be easily linted.
Ensure all files listed are redirected correctly, as per https://github.com/jquery/infrastructure/issues/554 and verify this with an integration test. The expected values are populated based on running
curl
against localhost from a shell on the legacy codeorigin. I specifically did not compare against live code.jquery.com, since we care about matching what we feed to the CDN (although the above problems were surfacing through there as well). Some of the CDN behaviours can't and don't need to be mimicked from the origin. Prior to launch we will (also) compare the live code.jquery.com against the new staging CDN endpoint, but that's separate from the test for the container.Ref https://github.com/jquery/infrastructure/issues/554.