- DNS: We prefer CNAME flattening to reduce lookups. Okay?
- DNS: We generally prefer 24h TTL to reduce lookups (shorter during switchover). Okay?
Confirmed with Fastly Support. These are fine.
DNS: Figure out the correct entrypoint that satisfies our TLS and Networking preferences.
@supertassu and I read through these pages:
We settled on dualstack.t.sni.global.fastly.net to start the first deployment stages.
After experimenting with ignoring query strings via VCL-like header configuration (https://docs.fastly.com/en/guides/making-query-strings-agnostic), cache objects seem to get mixed up between compressed and uncompressed responses.
$ curl https://releases.jquery.com/qunit/?42a --connect-to ::dualstack.t.sni.global.fastly.net -I --compressed
HTTP/2 200
…
content-encoding: gzip
accept-ranges: bytes
age: 0
x-served-by: cache-lhr7351-LHR
x-cache: MISS
$ curl https://releases.jquery.com/qunit/?42a --connect-to ::dualstack.t.sni.global.fastly.net -i
HTTP/2 200
…
content-encoding: gzip
accept-ranges: bytes
age: 41
x-served-by: cache-lhr7345-LHR
x-cache: HIT
x-cache-hits: 1
Warning: Binary output can mess up your terminal.
Fastly Support helped us realize that this was actually an issue on our end: the origin server for releases.jquery.com was not sending Vary: Accept-Encoding.
@supertassu traced this down to a mistake on the new WordPress servers. We forgot to set gzip_vary on in the nginx config. Oddly enough, Debian defaults to:
gzip on;
gzip_comp_level 6;
But leaves gzip_vary unset, which defaults to off per http://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzip_vary. That seems like a bug in the Debian nginx package. Something we should look at upstreaming.
The old WordPress servers did the same in the private repo at https://github.com/jquery/infrastructure, but we missed it during the conversion.
We caught this before switching DNS, and codeorigin was not affected either way as it already sets the Vary header correctly.
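To illustrate why the missing Vary header mixed up the cached objects, here is a minimal sketch of a shared cache that builds its secondary key from whatever the origin lists in Vary. The `origin` function and the `Cache` class are hypothetical stand-ins, not Fastly's actual implementation (which, among other things, normalizes Accept-Encoding).

```python
def origin(url, headers, send_vary):
    """Hypothetical origin: gzips when asked, optionally announces Vary."""
    wants_gzip = "gzip" in headers.get("accept-encoding", "")
    response = {"content-encoding": "gzip" if wants_gzip else "identity"}
    if send_vary:
        response["vary"] = "accept-encoding"
    return response


class Cache:
    """Toy shared cache keyed on URL plus the request headers named in Vary."""

    def __init__(self, send_vary):
        self.send_vary = send_vary
        self.vary_by_url = {}  # url -> header names the origin told us to vary on
        self.objects = {}      # (url, header values) -> stored response

    def _key(self, url, headers):
        vary = self.vary_by_url.get(url, [])
        return (url, tuple(headers.get(name, "") for name in vary))

    def get(self, url, headers):
        key = self._key(url, headers)
        if key in self.objects:
            return "HIT", self.objects[key]
        response = origin(url, headers, self.send_vary)
        names = response.get("vary", "").split(",")
        self.vary_by_url[url] = [n.strip().lower() for n in names if n.strip()]
        self.objects[self._key(url, headers)] = response
        return "MISS", response


# Without Vary, the second (uncompressed) client gets the gzip object.
broken = Cache(send_vary=False)
broken.get("/qunit/", {"accept-encoding": "gzip"})
print(broken.get("/qunit/", {}))

# With Vary: Accept-Encoding, the two clients get separate cache objects.
fixed = Cache(send_vary=True)
fixed.get("/qunit/", {"accept-encoding": "gzip"})
print(fixed.get("/qunit/", {}))
```

The first scenario mirrors the two curl requests above: a MISS that stores the gzip response, followed by a HIT that hands the same gzip body to a client that never asked for it.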
Service: Ignore URL query parameters for caching, to reduce origin load.
This can be done via custom VCL per https://developer.fastly.com/reference/vcl/variables/client-request/req-url-path/. But, an easier way is in the GUI under "Headers". This is slightly confusing as it's not actually a header, but you can use it to configure VCL expressions with the rest done automatically.
Documented at https://docs.fastly.com/en/guides/making-query-strings-agnostic.
Worked great.
Service: Treat URLs as case-insensitive
Similarly, done through another "Header" using the std.tolower() expression per https://developer.fastly.com/reference/vcl/functions/strings/std-tolower/.
- Ignore query strings (Request / Set): destination url, source req.url.path
- Case-insensitive URLs (Request / Set): source std.tolower(req.url)
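As a rough illustration of what the two header rules do to the URL used for caching, here is a small Python sketch. The function name `cache_key_url` is hypothetical, and it combines both rules in one step, whereas on Fastly they are two separate "Header" entries evaluated as VCL.

```python
from urllib.parse import urlsplit


def cache_key_url(url):
    """Approximate the effect of both rules on the URL used for caching."""
    # Rule 1: keep only the path, like setting url to req.url.path.
    path = urlsplit(url).path
    # Rule 2: lowercase it, like std.tolower(req.url).
    return path.lower()


print(cache_key_url("/qunit/?42a"))
print(cache_key_url("/jQuery-foo.js?ver=1"))
```

With both rules applied, requests that differ only in query string or letter case map to the same cache object, so they no longer each cause a separate origin fetch.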
In the first rounds of testing we ran into a connectivity problem that looked like it had to do with the TLS configuration at Fastly. Here is what we knew:
Our base expectation is for HTTPS support to start at IE 9-11 on Windows 7. Ref https://github.com/jquery/infrastructure-puppet/issues/21.
We don't expect IE8 or Windows XP to work, since we already moved to TLS 1.2 at some point during the StackPath era.
Via BrowserStack, in Windows 8 and IE 11, I can load these URLs without issue:
They also work fine in IE 11, IE 10, and IE 9 on Windows 7.
Using the same Win8/IE11 browser, https://codeorigin.jquery.com/mobile/1.4.0/images/icons-png/eye-black.png consistently fails with a connection error. It also fails in IE 11, IE 10, and IE 9 on Win 7. Note "codeorigin" vs "code", where codeorigin uses our new Fastly deployment.
Fastly Support responded:
[…] Fastly provides the following TLS cipher suite.
sslscan codeorigin.jquery.com
Preferred TLSv1.2 128 bits ECDHE-RSA-AES128-GCM-SHA256 Curve 25519 DHE 253
Accepted TLSv1.2 256 bits ECDHE-RSA-AES256-GCM-SHA384 Curve 25519 DHE 253
Accepted TLSv1.2 256 bits ECDHE-RSA-CHACHA20-POLY1305 Curve 25519 DHE 253
OpenSSL name -> IANA name
ECDHE-RSA-AES128-GCM-SHA256 -> TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
ECDHE-RSA-AES256-GCM-SHA384 -> TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
ECDHE-RSA-CHACHA20-POLY1305 -> TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
Please see below URL. https://learn.microsoft.com/en-us/windows/win32/secauthn/tls-cipher-suites-in-windows-7
Windows 7 does not support those cipher suites and the connection will fail.
For compatibility, we have a TLS configuration (CNAME k.sni.global.fastly.net) with cipher suites in CBC mode.
% sslscan k.sni.global.fastly.net
Preferred TLSv1.2 128 bits ECDHE-RSA-AES128-GCM-SHA256 Curve 25519 DHE 253
Accepted TLSv1.2 256 bits ECDHE-RSA-AES256-GCM-SHA384 Curve 25519 DHE 253
Accepted TLSv1.2 256 bits ECDHE-RSA-CHACHA20-POLY1305 Curve 25519 DHE 253
Accepted TLSv1.2 128 bits ECDHE-RSA-AES128-SHA256 Curve 25519 DHE 253
Accepted TLSv1.2 256 bits ECDHE-RSA-AES256-SHA384 Curve 25519 DHE 253
Accepted TLSv1.2 128 bits ECDHE-RSA-AES128-SHA Curve 25519 DHE 253
Accepted TLSv1.2 256 bits ECDHE-RSA-AES256-SHA Curve 25519 DHE 253
OpenSSL name -> IANA name
ECDHE-RSA-AES128-SHA256 -> TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
ECDHE-RSA-AES256-SHA384 -> TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
ECDHE-RSA-AES128-SHA -> TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
ECDHE-RSA-AES256-SHA -> TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
[…]
Switching our experimental deployment from t.sni to k.sni (dualstack.k.sni.global.fastly.net) resolved the issue. The new deployment now also works in IE 11 on Windows 8 and IE 9 on Windows 7.
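The failure and the fix can be pictured as a cipher suite negotiation. The sketch below is a simplification under stated assumptions: the server lists are the IANA names from the sslscan output quoted above, the server is assumed to pick its first preferred suite that the client offers, and `LEGACY_CLIENT` is an illustrative placeholder rather than the real Windows 7 suite list.

```python
# Suites offered on the t.sni map, per the sslscan output quoted above.
T_SNI = [
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256",
]

# The k.sni map adds the CBC-mode suites.
K_SNI = T_SNI + [
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
]

# Assumption for illustration only: a legacy client with CBC suites and no
# ECDHE-RSA GCM or ChaCha20 suites. Not the authoritative Windows 7 list.
LEGACY_CLIENT = {
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
}


def negotiate(server_preference, client_supported):
    """Return the first server-preferred suite the client supports, or None."""
    for suite in server_preference:
        if suite in client_supported:
            return suite
    return None  # no overlap: the handshake fails with a connection error


print(negotiate(T_SNI, LEGACY_CLIENT))
print(negotiate(K_SNI, LEGACY_CLIENT))
```

With no overlapping suite, the handshake cannot complete, which matches the connection error seen in IE on Windows 7 against t.sni.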
Traffic levels at Fastly over the past week, prior to the big switch. This represents the "codeorigin.jquery.com" canary experiment at about 4 requests per second (or 11-22K requests per hour), with an increase on 9 September when we updated our first-party documentation sites (jquery.com, api.jquery.com) to load jquery-3.7.1.min.js from the canary deployment at codeorigin.jquery.com instead of the canonical code.jquery.com.
We built most of our confidence with the canary traffic over the previous two weeks, using https://releases.jquery.com and codeorigin.jquery.com. Prior to the big switch we also made sure the code.jquery.com DNS entry was set to the highest DNS TTL that Cloudflare supports (24 hours), so that the big switch would go as slowly as possible within the limitation of Cloudflare's DNS system (no geographic variance unfortunately). DNS tends to roll over pretty quickly for 99% of traffic, so in practice this doesn't make much difference, but it's something.
The big switch took place on Friday 15 Sept around 17:55 UTC. Over the two weeks prior we were doing around 4 req/s (180/min, or 0.1% of jQuery CDN traffic). Within the first five minutes this went up to 16,000 requests per second (1M per minute, or about half of our traffic):
Over the course of the next hour we received around 90% of our normal traffic, and within a day 99% of the 22K-30K requests per second we normally do.
From the StackPath side, we started around 27,000 requests per second (HTTPS+HTTP) on Friday 15 Sep 2023 at 14:30 UTC, a few hours before the switch.
Draining down to about 300 req/s by 21:00 UTC:
Today, we are still serving about 100 requests per second through the StackPath service. It's been two full days since the switch, and 24 hours since the old DNS entry should have expired from the DNS resolvers of Internet service providers, device operating systems, and web browsers. While this is proportionally small (<1%), it's still more than 20X the "experimental" amount of traffic we received on Fastly during the two weeks prior to the switch. Hopefully this will drain within another week or so.
Continuing to slowly drain from 100 rps on 17 Sept 2023 to about 60 rps today (HTTPS: 10 rps, HTTP: 50 rps).
Still totalling about 50 million requests between 18 Sept and 25 Sept.
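As a quick sanity check on that total, a short sketch: seven days at a steady rate between the 60 rps and 100 rps mentioned above gives a range that brackets the roughly 50 million requests. The helper name `total_requests` is hypothetical, and the real traffic was draining over time rather than flat.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def total_requests(requests_per_second, days):
    """Total requests at a constant rate over a number of whole days."""
    return requests_per_second * SECONDS_PER_DAY * days


# 18 Sept to 25 Sept is 7 days.
low = total_requests(60, 7)    # 36,288,000
high = total_requests(100, 7)  # 60,480,000
print(f"{low:,} to {high:,} requests")
```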
Breakdown:
It turns out, major Internet infrastructure doesn't "just" shut down, does it?
The Highwinds StrikeTracker portal is still up six months later, and there's even some decent traffic still coming through.
Seems to not want to go below 40 rps. These could be health checks for all I know, although it seems a bit much.
General

*.jquery.com certificate.
/jQuery-foo.js is able to match /jquery-foo.js.

Testing

-4, -6, --http1.1, --http2, --tls-max 1.2, --tls-max 1.3, http+https URLs (except http2 over HTTP) and confirm HTTP 200 OK (esp no redirect). Use --connect-to ::SOMETHING.global.fastly.net to test prior to deploying any DNS changes.

Deployment

Three services overall: code, content, releases.
codeorigin.jquery.com for functional testing.
code.jquery.com.

Examples of past issues:
OpenSSL SSL_read: Connection reset by peer. https://github.com/jquery/codeorigin.jquery.com/issues/82

Post-deployment
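The Testing item above describes a matrix of curl options. As a minimal sketch, the combinations could be generated like this; the helper name `curl_matrix` is hypothetical, and leaving out `--tls-max` for plain HTTP URLs is a choice made here (the flag has no effect without TLS), not something the checklist states.

```python
from itertools import product


def curl_matrix(host_and_path, connect_to):
    """Build curl argument lists covering IP version, HTTP version, TLS and scheme."""
    commands = []
    for ip, http, scheme in product(
        ["-4", "-6"], ["--http1.1", "--http2"], ["https", "http"]
    ):
        if scheme == "http" and http == "--http2":
            continue  # the checklist excludes http2 over plain HTTP
        # TLS limits only make sense for https URLs.
        if scheme == "https":
            tls_options = [["--tls-max", "1.2"], ["--tls-max", "1.3"]]
        else:
            tls_options = [[]]
        for tls in tls_options:
            commands.append(
                ["curl", "-sS", "-I", ip, http, *tls,
                 "--connect-to", f"::{connect_to}",
                 f"{scheme}://{host_and_path}"]
            )
    return commands


for command in curl_matrix(
    "code.jquery.com/jquery-3.7.1.min.js",
    "dualstack.k.sni.global.fastly.net",
):
    print(" ".join(command))
```

Each printed line can be run by hand, checking that the response is HTTP 200 rather than a redirect or a connection error.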