nextstrain / nextstrain.org

The Nextstrain website
https://nextstrain.org

Understand nextstrain.org caching #832

Closed · jameshadfield closed this 5 months ago

jameshadfield commented 5 months ago

As part of writing up this issue I fixed it, but it's still really weird to me, and I'd like to understand why this happened.


Description of bug

As part of #811 I uploaded new datasets to our staging bucket by downloading the flu/seasonal/h3n2/ha/2y JSONs and re-uploading them renamed to seasonal-flu, so we could update our tests to use the latter. I initially uploaded them using aws s3 cp (so the data wasn't zipped), then re-uploaded them using nextstrain remote upload ... (which magically zips the data). The strangeness relates to the associated content-encoding metadata and/or the actual encoding of the file. The file is gzipped (it's 1.1MB compressed, 37MB uncompressed), and its content encoding is gzip (visible in the AWS console).

Using a small script to access this file (cf. the staging/zika dataset) via various APIs gives me weird results.

Test script: run the server locally at the default port 5000, then run the following script:

```js
import fetch from 'node-fetch';

async function main() {
  const paths = [
    "staging/seasonal-flu/h3n2/ha/2y",
    "staging/zika",
  ]

  /* Make a GET request to our REST API asking for JSON data */
  console.log("\nREST API")
  for (const path of paths) {
    const mediaType = "application/json"
    const url = `http://localhost:5000/${path}`
    console.log("\tfetch:", url, mediaType)
    const req = fetch(url, accept(mediaType));
    const res = await req;
    console.log("\t\tresponse content encoding:", res.headers.get('content-encoding'))
    /* print out the start of the response. Note that this is after
       decompression, if any, and I don't know how to truly know if
       decompression is happening. If content-encoding is gzip then does
       this mean the response has for sure been decompressed? */
    const text = await res.text();
    console.log("\t\tResponse:", text.slice(0,50));
  }

  /* now do the same but hitting the Charon endpoint */
  console.log("\nCharon API")
  for (const path of paths) {
    const url = `http://localhost:5000/charon/getDataset?prefix=${path}`
    console.log("\tfetch:", url)
    const req = fetch(url)
    const res = await req;
    console.log("\t\tresponse content encoding:", res.headers.get('content-encoding'))
    const text = await res.text();
    console.log("\t\tResponse:", text.slice(0,50));
  }

  /* for good measure hit the HTTPS S3 endpoints */
  console.log("\nS3 HTTPS ")
  for (const path of paths) {
    const key = path.replace(/^staging\//, "").replace(/\//g, '_') + ".json"
    const url = `https://nextstrain-staging.s3.amazonaws.com/${key}`;
    console.log("\tfetch:", url)
    const req = fetch(url)
    const res = await req;
    console.log("\t\tresponse content encoding:", res.headers.get('content-encoding'))
    const text = await res.text();
    console.log("\t\tResponse:", text.slice(0,50).replace(/\n/g, ""))
  }
}

function accept(mediaType) {
  return { headers: { Accept: mediaType } };
}

main();
```

When I run this script locally I get:

REST API
        fetch: http://localhost:5000/staging/seasonal-flu/h3n2/ha/2y application/json
                response content encoding: gzip
                Response: ��kw�.�~߿K���:"����M�i�mY-����=�� �D
        fetch: http://localhost:5000/staging/zika application/json
                response content encoding: gzip
                Response: {"version":"v2","meta":{"title":"Real-time trackin

Charon API
        fetch: http://localhost:5000/charon/getDataset?prefix=staging/seasonal-flu/h3n2/ha/2y
                response content encoding: null
                Response: ��kw�.�~߿K���:"����M�i�mY-����=�� �D
        fetch: http://localhost:5000/charon/getDataset?prefix=staging/zika
                response content encoding: gzip
                Response: {"version":"v2","meta":{"title":"Real-time trackin

S3 HTTPS 
        fetch: https://nextstrain-staging.s3.amazonaws.com/seasonal-flu_h3n2_ha_2y.json
                response content encoding: gzip
                Response: {<NEWLINE>  "version": "v2",<NEWLINE>  "meta": {<NEWLINE>    "title": "Rea
        fetch: https://nextstrain-staging.s3.amazonaws.com/zika.json
                response content encoding: gzip
                Response: {"version":"v2","meta":{"title":"Real-time trackin

The weirdness (for me) is:

  1. For the REST API, the (seasonal-flu) data comes back still gzipped, i.e. it's not being decompressed, despite the response content encoding appearing to be correct.
  2. A very similar request/check happens in requests.test.js, and that test passes on GitHub Actions but not locally. This is what led me down this rabbit hole, and it hinted at a caching issue.
  3. Using the S3 HTTPS endpoint for the same file, the content encoding is the same but the data is correctly decompressed by the same fetch call in the test script. (This address is used by our server to fetch the data behind the REST/charon APIs.)
  4. For the charon API call, the content-encoding ("gzip") header seems to have been stripped. If it has been stripped, then the returned data (from charon) being compressed seems explainable. But why is it stripped for charon and not the RESTful API?

I have a mad hunch that the server's reported content-encoding: gzip for the REST API response is somehow corrupted, and the script's fetch call is not respecting it, but I don't understand the mechanics well enough.
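
One way to settle the "has this body actually been decompressed?" question from the script's comment is to look at the raw bytes rather than the headers: a gzip stream always starts with the magic bytes 0x1f 0x8b. A minimal sketch (a hypothetical helper, not part of the script above):

```js
import fetch from 'node-fetch';

// True if the body is (still) a gzip stream, regardless of what the
// Content-Encoding header claims. Gzip data always begins 0x1f 0x8b.
// Note node-fetch decompresses gzip-encoded responses by default, so a
// body that still looks gzipped here was either double-compressed or
// mislabeled upstream.
async function bodyLooksGzipped(url) {
  const res = await fetch(url);
  const buf = Buffer.from(await res.arrayBuffer());
  return buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b;
}

console.log(await bodyLooksGzipped("http://localhost:5000/staging/seasonal-flu/h3n2/ha/2y"));
```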


The fix

The following patch, which creates a new (empty) cache for the server's fetch machinery, fixed it (response content encoding: gzip for all responses, and correct JSON contents):

diff --git a/src/fetch.js b/src/fetch.js
index 65f88ff5..d12fc78e 100644
--- a/src/fetch.js
+++ b/src/fetch.js
@@ -5,7 +5,7 @@ const FETCH_OPTIONS = Symbol("Request options for make-fetch-happen");

 const _fetch = __fetch.defaults({
   cache: "default",
-  cacheManager: process.env.FETCH_CACHE || "/tmp/fetch-cache",
+  cacheManager: process.env.FETCH_CACHE || "/tmp/fetch-cache-mk2",
 });

Note: keeping the old cache but changing to cache: "no-store", or cache: "no-cache", or cache: "reload" did not fix the bug.
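
For what it's worth, make-fetch-happen stores its cache with cacache, so a sketch of another way to start fresh without renaming the path (assuming cacache is importable from the server's dependency tree):

```js
// Blow away the on-disk HTTP cache used by make-fetch-happen (it is a
// cacache store); the path matches the default in src/fetch.js.
import cacache from 'cacache';

await cacache.rm.all(process.env.FETCH_CACHE || "/tmp/fetch-cache");
```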


Questions

  1. Why was the cache we are using not detecting that the cached data was now invalid? We set cache: no-cache on the request, so the response must be revalidated with the origin server before each reuse, but it seems we didn't pick up that the contents/metadata had changed. (Is that what actually happened? Clearly the caching used by fetch (make-fetch-happen) is at fault here somehow; see the conditional-request sketch after these questions.)

  2. Why was the content-encoding different on the charon (null) vs RESTful API (gzip) response, and why did this seemingly not affect decompression (or lack thereof) of the response?
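
To make question 1 concrete: with cache: no-cache, each reuse of the cached response should be preceded by a conditional request to the origin, something like the following (a sketch; the ETag value is the one that appears in the server logs further down this thread). When the ETag still matches, S3 answers 304 Not Modified, which tells the cache to reuse its stored body and its stored headers, and that stored-header reuse is exactly where a stale Content-Encoding could survive.

```sh
# Revalidation on the wire: a conditional GET with the cached validator.
# S3 returns 304 Not Modified while the ETag matches, even if metadata
# such as Content-Encoding has changed since the response was cached.
$ curl -si https://nextstrain-staging.s3.amazonaws.com/seasonal-flu_h3n2_ha_2y.json \
    -H 'If-None-Match: "a6fb19653970598e7cb514f01122cf73"'
```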


P.S. It looks like macOS removes /tmp files after 3 days, so a few different things lined up for me to crash into this bug.

tsibley commented 5 months ago

This is certainly odd behaviour. Off the bat, I can't say with certainty what the cause might be. It's certainly understandable, but it's the kind of situation where being able to reproduce and poke at the behaviour is immensely useful for gaining that understanding. It would also be useful to see the server's logs from running your test script, as they'll show caching information.

I initially uploaded them using aws s3 cp (so the data wasn't zipped), and then re-uploaded them using nextstrain remote upload ... (which magically zips the data).

Between these two uploads, did you access the dataset via the nextstrain.org server?

tsibley commented 5 months ago

In any case, unless you preserved the full context in which you saw the behaviour or we can reproduce it (I can't offhand), I suspect it's not worth digging into more, as we'll be missing important information.

jameshadfield commented 5 months ago

Between these two uploads, did you access the dataset via the nextstrain.org server?

Yes (although maybe only for one of the APIs, see below). That's how the cache became invalid, although that should have been detected during revalidation by one of the many layers involved here (make-fetch-happen, minipass-fetch, the HTTP cache, etc.).

It would also be useful to have seen the server's logs resulting from running your test script, as they'll show caching information.

Server logs (captured while running the test script whose output is shown above; some line breaks added for clarity):

[verbose]       [fetch] GET https://nextstrain-staging.s3.amazonaws.com/seasonal-flu_h3n2_ha_2y.json (cache: no-cache)
[verbose]       [fetch] 200 OK https://nextstrain-staging.s3.amazonaws.com/seasonal-flu_h3n2_ha_2y.json (cache revalidated, timestamp Fri, 19 Apr 2024 04:53:46 GMT)

[verbose]       [fetch] GET https://nextstrain-staging.s3.amazonaws.com/zika.json (cache: no-cache)
[verbose]       [fetch] 200 OK https://nextstrain-staging.s3.amazonaws.com/zika.json (cache revalidated, timestamp Fri, 19 Apr 2024 04:53:46 GMT)

Getting (nextstrain) datasets for: prefix=staging/seasonal-flu/h3n2/ha/2y
[verbose]       [fetch] GET https://nextstrain-staging.s3.amazonaws.com/seasonal-flu_h3n2_ha_2y.json (cache: no-cache)
[verbose]       [fetch] 200 OK https://nextstrain-staging.s3.amazonaws.com/seasonal-flu_h3n2_ha_2y.json (cache revalidated, timestamp Sun, 21 Apr 2024 21:01:17 GMT)

Getting (nextstrain) datasets for: prefix=staging/zika
[verbose]       [fetch] GET https://nextstrain-staging.s3.amazonaws.com/zika.json (cache: no-cache)
[verbose]       [fetch] 200 OK https://nextstrain-staging.s3.amazonaws.com/zika.json (cache revalidated, timestamp Sun, 21 Apr 2024 21:01:17 GMT)

~This highlights that the (server-side) fetch for ...seasonal-flu_h3n2_ha_2y.json is subtly different depending on whether it comes through the RESTful API or charon, as indicated by the different revalidation times. This probably matches up with the different content-encodings passed through to the client (script). It points to a difference in how we handle the routes, and it'd be good to remove that difference.~ See more below.

unless you preserved the full context in which you saw the behaviour or we can reproduce it (I can't offhand), I suspect it's not worth digging into more, as we'll be missing important information.

We may decide it's not worth digging into this one, but it's important for the wider team to realise that the server has caching bugs. We've talked about server logging improvements, and this would be a good example where debug-level logging (to a file?) would have been useful; we could use it in tests to better debug the many CI failures we're getting.

Node.js v18 (we were using v16 until a few weeks ago) has a browser-compatible fetch implementation, so switching to that would remove some dependencies and reduce the number of layers involved; however, I can't currently find good documentation about the cache it uses.

jameshadfield commented 5 months ago

A little more digging and logging of the upstream response (S3 + cache) in the server:

RESTful API:

proxyFromUpstream
        request url: https://nextstrain-staging.s3.amazonaws.com/seasonal-flu_h3n2_ha_2y.json
        request accept-encoding: gzip
[verbose]       [fetch] GET https://nextstrain-staging.s3.amazonaws.com/seasonal-flu_h3n2_ha_2y.json (cache: no-cache)
[verbose]       [fetch] 200 OK https://nextstrain-staging.s3.amazonaws.com/seasonal-flu_h3n2_ha_2y.json (cache revalidated, timestamp Fri, 19 Apr 2024 04:53:46 GMT)
proxyResponseBodyFromUpstream
(upstream response) headers:
        Content-Encoding null                      **** zika shows "gzip" here *****
        Content-Type application/json
        etag "a6fb19653970598e7cb514f01122cf73"
        last-modified Tue, 16 Apr 2024 04:11:22 GMT
        cache key make-fetch-happen%3Arequest-cache%3Ahttps%3A%2F%2Fnextstrain-staging.s3.amazonaws.com%2Fseasonal-flu_h3n2_ha_2y.json
        cache hash sha512-AxkNCcsms9SmJPycZsJaZPX2n7dGH8NCQ5px9t4opKF4eL8CFUmHsCE8%2BrX9VxKrKYje0pigr6IA8FsPNr4TfQ%3D%3D
        cache time Fri, 19 Apr 2024 04:53:46 GMT
(upstream request) compress? false
(_our_ response) Content-Encoding undefined            **** zika shows "gzip" here *****
proxyResponseBodyFromUpstream streaming to client (without decompressing)

Charon API:

Getting (nextstrain) datasets for: prefix=staging/seasonal-flu/h3n2/ha/2y
proxyFromUpstream
        request url: https://nextstrain-staging.s3.amazonaws.com/seasonal-flu_h3n2_ha_2y.json
        request accept-encoding: gzip
[verbose]       [fetch] GET https://nextstrain-staging.s3.amazonaws.com/seasonal-flu_h3n2_ha_2y.json (cache: no-cache)
[verbose]       [fetch] 200 OK https://nextstrain-staging.s3.amazonaws.com/seasonal-flu_h3n2_ha_2y.json (cache revalidated, timestamp Mon, 22 Apr 2024 01:01:47 GMT)
proxyResponseBodyFromUpstream
(upstream response) headers:
        Content-Encoding null                           **** zika shows "gzip" here *****
        Content-Type application/json
        etag "a6fb19653970598e7cb514f01122cf73"
        last-modified Tue, 16 Apr 2024 04:11:22 GMT
        cache key make-fetch-happen%3Arequest-cache%3Ahttps%3A%2F%2Fnextstrain-staging.s3.amazonaws.com%2Fseasonal-flu_h3n2_ha_2y.json
        cache hash sha512-AxkNCcsms9SmJPycZsJaZPX2n7dGH8NCQ5px9t4opKF4eL8CFUmHsCE8%2BrX9VxKrKYje0pigr6IA8FsPNr4TfQ%3D%3D
        cache time Mon, 22 Apr 2024 01:01:47 GMT
(upstream request) compress? false
(_our_ response) Content-Encoding undefined              **** zika shows "gzip" here *****
proxyResponseBodyFromUpstream streaming to client (without decompressing)

Looking at the server, it seems clear that the upstream response headers (S3 via a cache layer) don't have a content encoding set. When you clear the cache it is set to gzip (as expected), so I think it's pretty clear this is a cache-revalidation bug in one of the fetch layers.

It's probably not worth spending too much more time here, but there is one question I'd like to answer to better understand the system:

From my logging (above) it seems our server does not set the content-encoding header, but the test script ("client") shows that it is set (to "gzip") for the RESTful API and not the charon API. I can't explain this difference, and would really like to know where in the chain content-encoding: gzip is being set. @tsibley do you have advice on what to look for here?

P.S. I'm pretty sure the cache times change between requests because the first request updates the cache before the second request arrives, so that's not as strange as I originally thought. It happens for zika as well.


Server patch used to produce the logging shown above:

```diff
diff --git a/src/upstream.js b/src/upstream.js
index 6a892b89..20c4ca98 100644
--- a/src/upstream.js
+++ b/src/upstream.js
@@ -43,6 +43,11 @@ export async function deleteByUrls(urls) {
  * @param {String} accept - HTTP Accept header value
  */
 export async function proxyFromUpstream(req, res, url, accept) {
+
+  console.log("proxyFromUpstream")
+  console.log("\trequest url:", url)
+  console.log("\trequest accept-encoding:", req.acceptsEncodings("gzip") ? 'gzip' : 'identity')
+
   return await proxyResponseBodyFromUpstream(req, res, new Request(url, {
     headers: {
       Accept: accept,
@@ -190,7 +195,6 @@ export async function proxyToUpstream(req, res, upstreamUrlExtractor, contentTyp
  */
 async function proxyResponseBodyFromUpstream(req, res, upstreamReq) {
   const upstreamRes = await fetch(upstreamReq);
-
   switch (upstreamRes.status) {
     case 200:
     case 304:
@@ -238,6 +242,19 @@ async function proxyResponseBodyFromUpstream(req, res, upstreamReq) {

   res.set(copyHeaders(upstreamRes.headers, forwardedUpstreamResHeaders));

+
+  console.log(`proxyResponseBodyFromUpstream`)
+
+  console.log("(upstream response) headers:")
+  console.log("\tContent-Encoding", upstreamRes.headers.get('Content-Encoding'))
+  console.log("\tContent-Type", upstreamRes.headers.get('Content-Type'))
+  console.log("\tetag", upstreamRes.headers.get('etag'))
+  console.log("\tlast-modified", upstreamRes.headers.get('last-modified'))
+  console.log("\tcache key", upstreamRes.headers.get('x-local-cache-key'))
+  console.log("\tcache hash", upstreamRes.headers.get('x-local-cache-hash'))
+  console.log("\tcache time", upstreamRes.headers.get('x-local-cache-time'))
+  console.log("(upstream request) compress?", upstreamReq.compress)
+  console.log("(_our_ response) Content-Encoding", res.get("Content-Encoding"))
   /* Allow private (e.g. browser) caches to store this response, but require
    * them to revalidate it every time before use. They'll make conditional
    * requests which we can respond to quickly from our own server-side cache.
@@ -252,6 +269,7 @@ async function proxyResponseBodyFromUpstream(req, res, upstreamReq) {
     res.removeHeader("Content-Type");
     res.removeHeader("Content-Length");
     res.removeHeader("Content-Encoding");
+    console.log("**** SENDING 304 ****")
     return res.status(304).end();
   }

@@ -266,6 +284,7 @@ async function proxyResponseBodyFromUpstream(req, res, upstreamReq) {
    * -trs, 9 Dec 2021 */
   if (res.get("Content-Encoding") === "gzip" && !req.acceptsEncodings("gzip")) {
+    console.log("proxyResponseBodyFromUpstream Unzipping because client can't handle content-encoding")
     res.removeHeader("Content-Encoding");
     res.removeHeader("Content-Length");
@@ -275,6 +294,7 @@ async function proxyResponseBodyFromUpstream(req, res, upstreamReq) {
     await pipeline(upstreamRes.body, zlib.createGunzip(), res);
   } else {
+    console.log("proxyResponseBodyFromUpstream streaming to client (without decompressing)")
     try {
       await pipeline(upstreamRes.body, res);
     } catch (err) {
@@ -286,6 +306,7 @@ async function proxyResponseBodyFromUpstream(req, res, upstreamReq) {
       if (err?.code !== 'ERR_STREAM_PREMATURE_CLOSE') throw err;
     }
   }
+  console.log("\n\n\n")
   return res.end();
 }
```
tsibley commented 5 months ago

@jameshadfield Do you have a reproduction for this locally starting from a fresh cache? Because try as I might, I can't reproduce this with a test dataset that I manipulate on S3 between requests to a local server. I see the cache updated (not revalidated) on the first request after switching the dataset in S3 from no compression to compressed.

The Content-Encoding: gzip added after your logging is not a surprise to me. The whole server has the compression middleware enabled:

https://github.com/nextstrain/nextstrain.org/blob/b031d1113f4b6a0747919f33a19df4edb016966e/src/routing/setup.js#L30

You can see its logs by running the server under DEBUG=compression, for example.
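
For a local run that might look like this (entry point assumed, not verified here):

```sh
$ DEBUG=compression node server.js
```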

Looking back at what you said you did and the logs you experienced…

As part of https://github.com/nextstrain/nextstrain.org/pull/811 I uploaded new datasets to our staging bucket by downloading the flu/seasonal/h3n2/ha/2y JSONs and re-uploading them renamed to seasonal-flu, so we could update our tests to use the latter. I initially uploaded them using aws s3 cp (so the data wasn't zipped), then re-uploaded them using nextstrain remote upload ... (which magically zips the data).

How did you download the flu/seasonal/h3n2/ha/2y JSONs? If you downloaded them as-is without decompression, then they'd be gzipped on your local disk. And if you then uploaded them back to S3 using aws s3 cp without specifying --content-encoding gzip, then they'd still be gzipped but nothing would know about it (not S3, not our server). And I believe that would lead to the logs you saw.
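
For example, a hypothetical reconstruction of the two uploads (the key name is from this issue; the flags are standard aws-cli options):

```sh
# First upload: the local file already contains gzip bytes, but nothing
# records that, so S3 serves it with no Content-Encoding header.
$ aws s3 cp seasonal-flu_h3n2_ha_2y.json s3://nextstrain-staging/seasonal-flu_h3n2_ha_2y.json

# What `nextstrain remote upload` effectively does: same bytes, plus the
# metadata telling S3 (and anything downstream) about the compression.
$ aws s3 cp seasonal-flu_h3n2_ha_2y.json s3://nextstrain-staging/seasonal-flu_h3n2_ha_2y.json \
    --content-type application/json --content-encoding gzip
```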

Node.js v18 (we were using v16 until a few weeks ago) has a browser-compatible fetch implementation so switching to that would remove some dependencies and reduce the number of layers involved, however I can't find a good doc about the cache it uses at present.

I'm generally in favour of moving from third-party libraries to the stdlib when possible, but I'm skeptical about this. fetch() is marked "Stability 1 - Experimental" up until the v20 LTS and only just stable in v21, and the experimental stability level means "may have backwards-incompatible breakages at any time; do not use in production". make-fetch-happen is widely used as part of npm and has caching behaviour that's pretty well described. (I also put a bunch of work into making its caching more robust re: encodings/compression.) The core fetch() right now strikes me as not fewer layers, just different layers.

tsibley commented 5 months ago

Node's stdlib fetch() appears to be from Undici.

It is also light on cache documentation, but does expose this: https://undici.nodejs.org/#/docs/api/CacheStorage

In general, Undici looks useful, but I'm still not sold on switching. Using its fetch() effectively would basically require using the rest of the library too, which does not appear to be part of the stdlib, so a lot of the benefit goes out the window.

tsibley commented 5 months ago

Ok, I'm gonna stop looking at this for now, but what I found re: caching in Undici is not promising. Its fetch() implementation is littered with TODO: cache comments, the open issues are scattered and ongoing, and I'm pretty sure it's confusing Request.mode for Request.cache in two places in the stubbed implementation.

jameshadfield commented 5 months ago

The Content-Encoding: gzip added after your logging is not a surprise to me. The whole server has the compression middleware enabled.

Thanks! This was the layer I had forgotten about. I'll find some time to check whether the upstream response body is uncompressed and the compression middleware is compressing it, or whether the body is already gzipped and we're gzipping it again. The script output in the original issue points to the latter, especially since the content-encoding is being added for the RESTful API but not charon, but I'll try to confirm.

Do you have a reproduction for this locally starting from a fresh cache?

Nope (and I haven't tried to recreate it). I'm sure it's not a common problem we'll run into, but it's certainly a bug, and it's been a useful exercise for me to learn more about caching.

Thanks for the pointers!

tsibley commented 5 months ago

Caching bugs are certainly possible and in general I wouldn't bet against them, but in this case I don't think there's a bug here in our server.

With some more testing, all signs I can see point to this sequence of events:

  1. gzipped content uploaded to S3 without Content-Encoding
  2. requests made to the local server that populate the cache from S3
  3. subsequently the same gzipped content (byte-for-byte!) re-uploaded to S3 with Content-Encoding
  4. requests made to the local server which revalidate the cache with S3

The last step is a cache revalidation, not a cache update, because S3 does not change the ETag when the object's Content-Encoding header changes. In my prior testing the S3 object's bytes were changing, so the ETag was changing, and I couldn't reproduce what you reported. But once I ensured the bytes didn't change and only the Content-Encoding did, I could reproduce it.

$ echo '{"hello":"gzip"}' | gzip > gzipped

$ aws s3api put-object --bucket nextstrain-staging --key trs_test.json --body gzipped --content-type application/json 
{
    "ETag": "\"3c099fc7a460e777422ae4445291de38\"",
    "ServerSideEncryption": "AES256"
}

$ curl -si --compressed https://nextstrain-staging.s3.amazonaws.com/trs_test.json | (while read -r line; do echo "$line"; [[ "$line" == $'\r' ]] && exec hexdump -C; done)
HTTP/1.1 200 OK
x-amz-id-2: qzk1gCZiVQtAjWeBe3jQZuDywixkXKcQFk0FyxiV4BmRag9bFUZXmei4UjDyB2ovhMiOZQKxJak=
x-amz-request-id: EEDMETHC1F1R9JTE
Date: Mon, 22 Apr 2024 21:24:11 GMT
Last-Modified: Mon, 22 Apr 2024 21:23:56 GMT
ETag: "3c099fc7a460e777422ae4445291de38"
x-amz-server-side-encryption: AES256
x-amz-version-id: null
Accept-Ranges: bytes
Content-Type: application/json
Server: AmazonS3
Content-Length: 37

00000000  1f 8b 08 00 60 d5 26 66  00 03 ab 56 ca 48 cd c9  |....`.&f...V.H..|
00000010  c9 57 b2 52 4a af ca 2c  50 aa e5 02 00 f5 54 66  |.W.RJ..,P.....Tf|
00000020  66 11 00 00 00                                    |f....|
00000025

$ aws s3api put-object --bucket nextstrain-staging --key trs_test.json --body gzipped --content-type application/json --content-encoding gzip
{
    "ETag": "\"3c099fc7a460e777422ae4445291de38\"",
    "ServerSideEncryption": "AES256"
}

$ curl -si --compressed https://nextstrain-staging.s3.amazonaws.com/trs_test.json | (while read -r line; do echo "$line"; [[ "$line" == $'\r' ]] && exec hexdump -C; done)
HTTP/1.1 200 OK
x-amz-id-2: AMoa8yHwNoWimnCMV+uckIfl94btqoWZGV21vlcpcT2wY8WVfnZV3qINkvqu+wLdk8lz0/VN1sU=
x-amz-request-id: ZERP234SJ600759G
Date: Mon, 22 Apr 2024 21:24:24 GMT
Last-Modified: Mon, 22 Apr 2024 21:24:21 GMT
ETag: "3c099fc7a460e777422ae4445291de38"
x-amz-server-side-encryption: AES256
Content-Encoding: gzip
x-amz-version-id: null
Accept-Ranges: bytes
Content-Type: application/json
Server: AmazonS3
Content-Length: 37

00000000  7b 22 68 65 6c 6c 6f 22  3a 22 67 7a 69 70 22 7d  |{"hello":"gzip"}|
00000010  0a                                                |.|
00000011

Start our server with a fresh cache and then run your test script (pointed at staging/trs/test) after each S3 upload in the example above to see the behaviour you reported reproduced.

tsibley commented 5 months ago

ETags are supposed to be aware of content-encoding, but S3's apparently aren't.
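
That follows from how S3 computes them: for a non-multipart upload, the ETag is just the MD5 of the object bytes, so a metadata-only change can never move it. Checking against the file from the transcript above:

```sh
# Both put-object calls above returned the same ETag because it's simply
# the MD5 of the (unchanged) gzipped bytes; Content-Encoding isn't part
# of it.
$ md5sum gzipped
3c099fc7a460e777422ae4445291de38  gzipped
```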

jameshadfield commented 5 months ago

... this sequence of events

Yeah, that sounds about right. And in reality that's a perfectly plausible series of events that we should expect to happen every now and again when uploading datasets.

To summarise: the upstream content-encoding changed and we did not update our cache. We may say the problem is upstream of our server, and it may well be, but at the end of the day this is a bug that manifests in our server. Maybe it's not worth solving - I'm guessing it's not - but it's something we should be aware of.

The compression middleware, and more specifically its different behaviour across our two APIs, was the piece of the puzzle that made the output very hard to understand. Thanks for looking into it for me; the pointers helped me figure out what's happening.

tsibley commented 5 months ago

The compression middleware, and more specifically its different behaviour across our two APIs, was the piece of the puzzle that made the output very hard to understand.

Yeah. This automatic compression wasn't being applied to the /charon/getDataset endpoint because that endpoint wasn't setting a Content-Type. When I noticed that in the debug logs from compression, I opened #835.
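
For reference, a minimal sketch of that filter behaviour (assuming the compression middleware's default filter, which decides compressibility from the response's Content-Type via the compressible package; the routes here are illustrative):

```js
import express from 'express';
import compression from 'compression';

const app = express();
app.use(compression({ threshold: 0 })); // compress any size, for demonstration

app.get('/with-type', (req, res) => {
  // A compressible Content-Type is set, so (for clients sending
  // Accept-Encoding: gzip) the middleware gzips the body and adds
  // Content-Encoding: gzip, like our RESTful API responses.
  res.set('Content-Type', 'application/json');
  res.end(JSON.stringify({ hello: 'world' }));
});

app.get('/without-type', (req, res) => {
  // No Content-Type, so the default filter declines to compress: the body
  // streams through untouched and no Content-Encoding is added, like
  // /charon/getDataset before #835.
  res.end(JSON.stringify({ hello: 'world' }));
});

app.listen(5000);
```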