w3c / beacon

Beacon
https://w3c.github.io/beacon/

Option to request beacon be compressed #72

Open nicjansma opened 2 years ago

nicjansma commented 2 years ago

It would be interesting to consider an option for the beacon API where you could request the payload to be compressed before being sent.

With the sendBeacon() API today, if you want to compress the data before sending it, you would use the Compression Streams API, which is async-only.

Here's an example of doing this at e.g. page load time:

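// Compress a string (or any Response-compatible body) and resolve to an ArrayBuffer.
// 'deflate' here is the zlib format; 'gzip' is also supported by CompressionStream.
async function compressBlob(data) {
    const stream = new Response(data).body
        .pipeThrough(new CompressionStream('deflate'));
    return new Response(stream).arrayBuffer();
}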

(async function() {
    var data = JSON.stringify(performance.getEntries());

    // this will send OK
    var dataGz = await compressBlob(data);
    navigator.sendBeacon('/beacon?load', dataGz);
}());

Unfortunately, if you want to do this at pagehide/beforeunload/unload, you can't use the Compression Streams API because it is async. The handler would be waiting on the compressed stream (the await), and by the time it resolves the page has already been unloaded:

window.addEventListener("pagehide", async function() {
    var data = JSON.stringify(performance.getEntries());

    // this will send OK: sendBeacon() is called synchronously inside the handler
    navigator.sendBeacon('/beacon?before-await', data);

    // this will not send: the page is unloaded before the await resolves
    var dataGz = await compressBlob(data);
    navigator.sendBeacon('/beacon?after-await', dataGz);
});

If we could ask the browser to compress the payload before sending, it could look something like this:

navigator.sendBeacon('/beacon?after-await', data, {
    compress: "deflate"
});

And we could easily get it compressed from unload-style events (assuming the browser handles that async compression and beaconing later).
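Until something like that exists, one possible workaround (not part of the proposal above) is to compress ahead of time and keep the latest compressed payload in memory, so the unload-style handler only has to make a synchronous sendBeacon() call. The sketch below assumes that approach; createBeaconCache, update, flush and the '/beacon?precompressed' URL are hypothetical names, and any data collected after the last update() is not included in what gets sent.

```javascript
// Same helper as in the example above: async-only compression.
async function compressBlob(data) {
    const stream = new Response(data).body
        .pipeThrough(new CompressionStream('deflate'));
    return new Response(stream).arrayBuffer();
}

// Hypothetical helper: holds the most recent fully compressed payload.
function createBeaconCache() {
    let latest = null;  // last compressed ArrayBuffer, or null if nothing is pending
    let version = 0;    // used to ignore compressions that finish out of order

    return {
        // Async: call this while the page is still alive (load, visibilitychange, ...).
        async update(data) {
            const mine = ++version;
            const compressed = await compressBlob(data);
            if (mine === version) {
                latest = compressed;
            }
        },
        // Sync: safe to call from pagehide because nothing is awaited.
        // Returns false when there is nothing to send.
        flush(send) {
            if (latest === null) {
                return false;
            }
            const payload = latest;
            latest = null;
            return send(payload);
        }
    };
}

// Browser wiring (skipped outside a browser so the helpers can be tested on their own).
if (typeof window !== 'undefined') {
    const cache = createBeaconCache();
    window.addEventListener('load', function() {
        cache.update(JSON.stringify(performance.getEntries()));
    });
    window.addEventListener('pagehide', function() {
        cache.flush(function(payload) {
            return navigator.sendBeacon('/beacon?precompressed', payload);
        });
    });
}
```

This trades freshness for reliability: the beacon carries whatever was compressed last, which is why a browser-side option would still be preferable.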

yoavweiss commented 2 years ago

/cc @ricea

ksylor commented 2 years ago

+1 to this from Etsy’s perspective! Particularly with RUM data the payload can be big, and this would allow us to use compression in more cases.

Krinkle commented 2 years ago

+1 from Wikimedia as well.

A few things come to mind:

There is a perhaps not-so-obvious need for support on the server-side here. Which means if an intermediary library changes its sendBeacon() call to enable compressed transfer encoding, then any consumer of that library that has a receiving server must now support that or it might break. Is that right? I couldn't find precedent for negotiating upload encoding in other HTTP upload mechanisms (e.g. large www-form-encoded post submissions, or large uploads via HTML5 file inputs with file formats that may benefit from compression). This seems hard to negotiate in a progressive and backward-compatible manner, unlike downloads, where the request is expected to carry Accept-Encoding, which lets the sender toggle compression as needed. This may be fine, but it's worth considering as such and documenting.

There is also a perhaps not-so-obvious need for that same server to also retain ability to process uncompressed submissions as browsers presumably don't have to follow this option, at least until all browsers support it.

That leaves us with how to spec this. Do we spec it as something the browser must implement and must follow if passed? That seems simplest, but makes it less obvious that the same code, interpreted by browsers following an older version of the spec, would ignore the option. I wonder if it would make sense to spec it as a hint and let the browser decide. That would make it more obvious that receiving servers can't assume all inputs to be compressed, and also leaves some room for "user/device knows best"-type optimisations based on whatever heuristics browser vendors and users may come up with in the future (e.g. optimise for high bandwidth, or low CPU, or skip below size threshold etc.)

On naming, if this will involve the Content-Encoding header in the underlying spec and implementation, that may be worth using in the naming as well for consistency and familiarity, e.g. contentEncoding: "gzip". On the other hand, it might actually cause unwarranted confidence if we go with the idea of it being a "hint" that the browser may ignore, and/or if we want to support multiple choices at some point, e.g. ["deflate", "br"], where Brotli would be added by the developer if and when their consumer/server supports that.
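The two server-side requirements above (accept compressed beacons, keep accepting uncompressed ones) could be met by branching on the request's Content-Encoding header. Here is a minimal Node.js sketch under the assumption that the browser labels a compressed beacon with Content-Encoding, which nothing specifies yet; decodeBeaconBody is a hypothetical helper name.

```javascript
const zlib = require('node:zlib');

// Hypothetical helper: returns the raw beacon body as a Buffer, whatever
// content coding (if any) the sender applied.
// contentEncoding: value of the Content-Encoding request header, or undefined.
// body: Buffer holding the request body as received.
function decodeBeaconBody(contentEncoding, body) {
    const encoding = (contentEncoding || 'identity').trim().toLowerCase();
    switch (encoding) {
        case 'identity':
            // Uncompressed submissions (e.g. from browsers that ignore or do not
            // know the option) pass through untouched.
            return body;
        case 'gzip':
            return zlib.gunzipSync(body);
        case 'deflate':
            // The HTTP "deflate" coding is the zlib format, which inflateSync reads.
            return zlib.inflateSync(body);
        case 'br':
            return zlib.brotliDecompressSync(body);
        default:
            // A real endpoint could answer 415 Unsupported Media Type here.
            throw new Error('Unsupported Content-Encoding: ' + encoding);
    }
}
```

In a request handler this would be called with req.headers['content-encoding'] and the collected body before parsing the JSON.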

ksylor commented 2 years ago

Those are great points @Krinkle!

> There is a perhaps not-so-obvious need for support on the server-side here. Which means if an intermediary library changes its sendBeacon() call to enable compressed transfer encoding, then any consumer of that library that has a receiving server must now support that or it might break. Is that right?

I think that is right, but it seems reasonable to me that companies would need to make a conscious decision to enable the compression feature via the parameter, and would then also be responsible for updating their observability systems to accept and decode the payload based on Content-Encoding and Content-Type. Folks using a third-party library would still need some knowledge of the system that accepts the data, unless the library posts to a third-party service. I could definitely be wrong about that assumption, though!

> There is also a perhaps not-so-obvious need for that same server to also retain ability to process uncompressed submissions as browsers presumably don't have to follow this option, at least until all browsers support it.

This seems like another fair tradeoff to me. Not sure how widespread this is as a practice, but we already have to do a similar multi-type support in our data capture endpoints to accept both sendBeacon (content-type: text/plain;charset=UTF-8) and xhr (content-type: application/x-www-form-urlencoded) payloads as a fallback in non-supporting browsers. Not that we necessarily want to also manage a third option, but the tradeoff might be worth it to decrease payload sizes over the wire significantly.

The complexity could potentially be alleviated by exposing the supported compression types and then using different URLs per send type - something like this?

if (navigator.sendBeacon && navigator.sendBeacon.supportsEncoding("gzip")) {
    navigator.sendBeacon('/endpoint/accepts/gzip', data, {
        contentEncoding: "gzip"
    });
} else {
    navigator.sendBeacon('/endpoint/accepts/json', data);
}

However, if your idea of browsers treating it as a hint takes off, a much more complex API would need to be exposed to allow that kind of URL switching in client code. I'm not sure whether that's a solid argument against treating it as a hint, though.

Interestingly, when I ran a test of @nicjansma's example code in Chrome 96, sending the compressed result through navigator.sendBeacon produced a request with neither a Content-Encoding nor a Content-Type header, so there must be a bug with the interoperability of streams and the beacon API as it is implemented now?

yoavweiss commented 2 years ago

There are 2 ways we could go about this:

1) Define a developer request to compress the content as an imperative, and necessarily compress the payload. In this case, if the application asked for the payload to be compressed, the backend would need to decompress the data as part of the processing pipeline, without requiring the underlying server to support that compression necessarily. (that is, no need for Content-Encoding, etc). @ksylor is right that in this case, the negotiation mechanism can be based on the upload URL.

2) Define a developer request to compress as a hint (which can contain a list with multiple preferred encodings), which the browser can act on at will. In this case, we would need a negotiation mechanism to tell the server which encoding was applied. [Content-Encoding](https://datatracker.ietf.org/doc/html/rfc7231#section-3.1.2.1) is indeed the natural candidate.

(2) seems more robust and ergonomic, especially if we want the browser to potentially beacon the data once the renderer is no more.
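To make option (2) a bit more concrete, here is a rough model of the decision a browser might make when the developer's request is only a hint. Everything in it is hypothetical: pickEncoding, its parameters and the 1024-byte default threshold are illustrations of the idea, not part of any spec or implementation.

```javascript
// Hypothetical model of hint-based selection.
// preferred: encodings requested by the developer, in order of preference.
// supported: encodings this (imaginary) browser can produce.
// payloadBytes: size of the uncompressed payload.
// minBytes: below this size the browser skips compression entirely.
// Returns the encoding to apply (and to report via Content-Encoding),
// or null to send the payload uncompressed.
function pickEncoding(preferred, supported, payloadBytes, minBytes = 1024) {
    if (payloadBytes < minBytes) {
        return null;
    }
    for (const encoding of preferred) {
        if (supported.includes(encoding)) {
            return encoding;
        }
    }
    return null;
}

// Example: the developer prefers Brotli but the browser only offers gzip and deflate.
const chosen = pickEncoding(['br', 'deflate'], ['gzip', 'deflate'], 50000);
console.log(chosen); // prints "deflate"
```

Because the result can be null, a server on the receiving end of such a hint has to keep accepting uncompressed bodies.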

/cc @mnot @reschke - as I vaguely remember conversations about the use of Content-Encoding as a request header field, but don't remember their details.

yoavweiss commented 1 year ago

Given that PendingBeacon is the next big thing, should we aim for this use case to be supported there instead?

/cc @clelland

yoavweiss commented 1 year ago

@nicjansma - I'd also love your thoughts on whether the PendingBeacon proposal obsoletes this feature request

ksylor commented 1 year ago

> Given that PendingBeacon is the next big thing, should we aim for this use case to be supported there instead?

Yes, definitely! I was just reading over the proposal for PendingBeacon, and it seems like it would solve quite a number of issues. It would be great to get compression support built into that API rather than bolted on later.

nicjansma commented 1 year ago

I think it's fair to push this to PendingBeacon.

As far as I can tell, there's nothing significant you could do with sendBeacon() that you wouldn't be able to do (and more) with PendingBeacon?