pistacheio / pistache

A high-performance REST toolkit written in C++
https://pistacheio.github.io/pistache/
Apache License 2.0
3.12k stars 688 forks source link

[Feature Request] Integrating compression in ResponseWriter (Brotli, gzip, and deflate) #1148

Closed kiplingw closed 8 months ago

kiplingw commented 11 months ago

Hello everyone,

I'd like to put a feature request in as a starting point for discussion before I spend time implementing it.

Currently many clients, including Python's request library, by default send an Accept-Encoding header requesting at least gzip and deflate (the two most common codecs). The idea of clients expecting the server to be capable of sending a compressed response is so common it would not make sense for us to make every Pistache application go and re-implement the necessary changes when this could be implemented directly in the library.

What I was thinking was the following:

Thoughts?

dennisjenkins75 commented 11 months ago

This is reasonable, so long as it can be disabled at runtime as well.

kiplingw commented 11 months ago

To disable at runtime user would do nothing. If they hadn't configured ResponseWriter to do anything different, then its default behaviour would be to simply not compress what it's provided.

Tachi107 commented 11 months ago

This sounds like a more then reasonable request, and I support its implementation. Still, I have heard multiple times that TLS + compression can create significant security issues, even though I am not sure how or why. We should definitely investigate - or, in case you already know why this can happen, please write it here so that I can better understand the issue :)

@Ne02ptzero, since you've implemented TLS support for pistache in the past, do you know what issues compression with encryption can create?

kiplingw commented 11 months ago

I did a bit of research this morning and it looks like the security issue arose when using integrated compression with TLS. But I don't think this is a problem for us since we're proposing to handle it at the Pistache library layer (although I'll do more research).

It also looks like we already disable compression in OpenSSL, even if the user requested it to mitigate BREACH and CRIME vulnerabilities.

Tachi107 commented 11 months ago

Alright, so this shouldn't create issues for us since we'd be using compression on top of TLS, right?

kiplingw commented 11 months ago

We'd be compressing the data before it was fed into OpenSSL, so yes.

kiplingw commented 11 months ago

My PR is only a partial fix. But I realized belatedly that it's also partially broken.

If the client requests multiple different compression schemes in Accept-Encoding, like gzip, br, deflate, and perhaps even with q-factors, the patch as is expects but one compression scheme (at most) and no q-factor.

I was looking at what we could recycle that parses those kinds of headers. The closest I could find is the MediaType class but it's specific to media types even though it can almost parse the Accept-Encoding format correctly.

I'm a bit surprised that the only header that implements multiple values with q-factors is that header, despite there being many that a client could submit.

Tachi107 commented 11 months ago

My PR is only a partial fix. But I realized belatedly that it's also partially broken.

If the client requests multiple different compression schemes in Accept-Encoding, like gzip, br, deflate, and perhaps even with q-factors, the patch as is expects but one compression scheme (at most) and no q-factor.

Thanks for noticing. This isn't an issue in practice yet since we only support one encoding (deflate/zlib), but we should definitely fix this.

I was looking at what we could recycle that parses those kinds of headers. The closest I could find is the MediaType class but it's specific to media types even though it can almost parse the Accept-Encoding format correctly.

I've submitted a patch before about qvalues, so I've already had a look at that part of the codebase. I'll look into this while adding Brotli support.

I'm a bit surprised that the only header that implements multiple values with q-factors is that header, despite there being many that a client could submit.

I'm not sure I get what you mean, but qvalues are used with the Accept header, among others.

kiplingw commented 11 months ago

Thanks for noticing. This isn't an issue in practice yet since we only support one encoding (deflate/zlib), but we should definitely fix this.

Well true that we only support deflate, but the point is the client could have listed any number of encodings we don't support - in addition to deflate. As it is right now Pistache won't be able to select the only one that it does.

I've submitted a patch before about qvalues, so I've already had a look at that part of the codebase. I'll look into this while adding Brotli support.

Good idea.

I'm a bit surprised that the only header that implements multiple values with q-factors is that header, despite there being many that a client could submit.

I'm not sure I get what you mean, but qvalues are used with the Accept header, among others.

Oh I agree. What I'm saying is that I'm surprised that in Pistache Accept is the only one that supports it despite, as you pointed out, many headers use that format.

Tachi107 commented 9 months ago

I've been looking a bit at this, and I think that your patch itself is "OK". The newly added code doesn't have issues with q-values and multiple compression schemes, because it simply doesn't check for them: it leaves the job to the user.

Since you've only added a setCompression() method it is the user who decides if and when to enable compression (like showed in the ContentEncodingHandler::onRequest() function used in the new test), so, with the current API, it is up to the user to parse q-values in the onRequest() function and decide which compression algorithm to enable.

What I think would make sense is to add a helper function similar to this:

Http::Header::Encoding getBestEncoding(const Http::Request& request) {
    /* get client's preferred encodings, sorted by preference, accounting for q-values */
    const std::array<Http::Header::Encoding, N> preferredEncodings = getPreferredEncodings(request);
    /* return the first encoding that we support */
    for (const auto encoding : preferredEncodings) {
        if (supported(encoding)) {
            return encoding;
        }
    }
}

User code would then call writer.setCompression(getBestEncoding(request)) in onRequest(), which would set the proper content encoding.

kiplingw commented 9 months ago

Yes, I think that's almost the way to implement it @Tachi107. But I would suggest renaming getBestEncoding to getBestAcceptEncoding and getPreferredEncodings to getAllAcceptEncodings so that it's more clear that we're talking about the client's requested encodings.

Do you want to give it a go?

I'm also thinking that this should be made more generic because the q-value mechanism is used in HTTP headers in more places than just the Accept-Encoding header.

kiplingw commented 9 months ago

And one other thing that occurred to me this morning, to ensure we don't break user code. If none of the user's requested encodings are available (e.g. such as all codecs disabled at configure time), then it should fall back safely to the identity encoding scheme as explained here.