nytimes / gziphandler

Go middleware to gzip HTTP responses
https://godoc.org/github.com/NYTimes/gziphandler
Apache License 2.0

Change DefaultMinSize to match the nginx default #93

Closed CAFxX closed 3 years ago

CAFxX commented 4 years ago

The current DefaultMinSize does not make much sense, since the arguments used for choosing the current value are fundamentally flawed in their assumptions. See the comments in the patch for arguments.

For lack of better data to make an informed decision, use the same default as nginx.
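For context, nginx's gzip_min_length directive defaults to 20 bytes, so the proposal roughly amounts to the following (a paraphrased sketch of the PR's intent, not the literal diff; the constant name comes from this package and the old value of 1400 is referenced later in the thread):

// DefaultMinSize is the minimum response body size below which responses
// are served uncompressed. 20 matches nginx's gzip_min_length default;
// the previous value of 1400 was based on an MTU argument disputed below.
const DefaultMinSize = 20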

akyoto commented 4 years ago

Is the compression of < 1.4 KB payloads beneficial to the overall performance? Including server load and end-user experience.

The last time I tested this (which was about 5 years ago) the CPU overhead of compressing was not worth the cost. Even if the response didn't fit into the MTU, sending it uncompressed was better in my tests. However, it's been a while and I haven't seen any recent benchmarks.

If anyone has real-world data that he/she would like to share, send me a link.
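For anyone wanting to reproduce such a test, a minimal micro-benchmark sketch (hypothetical, not the setup described above) that isolates just the server-side CPU cost of gzipping a roughly 1 KB payload could look like this:

package gzipbench

import (
	"bytes"
	"compress/gzip"
	"testing"
)

// payload stands in for a typical small JSON/HTML response (~1 KB).
var payload = bytes.Repeat([]byte(`{"status":"ok","msg":"hello"}`), 35)

// BenchmarkGzipSmallPayload measures only the server-side CPU cost of
// compressing the payload; it says nothing about time on the wire or
// client-side decompression, which the discussion below also weighs.
func BenchmarkGzipSmallPayload(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var buf bytes.Buffer
		w := gzip.NewWriter(&buf)
		if _, err := w.Write(payload); err != nil {
			b.Fatal(err)
		}
		if err := w.Close(); err != nil {
			b.Fatal(err)
		}
	}
}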

CAFxX commented 4 years ago

The last time I tested this (which was about 5 years ago) the CPU overhead of compressing was not worth the cost

Can you share the details of the test setup? From your reply it sounds like you only considered server-side CPU usage, but that wouldn't make sense (if you only cared about that, then you should disable compression altogether...).

If anyone has real-world data that he/she would like to share, send me a link.

In the comment in the PR there are references to the defaults used by other well-known sources. None suggest using such a high minimum (to say nothing of the MTU argument, which has been addressed in the rest of the comments).

akyoto commented 4 years ago

Yep, the test setup used ab/wrk against a server under high load (so server load was definitely a factor), and disabling compression for these very small responses, which only had a minimal effect on bandwidth costs, was very beneficial to overall latency because it freed up resources for other users requesting new pages.

It's not only about server load (then I would disable compression entirely, just like you said); rather it's about looking at it from a cost/benefit perspective and evaluating whether a certain algorithm is worth the cost for the savings we get. What is the benefit? A reduction of <1 KB on the wire. What is the cost? Higher CPU usage and TTFB latency. Do the benefits outweigh the costs? That is something everybody has to decide for themselves.

Then there's also the problem that mobile clients with awful CPUs need to power up the decompression algorithm on the client side versus just receiving a small reply directly. On the other hand, mobile devices are more likely to have tiny MTUs.

Sure, no benchmark is ever perfect. And no threshold is ever perfect for every situation.

It boils down to 3 things:

Depending on the situation, "always gzip everything over 20 bytes" might be the ideal solution.

My personal experiences differ and I think the suggested threshold is a little too low, but I can agree that a sensible default could be a little lower than the current value, especially for bad network situations.

Disclaimer: Not an author of the library, just adding my thoughts as a server developer.

CAFxX commented 4 years ago

@akyoto from my perspective, server costs and bandwidth are absolutely marginal costs. We are pretty autoscaling-happy, and that helps shift (reasonable) increases in per-request processing to the cost side instead of the latency side. Furthermore, we already use a very low minimum response size, and gzip compression barely shows up in the server CPU profiles.

At the same time, most of our clients (but this is an industry-wide trend by now) are on mobile connections, where bandwidth is a premium and the network latency completely masks the small increase in request processing time (by multiple orders of magnitude).

All of this is obviously not 100% applicable to all cases but, and this is the crucial point, none of this has any bearing on this PR.

This PR does not change the default and then offer a justification for the change: it is the opposite. The primary goal of the PR is to remove the faulty argument for choosing this default based on its relationship with internet MTUs. The point is that this is a baseless assumption, because that's not how networks work, especially when talking about the internet.

As a consequence of the point above, and only as a consequence, I also proposed a new default, taken by surveying the current recommendations and defaults of well-known sources, to replace the previous one that the comments show had been chosen for the wrong reasons. I personally don't agree with the 20-byte threshold either (I think something in the range 50-100 would be more reasonable), but this too is beside the point: I did not want to conjure an arbitrary number out of thin air, so I picked 20 just because it is the nginx default.

If there are good arguments for something different, I'm more than happy to change the default. I could even agree to not change the default at all (it wouldn't affect us, as we're not using the default), as long as the comment is changed, because it is factually incorrect and I'm seeing that argument being repeated and misused even in other contexts.

tmthrgd commented 4 years ago

Although I did include the 1400-byte DefaultMinSize change in my fork, I've long thought the reasoning was very wrong and the default should be much lower, so I'd very much like to see this change made.

All that being said, 20 bytes is definitely way too low. The GZIP format has a 10-byte header and an 8-byte footer, meaning it imposes 18 bytes of overhead before you even consider the overhead of deflate framing (not to mention the byte overhead of the Content-Encoding: gzip header). I'm not sure what a better value would be, but something in the 100-200 range would probably make sense.
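That fixed overhead is easy to demonstrate with the standard library alone; a small, self-contained sketch (independent of this package) that compresses a tiny payload and prints the sizes before and after:

package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"log"
)

func main() {
	in := []byte("Hello, world!") // 13 bytes
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(in); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
	// The 10-byte header, 8-byte footer and deflate framing make the
	// "compressed" output larger than the input for payloads this small.
	fmt.Printf("in=%d bytes, out=%d bytes\n", len(in), buf.Len())
}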

akyoto commented 4 years ago

@CAFxX Very convincing points :+1:

I've changed mine after playing with some sample data.

CAFxX commented 4 years ago

Any chance of getting this looked at?

mwat56 commented 4 years ago

Any chance of getting this looked at?

I'd prefer a version that allows the lib-user to specify their own value before e.g. calling GzipHandler() (which in turn calls NewGzipLevelHandler() and NewGzipLevelAndMinSize()).

While I understand the arguments in the discussion above, there remain different points of view and use cases. So I think the constant DefaultMinSize should be removed altogether and replaced by a configurable variable (with whatever initial value). Hence the current PR should be replaced by one implementing a configurable variable as outlined above.

CAFxX commented 4 years ago

@mwat56 you can already override the minimum size when calling NewGzipLevelAndMinSize or GzipHandlerWithOpts passing the MinSize option; DefaultMinSize is just the default in case no explicit value is provided by the lib user.

If you're arguing the default should be 0 if no minimum size is explicitly specified by the lib user... I think I need to disagree, as that's clearly counter-productive (since gzip tends to make very small messages bigger instead of smaller).
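For illustration, a minimal sketch of overriding the minimum size with the option-based constructor; this assumes GzipHandlerWithOpts returns the wrapper plus an error (check the signature of the version you use), and the 256-byte threshold is just an arbitrary example value:

package main

import (
	"io"
	"log"
	"net/http"

	"github.com/NYTimes/gziphandler"
)

func main() {
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "hello, gzip")
	})

	// Override the minimum size instead of relying on DefaultMinSize.
	wrap, err := gziphandler.GzipHandlerWithOpts(gziphandler.MinSize(256))
	if err != nil {
		log.Fatal(err)
	}

	log.Fatal(http.ListenAndServe(":8080", wrap(h)))
}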

mwat56 commented 4 years ago

@CAFxX:

you can already override the minimum size when calling NewGzipLevelAndMinSize or GzipHandlerWithOpts passing the MinSize option; DefaultMinSize is just the default in case no explicit value is provided by the lib user.

Indeed. But I was talking about GzipHandler(), which is probably the most used function and which I'd have to re-implement if I wanted to use another value for the minSize (probably using GzipHandlerWithOpts(), which doesn't have a documented API).

If you're arguing the default should be 0 if no minimum size is explicitly specified by the lib user... I think I need to disagree, as that's clearly counter-productive (since gzip tends to make very small messages bigger instead of smaller)

No, I was arguing for the user's freedom to choose any value they deem fit instead of (a) living with a constant value provided by the library or (b) having to re-implement the functionality of GzipHandler() by using an undocumented function.

Cheers!

CAFxX commented 4 years ago

@CAFxX:

you can already override the minimum size when calling NewGzipLevelAndMinSize or GzipHandlerWithOpts passing the MinSize option; DefaultMinSize is just the default in case no explicit value is provided by the lib user.

Indeed. But I was talking about GzipHandler(), which is probably the most used function and which I'd have to re-implement if I wanted to use another value for the minSize (probably using GzipHandlerWithOpts(), which doesn't have a documented API).

Then the solution is clearly documenting GzipHandlerWithOpts, as that's exactly the function you described in the previous post, and it's already a public function of the package. Your concern is trivially satisfied by replacing, in your code:

h = gziphandler.GzipHandler(h)

with:

gzipAdapter := gziphandler.GzipHandlerWithOpts(gziphandler.MinSize(n))
h = gzipAdapter(h)

Feel free to open a different PR though to add the missing documentation for GzipHandlerWithOpts, as it's completely outside the scope and goal of this one. Thanks!

mwat56 commented 4 years ago

@CAFxX:

The problem with this PR is that you want to forcefully change the behaviour of the library for all its users. That should not be taken lightly.

Then the solution is clearly documenting GzipHandlerWithOpts, as that's exactly the function you described in the previous post, and it's already a public function of the package. Your concern is trivially satisfied by replacing, in your code:

h = gziphandler.GzipHandler(h)

with:

gzipAdapter := gziphandler.GzipHandlerWithOpts(gziphandler.MinSize(n))
h = gzipAdapter(h)

Can't you see it? Really? You want to force all users of the library who are not happy with the (unreasonably small) value this PR is suggesting to change their source code? (Assuming they actually find out why their programs suddenly behave differently, are able to find the responsible PR, and are able to revert said PR's effect.)

Generally speaking, whenever you change a library your goal should be to cause the least possible disturbance/surprise for the users of said library. Which in this case would be to leave the current value as is and make it configurable as suggested above (with the current value left as the default). And if you're not happy with that: why don't you just apply the changes you suggested above to your own sources instead of forcing them on everybody else? Or stick with your personal fork of this library?

Feel free to open a different PR though to add the missing documentation for GzipHandlerWithOpts, as it's completely outside the scope and goal of this one. Thanks!

Judging from the fact that everything else (public) in gzip.go is already documented, I assume the library's authors had a reason for omitting the documentation. But in any case, you're right, it's outside the scope of this PR.

Cheers!

CAFxX commented 4 years ago

@mwat56 I think all the points you raised have already been addressed in the PR itself or in the previous discussions, so I'm not going to repeat.

But you raise a valid point about changing a default (even though you have undermined it by arguing that if something is not documented then it should not be relied upon: it is not documented anywhere that the default is guaranteed to be 1400, now or in future versions of the library).

Undermining aside, I'm totally open to not changing the default and just changing the documentation (which was the primary goal of this PR all along, as already explained above). Since we already don't rely on the default value (it makes no sense for us), not changing it would not affect us directly; it would only affect other users who keep using the old default and miss the benefits of a smaller one.

The reason I pinged was to get feedback from the maintainers on exactly this point.