nytimes / gziphandler

Go middleware to gzip HTTP responses
https://godoc.org/github.com/NYTimes/gziphandler
Apache License 2.0
868 stars 129 forks source link

Invalid Content-Type for small initial write. #45

Open tmthrgd opened 7 years ago

tmthrgd commented 7 years ago

The current method of inferring the mime type of the uncompressed data is broken when multiple calls are made to Write and the first block of data is small. The current method only considers the first call to Write and not subsequent calls. As the data is being buffered for the minSize test, it makes sense to detect the mime type across the buffer rather than the first fragment.

I noticed this in my fork which has diverged significantly, so my fix and test case won't apply cleanly, but they should provide someone a solid basis for fixing it in this repository, if someone thinks it worthwhile.

Test case was added in: tmthrgd/gziphandler@9855883c662e8516733524ea239882ee4ea9b1d7 Fix was added in: tmthrgd/gziphandler@4324668c5125c02964542a0b6d076b6bb28799a1

http.DetectContentType considers at most 512 bytes which is also the default minSize. So detecting the mime type over the minSize buffer provides much nicer behaviour here.

A test case that applies to this repository is below:

func TestInferContentType(t *testing.T) {
    wrapper, _ := NewGzipLevelAndMinSize(gzip.DefaultCompression, len("<!doctype html"))
    handler := wrapper(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        io.WriteString(w, "<!doc")
        io.WriteString(w, "type html>")
    }))

    req1, _ := http.NewRequest("GET", "/whatever", nil)
    req1.Header.Add("Accept-Encoding", "gzip")
    resp1 := httptest.NewRecorder()
    handler.ServeHTTP(resp1, req1)
    res1 := resp1.Result()

    const expect = "text/html; charset=utf-8"
    if ct := res1.Header.Get("Content-Type"); ct != expect {
        t.Error("Infering Content-Type failed for buffered response")
        t.Logf("Expected: %s", expect)
        t.Logf("Got:      %s", ct)
    }
}
adammck commented 7 years ago

Great bug report, thanks! It seems reasonable to delay content type sniffing until we either write the gzip encoding header or close without/flush gzipping; I'm not sure why it's in Write at all.

klauspost commented 3 years ago

Fixed in https://github.com/klauspost/compress/commit/b7e9e8e0e63041258fa6cb9f3a9e8ab6a5428715