google / brotli

Brotli compression format
MIT License
13.3k stars 1.22k forks source link

Reusing/resetting a brotli.Writer instance #1132

Open ankon opened 4 months ago

ankon commented 4 months ago

We saw good performance improvements in our go program when using a sync.Pool to cache and reset gzip writers rather than creating them anew each time.

I was wondering whether the same would be true for the brotli writer, too, but that one doesn't seem to have a trivial "reset this" method. On the underlying C level it looks like an encoder state can be "created"/"destroyed", but there are functions to "initialize" and "cleanup" as well.

Before I try to hack this together: Do you think that pooling/reusing of the encoder state could have a measurable/visible performance impact?

eustas commented 4 months ago

Initialize and cleanup are sort-of internal. Could you point me to the changes regarding gzip, so I could investigate the source of performance improvements. I could assume that performance is impacted by JNI roundtrips. If that is true, then we could think of something to reuse / refurbish instances.

ankon commented 4 months ago

So for gzip a lot of places1,2 suggest this approach (pseudo-golang here):

writerPool := &sync.Pool{
  New: func() any {
    writer, _ := gzip.NewWriterLevel(nil, g.Level)
    return writer
  },
}

// later when using it to wrap an existing output writer
var w io.Writer
// Grab a possibly pooled gzip writer
writer := writerPool.Get().(*gzip.Writer)
// Reset its structures and point it to now write to w
writer.Reset(w)

Doing some benchmarks (see [1] for an idea) also for us showed that the number of allocations went down quite a bit, and I think that might be due to internal caches and buffers getting reused.

See also gzip.Writer's Reset documentation