mreiferson / go-snappystream

a Go package for framed snappy streams

Inefficient compression for short/unaligned writes #3

Closed: bmatsuo closed this issue 10 years ago

bmatsuo commented 10 years ago

Calls to Write should be buffered to avoid emitting blocks that are shorter than necessary (and, by extension, inefficiently compressed). I put together a worst-case, albeit completely unrealistic, example demonstrating the inefficiency.

https://github.com/bmatsuo/go-snappystream/blob/inefficient-short-writes/writer_test.go

--- FAIL: TestWriter_shortWriteEfficiency (0.00 seconds)
    writer_test.go:29: batch write size 27 not equal short write size 1110 (x41.1)
FAIL
exit status 1
FAIL    github.com/mreiferson/go-snappystream   0.010s
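
The degenerate pattern the test exercises is, roughly, writing one byte at a time, so every byte is emitted as its own framed block and pays the full framing overhead each time. A sketch (dst and data are placeholders):

import (
    "io"

    "github.com/mreiferson/go-snappystream"
)

// writeByteAtATime demonstrates the worst case: each 1-byte call to Write
// is emitted as its own framed block.
func writeByteAtATime(dst io.Writer, data []byte) error {
    w := snappystream.NewWriter(dst)
    for _, c := range data {
        if _, err := w.Write([]byte{c}); err != nil {
            return err
        }
    }
    return nil
}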

While this exact test case will not be seen in practice, short writes will occur whenever writes are not made in multiples of MaxBlockSize bytes.

IMO, simply wrapping Writer in a bufio.Writer is not adequate in the general case; I don't see a particular benefit to that over internal buffering. And chunks that cannot be compressed, or application-specific logic that flushes the buffer prematurely, will still cause short writes. (If that is the intended solution, it should be documented.)

I think an API similar to "compress/gzip" should be exported. That is, there should be a type that implements:

interface {
    io.WriteCloser
    Flush() error
}
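
For reference, compress/gzip's writer already has this shape (these are real stdlib calls; dst is a placeholder):

import (
    "compress/gzip"
    "io"
)

func gzipShape(dst io.Writer, data []byte) error {
    zw := gzip.NewWriter(dst) // satisfies the interface above
    if _, err := zw.Write(data); err != nil {
        return err
    }
    if err := zw.Flush(); err != nil { // force pending data out now
        return err
    }
    return zw.Close() // flush remaining data and finalize the stream
}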
mreiferson commented 10 years ago

@bmatsuo I wrote this for use in NSQ, where there are a variety of other mechanisms that buffer data prior to being compressed/sent to clients (and I wanted to keep this API really simple).

If you're interested in proposing an implementation for this, it should be straightforward to add a BufferedWriter type (naming aside) that would accomplish this.
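
A minimal sketch of what such a type might look like, assuming the existing Writer emits one framed block per call to Write (the behavior this issue describes); the name BufferedWriter follows the comment above, but the fields and constructor are illustrative, not the eventual implementation:

package snappystream

import "io"

// BufferedWriter accumulates small writes and hands them to the underlying
// framed Writer one MaxBlockSize chunk at a time.
type BufferedWriter struct {
    w   io.Writer // e.g. a *snappystream.Writer
    buf []byte    // pending bytes, never more than MaxBlockSize
}

func NewBufferedWriter(w io.Writer) *BufferedWriter {
    return &BufferedWriter{w: w, buf: make([]byte, 0, MaxBlockSize)}
}

// Write buffers p, emitting a block each time MaxBlockSize bytes accumulate.
func (bw *BufferedWriter) Write(p []byte) (n int, err error) {
    for len(p) > 0 {
        free := MaxBlockSize - len(bw.buf)
        m := len(p)
        if m > free {
            m = free
        }
        bw.buf = append(bw.buf, p[:m]...)
        n += m
        p = p[m:]
        if len(bw.buf) == MaxBlockSize {
            if err = bw.Flush(); err != nil {
                return n, err
            }
        }
    }
    return n, nil
}

// Flush emits any pending bytes as a single (possibly short) block.
func (bw *BufferedWriter) Flush() error {
    if len(bw.buf) == 0 {
        return nil
    }
    _, err := bw.w.Write(bw.buf)
    bw.buf = bw.buf[:0]
    return err
}

// Close flushes pending data; the framing format has no trailer.
func (bw *BufferedWriter) Close() error { return bw.Flush() }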

bmatsuo commented 10 years ago

I'll see what I can throw together.

bmatsuo commented 10 years ago

I did try simply wrapping the writer as bufio.NewWriterSize(snappystream.NewWriter(w), snappystream.MaxBlockSize). It was slower than I think it can/should be.
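
Concretely, the wrapping amounts to this (only the exported NewWriter and MaxBlockSize are assumed; note the caller must Flush the bufio.Writer before the stream is complete):

import (
    "bufio"
    "io"

    "github.com/mreiferson/go-snappystream"
)

// wrapBuffered returns a writer that accumulates small writes and passes
// them to the framing writer one MaxBlockSize chunk at a time.
func wrapBuffered(w io.Writer) *bufio.Writer {
    return bufio.NewWriterSize(snappystream.NewWriter(w), snappystream.MaxBlockSize)
}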

It should be noted, though, that the benefits of buffered writing are in compressed data size and, to some (unknown) extent, decoder performance. Encoder performance should not be expected to improve in the general case. My only point here is that the write performance hit from naive buffering is too great.

BenchmarkWriterManpage            100000  28196      ns/op  149.91   MB/s
BenchmarkBufferedWriterManpage    50000   58930      ns/op  71.73    MB/s
BenchmarkWriterJSON               5000    497640     ns/op  316.21   MB/s
BenchmarkBufferedWriterJSON       5000    552092     ns/op  285.02   MB/s
BenchmarkWriterRandom             5       253061220  ns/op  41.44    MB/s
BenchmarkBufferedWriterRandom     5       252087386  ns/op  41.60    MB/s
BenchmarkWriterConstant           100     18006040   ns/op  582.35   MB/s
BenchmarkBufferedWriterConstant   100     17643041   ns/op  594.33   MB/s
BenchmarkReaderManpage            100000  24050      ns/op  106.03   MB/s
BenchmarkReaderJSON               5000    417130     ns/op  54.13    MB/s
BenchmarkReaderRandom             500     3062984    ns/op  3423.80  MB/s
BenchmarkReaderConstant           100     19215639   ns/op  25.69    MB/s

These numbers are from a 2012 MacBook Air with a 2GHz dual-core i7.

mreiferson commented 10 years ago

I'm not sure I understand how our own implementation would considerably improve performance. The important variables here are the compression ratios.

I say let's move forward with exposing this (simple) type as a helper and optimize afterward.

bmatsuo commented 10 years ago

The low-hanging optimization I can think of is to skip buffering when asked to write a []byte larger than MaxBlockSize. There are also heuristics like "if the buffer is 80% full, it is full enough to emit". The snzip utility has a few (afaict poorly documented) tuning parameters.
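
A rough sketch of that first optimization, continuing the hypothetical BufferedWriter above: when nothing is pending and at least a full block of input is available, hand it to the underlying writer without copying it into the buffer.

// Write with the bypass; hypothetical, extending the sketch above.
func (bw *BufferedWriter) Write(p []byte) (n int, err error) {
    for len(p) > 0 {
        if len(bw.buf) == 0 && len(p) >= MaxBlockSize {
            // Full block available and nothing buffered: emit it directly.
            if _, err = bw.w.Write(p[:MaxBlockSize]); err != nil {
                return n, err
            }
            n += MaxBlockSize
            p = p[MaxBlockSize:]
            continue
        }
        free := MaxBlockSize - len(bw.buf)
        m := len(p)
        if m > free {
            m = free
        }
        bw.buf = append(bw.buf, p[:m]...)
        n += m
        p = p[m:]
        if len(bw.buf) == MaxBlockSize {
            if err = bw.Flush(); err != nil {
                return n, err
            }
        }
    }
    return n, nil
}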

But, descending this rabbit hole, I see that your point is valid: just get the correct API in using this method and worry about optimizing later. I'll make a note in godoc that the buffered writer is a size optimization that should only be used when the use case involves many small writes.

mreiferson commented 10 years ago

yea, baby steps, let's just get the API in first.

bmatsuo commented 10 years ago

"[bufio.Writer] was slower than I think it can/should be."

The apparent slowness was an artifact of the tests, which were written in a way that eliminated the need for buffering (by passing one giant byte slice to Write); I changed this in #6.

Edit: just mentioning this here; discussion of that change should probably happen in the pull request.
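
For illustration, a benchmark driving many small writes through the hypothetical buffered writer sketched above would look something like this (the helper name and chunk parameter are made up, not the code in #6):

import (
    "io/ioutil"
    "testing"
)

// benchmarkBufferedChunks feeds data to the writer in small chunks so that
// the internal buffering actually comes into play.
func benchmarkBufferedChunks(b *testing.B, data []byte, chunk int) {
    b.SetBytes(int64(len(data)))
    for i := 0; i < b.N; i++ {
        w := NewBufferedWriter(ioutil.Discard)
        for off := 0; off < len(data); off += chunk {
            end := off + chunk
            if end > len(data) {
                end = len(data)
            }
            if _, err := w.Write(data[off:end]); err != nil {
                b.Fatal(err)
            }
        }
        if err := w.Flush(); err != nil {
            b.Fatal(err)
        }
    }
}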