paulmach closed this pull request 7 years ago.
Updated. The 32 MB is an upper bound so I never have to grow the buffer. Could try using bytes.Buffer
to handle a lot of that. I think the 32 MB is fine since it's allocated only once.
I should add, to your point, that in the file I tested on it was in the 2 MB range.
Every change breaks someone's workflow. 😄 People may run hundreds of decoders in parallel and may not expect increased memory consumption. So yeah, let's use bytes.Buffer.Grow().
Great improvement!
Updated so the header and blob buffers use bytes.Buffer. Unless I'm doing something totally wrong, I'm not seeing any speed improvement anymore.
```
BenchmarkDecode-4              1721956082    2514611990    +46.03%
BenchmarkDecodeConcurrent-4    1795521119    2687821585    +49.70%
```
This is what I was seeing in general. This one change was a huge win; other similar changes had minimal, if any, impact. I don't know why.
@paulmach thanks for all the work. I haven't looked in depth, but are there concurrency issues with this? I think it is possible for the buffers to be accessed from multiple goroutines, and I am not sure that is safe.
Have you thought about trying sync.Pool?
Yes, this is safe; tests also pass with go test -race.
decoder.Start() launches a goroutine that sequentially reads the fileblocks/blobs and passes them to the concurrent "decoders". The buffer being reused here is the one that holds the file byte data before it gets protobuf decoded into a *OSMPBF.Blob. Any decoded data is copied into the proto struct by the proto.Unmarshal method (I tested this and looked at the code), i.e. when protobuf decodes the bytes it copies most of them into the protobuf struct's blob.ZlibData []byte attribute. This copy is what the decoders use, not any part of the reused buffer.
It would be interesting to use a sync.Pool for this blob.ZlibData and then for the decoded zlib data (before it gets protobuf decoded again), but when I was trying other/similar improvements they didn't make much, if any, difference.
Again, I don't know why preallocating a full 32 meg buffer results in such a dramatic speed improvement, but it is significant.
Note that the speed improvement is in the first commit in this PR. The second commit does the cleaner implementation that does not result in much of a change in performance.
It all looks very interesting, I will take another look this week.
I rolled back the bytes.Buffer stuff that didn't make a difference on performance.
What's left in this pull request is the preallocation of the buffers, which saves 33% on benchmarks.
I don't quite like the idea of preallocating such a large 32 MB buffer when most uses will probably not require anywhere near that amount. So I tried:
```go
func (d *Decoder) buffer(size int) []byte {
	if len(d.buf) < size {
		d.buf = make([]byte, size)
	}
	return d.buf[:size]
}
```
I started with an initial buffer size of 0 bytes. With the test data the buffer was reallocated 10 times, up to a maximum size of 500k, yet I only saw a 2-3% decrease in execution time. If I initialized the buffer big enough that it was never reallocated (i.e. 500k), I still only saw maybe a 10-14% decrease. However, with the changes made in this pull request I saw the 40% decrease in execution time described by @paulmach, which is pretty spectacular! I saw similar speed with a single member variable buf instead of sizeBuf, headerBuf and blobBuf.
@AlekSi what are your thoughts on preallocating a maxBlobSize buffer for the decoder to use? I don't see why having a gradually increasing buffer size negates most performance improvements.
So I took a look. And then again. And again. In the end, I exposed a method to set buffer size. See #19.
> I don't see why having a gradually increasing buffer size negates most performance improvements.

Me neither. And that bothers me.
On my amd64 MacBook Pro with go1.7 and go1.6, benchmarks were run with go test -bench . --benchtime 10s, so the numbers are 3-5 run averages. I don't know why this is such a large improvement. I tried a bunch of more/similar things and nothing changed the performance like this.