Hello @mdisibio

I believe this issue might be isolated to Go 1.17 and the use of unsafe.Slice in unsafecast.BytesToUint32, which loses the capacity information.
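For illustration, here is a minimal, self-contained sketch of that failure mode, assuming a Go 1.17-style conversion that sizes the result from len only (the bytesToUint32 helper below is illustrative, not the library's actual source):

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToUint32 mirrors the general shape of a len-only unsafe.Slice cast:
// the resulting slice gets len == cap, so cap(b) is not carried over.
func bytesToUint32(b []byte) []uint32 {
	if len(b) == 0 {
		return nil
	}
	return unsafe.Slice((*uint32)(unsafe.Pointer(&b[0])), len(b)/4)
}

func main() {
	b := make([]byte, 16, 64) // 64 bytes of capacity, 16 currently in use
	u := bytesToUint32(b)
	fmt.Println(len(u), cap(u)) // 4 4 -- the spare capacity is no longer visible
}
```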
A different fix could be this:
```diff
diff --git a/column.go b/column.go
index d511623..520ee77 100644
--- a/column.go
+++ b/column.go
@@ -636,7 +636,7 @@ func (c *Column) decodeDataPage(header DataPageHeader, numValues int, repetition
 	if pageKind == ByteArray {
 		obuf = buffers.get(4 * (numValues + 1))
 		defer obuf.unref()
-		pageOffsets = unsafecast.BytesToUint32(obuf.data)
+		pageOffsets = unsafecast.BytesToUint32(obuf.data[:cap(obuf.data)])
 	}
 	values := pageType.NewValues(pageValues, pageOffsets)
@@ -647,7 +647,8 @@ func (c *Column) decodeDataPage(header DataPageHeader, numValues int, repetition
 		vbuf.data = pageValues
 	}
 	if obuf != nil {
-		obuf.data = unsafecast.Uint32ToBytes(pageOffsets)
+		obuf.data = unsafecast.Uint32ToBytes(pageOffsets[:cap(pageOffsets)])
+		obuf.data = obuf.data[:4*len(pageOffsets)]
 	}
 	if err != nil {
 		return nil, err
```
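To make the intent of the two hunks concrete, here is a small standalone sketch (helper names are illustrative, not the library's exact API) of converting at full capacity and then trimming back to the logical length, so the backing array's capacity survives the []byte <-> []uint32 round trip:

```go
package main

import (
	"fmt"
	"unsafe"
)

func bytesToUint32(b []byte) []uint32 {
	return unsafe.Slice((*uint32)(unsafe.Pointer(&b[0])), len(b)/4)
}

func uint32ToBytes(u []uint32) []byte {
	return unsafe.Slice((*byte)(unsafe.Pointer(&u[0])), 4*len(u))
}

func main() {
	data := make([]byte, 16, 128)

	// First hunk: convert the full capacity, not just the current length.
	offsets := bytesToUint32(data[:cap(data)])
	fmt.Println(len(offsets), cap(offsets)) // 32 32

	// Second hunk: convert back at full capacity, then trim to the byte length
	// that corresponds to the offsets actually in use (say 4 of them).
	numOffsets := 4
	data = uint32ToBytes(offsets[:cap(offsets)])
	data = data[:4*numOffsets]
	fmt.Println(len(data), cap(data)) // 16 128 -- capacity preserved
}
```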
I would recommend upgrading to Go 1.18+ though; from 1.18 on, the unsafecast package is written with generics and does not suffer from the same issue. And when Go 1.20 gets released we will need to have more serious discussions about dropping support for 1.17.
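For context, a rough generic sketch, assumed rather than quoted from the library, of why a 1.18+ style conversion can preserve capacity: it can size the result from cap(b) as well as len(b).

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// bytesToSlice is a hypothetical generic cast: it builds the converted slice
// over the full capacity of b, then reslices to the current length.
func bytesToSlice[T any](b []byte) []T {
	size := int(reflect.TypeOf((*T)(nil)).Elem().Size())
	if len(b) == 0 {
		return nil
	}
	s := unsafe.Slice((*T)(unsafe.Pointer(&b[0])), cap(b)/size)
	return s[:len(b)/size]
}

func main() {
	b := make([]byte, 16, 64)
	u := bytesToSlice[uint32](b)
	fmt.Println(len(u), cap(u)) // 4 16 -- the spare capacity is preserved
}
```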
Hi @achille-roussel, afaik Tempo has been updated to Go 1.19, and I am running 1.19.1 locally. I tested the proposed lines above and didn't see an improvement. Happy to provide more info or try things out.
Thanks for confirming, sorry for making the wrong assumption here.
I'll dig further to understand why the capacity information gets lost.
Super happy to hear that you were able to do the upgrade to 1.19 as well!
I would be curious to know if using the Release/Retain functions introduced in #380 will modify the behavior you observed here!
Hi @achille-roussel, as discussed offline, the new Release/Retain functionality did not change the observed behavior. I spent some more effort in this area and was able to create a more formal code change. I think this also shows more clearly what seems to be happening (the page buffer being swapped and the original lost), but I don't know why.
```go
if obuf != nil {
	if unsafecast.PointerOf(obuf.data) != unsafecast.PointerOf(pageOffsets) {
		// The page offset buffer is being swapped.
		// Recreate obuf, which allows the original to be unrefed and returned to the pool.
		obuf = &buffer{pool: obuf.pool, data: unsafecast.Uint32ToBytes(pageOffsets)}
	}
}
```
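For anyone reproducing this, here is a small self-contained sketch of the pointer-identity check used above (pointerOf is a hypothetical stand-in for unsafecast.PointerOf): two slices share a backing array exactly when their data pointers are equal, regardless of element type.

```go
package main

import (
	"fmt"
	"unsafe"
)

// pointerOf returns the address of the first element of a slice's backing array.
func pointerOf[T any](s []T) unsafe.Pointer {
	if cap(s) == 0 {
		return nil
	}
	return unsafe.Pointer(&s[:1][0])
}

func main() {
	data := make([]byte, 16, 64)
	view := unsafe.Slice((*uint32)(unsafe.Pointer(&data[0])), len(data)/4)
	fmt.Println(pointerOf(data) == pointerOf(view)) // true: same backing array

	swapped := make([]uint32, 4)
	fmt.Println(pointerOf(data) == pointerOf(swapped)) // false: the buffer was swapped
}
```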
This change yields significant memory alloc improvements on top of the benchmarks in Tempo #1818:
```
name                                      old alloc/op   new alloc/op   delta
BackendBlockSearchTraces/noMatch-12        333MB ±11%      50MB ±17%    -84.97%  (p=0.000 n=10+10)
BackendBlockSearchTraces/partialMatch-12   605MB ± 2%     311MB ± 5%    -48.49%  (p=0.000 n=9+9)
BackendBlockSearchTraces/service.name-12  3.67MB ± 2%    1.51MB ± 3%    -58.73%  (p=0.000 n=10+10)
```
As discussed offline, the real root cause is that page offset buffers were being unnecessarily allocated for dictionary-encoded byteArray columns. Submitted #398.
Want to mention that #386 still yielded a minor improvement in CPU and may be worth merging (although it was neutral on memory usage). Here is a benchmark comparison:
```
name                                      old time/op    new time/op    delta
BackendBlockSearchTraces/noMatch-12        204ms ± 6%     198ms ±31%      ~       (p=0.400 n=9+10)
BackendBlockSearchTraces/partialMatch-12   2.57s ± 2%     2.51s ± 2%     -2.38%   (p=0.002 n=9+8)
BackendBlockSearchTraces/service.name-12  1.57ms ±12%    1.46ms ±12%     -6.94%   (p=0.028 n=9+10)

name                                      old alloc/op   new alloc/op   delta
BackendBlockSearchTraces/noMatch-12        342MB ± 4%     332MB ± 4%     -2.96%   (p=0.011 n=9+8)
BackendBlockSearchTraces/partialMatch-12   601MB ± 1%     607MB ± 2%     +1.11%   (p=0.031 n=9+9)
BackendBlockSearchTraces/service.name-12  3.67MB ± 1%    3.69MB ± 2%      ~       (p=0.460 n=8+10)

name                                      old allocs/op  new allocs/op  delta
BackendBlockSearchTraces/noMatch-12         119k ± 1%      117k ± 4%     -1.73%   (p=0.021 n=8+8)
BackendBlockSearchTraces/partialMatch-12   18.0M ± 0%     18.0M ± 0%      ~       (p=0.438 n=8+9)
BackendBlockSearchTraces/service.name-12   39.2k ± 0%     39.2k ± 0%      ~       (p=0.403 n=10+10)
```
Related to memory profiling done in #370, it looks like page offset buffers are meant to be pooled, but the pooling is not working as expected. I suspect this logic, which swaps the slices.
Testing methodology
Added some simple instrumentation to bufferPool to track outstanding allocs via sum(gets) - sum(puts); a rough sketch of that kind of instrumentation is included at the end of this comment. Also added a print statement here to print the before and after capacities when the offset buffer is swapped.

Results
The pageOffsetsBufferPool allocates much more than expected, several GB. The other pools have reasonable allocs (1K to ~30MB).
When obuf.data is swapped to a slice of a different size, the new slice is 0-length. Unsure if this is specific to our schema or even relevant, but wanted to mention it.

Solution
Tried this simple fix, but it doesn't quite work. Our tests surface some data inconsistencies afterwards. Hacking a little more does work, but I'm not familiar enough with this area to put together a proper solution. This change does result in 50+% fewer allocs.
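For reference, here is a rough sketch of the kind of bufferPool instrumentation described under "Testing methodology" above (countingPool and buffer are hypothetical types, not parquet-go's internals):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type buffer struct{ data []byte }

// countingPool wraps a sync.Pool and counts gets and puts, so that
// gets - puts approximates the number of outstanding buffers.
type countingPool struct {
	pool sync.Pool
	gets atomic.Int64
	puts atomic.Int64
}

func newCountingPool(size int) *countingPool {
	return &countingPool{
		pool: sync.Pool{New: func() any { return &buffer{data: make([]byte, 0, size)} }},
	}
}

func (p *countingPool) get() *buffer {
	p.gets.Add(1)
	return p.pool.Get().(*buffer)
}

func (p *countingPool) put(b *buffer) {
	p.puts.Add(1)
	p.pool.Put(b)
}

// outstanding reports gets - puts; a value that keeps growing suggests
// buffers are not finding their way back to the pool.
func (p *countingPool) outstanding() int64 {
	return p.gets.Load() - p.puts.Load()
}

func main() {
	p := newCountingPool(4096)
	b := p.get()
	fmt.Println("outstanding:", p.outstanding()) // 1
	p.put(b)
	fmt.Println("outstanding:", p.outstanding()) // 0
}
```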