missinglink closed this 2 years ago
The code is definitely clearer, but I'm seeing a significant performance regression in the benchmarks:
benchmark old ns/op new ns/op delta
BenchmarkScan_lineString-12 507 1339 +164.21%
BenchmarkDecode_lineString-12 3229 3368 +4.30%
BenchmarkScan_multiLineString-12 4946 12754 +157.86%
BenchmarkDecode_multiLineString-12 31737 33385 +5.19%
BenchmarkEncode_LineString-12 91.3 90.4 -1.06%
Huh, that's so weird 😂 I guess we leave it as-is, I really don't understand why 🤷‍♂️
So... just looking at the diff... you'd think that passing a ByteOrder (which appears in the Go source to be a method-only struct) would be equivalent to passing an int around. In that regard I don't see any benefit to passing it as a pointer, although it wouldn't hurt to try.
Could it be that the escape analysis is moving it to the heap memory and the old version stayed all on the stack?
My theory is that since ByteOrder is an interface, any time you call order.Uint32 it has to dereference the interface, and that has overhead.
I've also run into "slow copy" issues when passing "big" things as function arguments, like structs passed by value rather than by pointer. So maybe the ByteOrder interface, being 2x64 bits, has a different copy path than a single 64-bit word. But I think the first reason is the most likely.
Hmm so from what I can tell from just Googling around and not writing any actual code 😆, interface dereferencing does incur a penalty but it's a paltry penalty, about 2ns.
There's a bunch of discussion online about how Go can't do proper escape analysis on interfaces because it's not able to guarantee that all implementers are not holding onto references, so it sounds like the result is that variables coming from an interface always end up on the heap, which would explain such dramatic differences.
The relevant bit from that linked issue:
Escape analysis can't prove that the slice won't escape when passed to an arbitrary interface implementation, even though it can and does know that the slice won't escape if passed to a littleEndian or bigEndian:
I'm probably going to close this PR, will think on it a bit more first.
Yeah, so apparently the current method is the fastest; it looks like the cost of calling a function is fairly high compared to the overall cost of such a simple operation (0.28ns for the decode and 1.43ns for the same thing through a non-inlined function call, putting the cost of the call at ~1.2ns). Obviously in real-world usage a lot of these calls will get inlined.
Some interesting titbits:
- passing a function value (decodeWithFunction(binary.LittleEndian.Uint64, empty)) is actually slightly faster by a tiny margin
- using a bool instead of an int is actually slower 🤷‍♂️
- for the interface, using an if statement is considerably faster than just calling the interface

go test -bench=. -test.run=none -benchmem -benchtime=1000000000x benchmark_test.go (14s)
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
Benchmark_LittleEndian_Direct_Uint64-8 1000000000 0.2865 ns/op 0 B/op 0 allocs/op
Benchmark_Inlined_Function_LittleEndian_Uint64-8 1000000000 0.2828 ns/op 0 B/op 0 allocs/op
Benchmark_Function_LittleEndian_Uint64-8 1000000000 1.434 ns/op 0 B/op 0 allocs/op
Benchmark_ByteOrder_Uint64-8 1000000000 3.009 ns/op 0 B/op 0 allocs/op
Benchmark_ByteOrder_Function_Uint64-8 1000000000 2.812 ns/op 0 B/op 0 allocs/op
Benchmark_ByteOrder_ManualDereference_Uint64-8 1000000000 1.773 ns/op 0 B/op 0 allocs/op
Benchmark_ByteOrder_ManualDereference_Integer_Uint64-8 1000000000 1.441 ns/op 0 B/op 0 allocs/op
Benchmark_ByteOrder_ManualDereference_Boolean_Uint64-8 1000000000 1.714 ns/op 0 B/op 0 allocs/op
package main
import (
"encoding/binary"
"testing"
)
var empty = make([]byte, 8)
// implementations
// note: this is the only implementation where inlining is allowed
func decodeInlined(bytes []byte) uint64 {
return binary.LittleEndian.Uint64(bytes)
}
//go:noinline
func decode(bytes []byte) uint64 {
return binary.LittleEndian.Uint64(bytes)
}
//go:noinline
func decodeWithInterface(order binary.ByteOrder, bytes []byte) uint64 {
return order.Uint64(bytes)
}
//go:noinline
func decodeWithFunction(fn func([]byte) uint64, bytes []byte) uint64 {
return fn(bytes)
}
//go:noinline
func decodeWithInterfaceManualDereference(order binary.ByteOrder, bytes []byte) uint64 {
if order == binary.LittleEndian {
return binary.LittleEndian.Uint64(bytes)
}
return binary.BigEndian.Uint64(bytes)
}
//go:noinline
func decodeWithInterfaceManualDereferenceInteger(order int, bytes []byte) uint64 {
if order == 0 {
return binary.LittleEndian.Uint64(bytes)
}
return binary.BigEndian.Uint64(bytes)
}
//go:noinline
func decodeWithInterfaceManualDereferenceBoolean(order bool, bytes []byte) uint64 {
if order {
return binary.LittleEndian.Uint64(bytes)
}
return binary.BigEndian.Uint64(bytes)
}
// benchmarks
func Benchmark_LittleEndian_Direct_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
binary.LittleEndian.Uint64(empty)
}
}
func Benchmark_Inlined_Function_LittleEndian_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decodeInlined(empty)
}
}
func Benchmark_Function_LittleEndian_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decode(empty)
}
}
func Benchmark_ByteOrder_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decodeWithInterface(binary.LittleEndian, empty)
}
}
func Benchmark_ByteOrder_Function_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decodeWithFunction(binary.LittleEndian.Uint64, empty)
}
}
func Benchmark_ByteOrder_ManualDereference_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decodeWithInterfaceManualDereference(binary.LittleEndian, empty)
}
}
func Benchmark_ByteOrder_ManualDereference_Integer_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decodeWithInterfaceManualDereferenceInteger(0, empty)
}
}
func Benchmark_ByteOrder_ManualDereference_Boolean_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decodeWithInterfaceManualDereferenceBoolean(false, empty)
}
}
This PR refactors encoding/wkb/* to remove the custom type byteOrder int and replace it with binary.ByteOrder. It also removes the constants bigEndian and littleEndian.

This actually simplifies the code a fair bit since there is no need to write if conditions based on the desired endianness.

Let me know if this is something you'd consider merging and if there's anything else which needs to be done.
I noticed a fair amount of code duplication between byteOrderType() and readByteOrderType(); I considered refactoring those functions as well but didn't go that far in this PR.