missinglink closed this 2 years ago
The code is definitely clearer, but I'm seeing a significant performance regression in the benchmarks:
benchmark old ns/op new ns/op delta
BenchmarkScan_lineString-12 507 1339 +164.21%
BenchmarkDecode_lineString-12 3229 3368 +4.30%
BenchmarkScan_multiLineString-12 4946 12754 +157.86%
BenchmarkDecode_multiLineString-12 31737 33385 +5.19%
BenchmarkEncode_LineString-12 91.3 90.4 -1.06%
Huh, that's so weird 😂 I guess we leave it as-is, I really don't understand why 🤷‍♂️
So... just looking at the diff... you'd think that passing a ByteOrder (which appears in the Go source to be a method-only struct) would be equivalent to passing an int around. In that regard I don't see any benefit to passing it as a pointer, although it wouldn't hurt to try.
Could it be that the escape analysis is moving it to the heap memory and the old version stayed all on the stack?
My theory is that since ByteOrder is an interface, any time you call order.Uint32 it has to dereference the interface, and that has overhead.
I've also run into "slow copy" issues when passing "big" things as function arguments, like structs passed by value rather than by pointer. So maybe the ByteOrder interface, being 2x64 bits, has a different copy path than a single 64-bit word. But I think the first reason is the most likely.
Hmm so from what I can tell from just Googling around and not writing any actual code 😆, interface dereferencing does incur a penalty but it's a paltry penalty, about 2ns.
There's a bunch of discussion online about how Go can't do proper escape analysis on interfaces because it's not able to guarantee that all implementers are not holding onto references, so it sounds like the result is that variables coming from an interface always end up on the heap, which would explain such dramatic differences.
The relevant bit from that linked issue:
Escape analysis can't prove that the slice won't escape when passed to an arbitrary interface implementation, even though it can and does know that the slice won't escape if passed to a littleEndian or bigEndian:
I'm probably going to close this PR, will think on it a bit more first.
Yeah, so apparently the current method is the fastest; it looks like the cost of calling a function is fairly high compared to the overall cost of such a simple operation (0.28ns for the decode and 1.43ns for the same thing through a non-inlined function call, putting the cost of the call at ~1.2ns). Obviously in real-world usage a lot of these calls will get inlined.
Some interesting titbits:
- passing a function value (decodeWithFunction(binary.LittleEndian.Uint64, empty)) is actually slightly faster by a tiny margin
- using a bool instead of an int is actually slower 🤷‍♂️
- for the interface, using an if statement is considerably faster than just calling the interface

go test -bench=. -test.run=none -benchmem -benchtime=1000000000x benchmark_test.go (14s)
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
Benchmark_LittleEndian_Direct_Uint64-8 1000000000 0.2865 ns/op 0 B/op 0 allocs/op
Benchmark_Inlined_Function_LittleEndian_Uint64-8 1000000000 0.2828 ns/op 0 B/op 0 allocs/op
Benchmark_Function_LittleEndian_Uint64-8 1000000000 1.434 ns/op 0 B/op 0 allocs/op
Benchmark_ByteOrder_Uint64-8 1000000000 3.009 ns/op 0 B/op 0 allocs/op
Benchmark_ByteOrder_Function_Uint64-8 1000000000 2.812 ns/op 0 B/op 0 allocs/op
Benchmark_ByteOrder_ManualDereference_Uint64-8 1000000000 1.773 ns/op 0 B/op 0 allocs/op
Benchmark_ByteOrder_ManualDereference_Integer_Uint64-8 1000000000 1.441 ns/op 0 B/op 0 allocs/op
Benchmark_ByteOrder_ManualDereference_Boolean_Uint64-8 1000000000 1.714 ns/op 0 B/op 0 allocs/op
package main
import (
"encoding/binary"
"testing"
)
var empty = make([]byte, 8)
// implementations
// note: this is the only implementation where inlining is allowed
func decodeInlined(bytes []byte) uint64 {
return binary.LittleEndian.Uint64(bytes)
}
//go:noinline
func decode(bytes []byte) uint64 {
return binary.LittleEndian.Uint64(bytes)
}
//go:noinline
func decodeWithInterface(order binary.ByteOrder, bytes []byte) uint64 {
return order.Uint64(bytes)
}
//go:noinline
func decodeWithFunction(fn func([]byte) uint64, bytes []byte) uint64 {
return fn(bytes)
}
//go:noinline
func decodeWithInterfaceManualDereference(order binary.ByteOrder, bytes []byte) uint64 {
if order == binary.LittleEndian {
return binary.LittleEndian.Uint64(bytes)
}
return binary.BigEndian.Uint64(bytes)
}
//go:noinline
func decodeWithInterfaceManualDereferenceInteger(order int, bytes []byte) uint64 {
if order == 0 {
return binary.LittleEndian.Uint64(bytes)
}
return binary.BigEndian.Uint64(bytes)
}
//go:noinline
func decodeWithInterfaceManualDereferenceBoolean(order bool, bytes []byte) uint64 {
if order {
return binary.LittleEndian.Uint64(bytes)
}
return binary.BigEndian.Uint64(bytes)
}
// benchmarks
func Benchmark_LittleEndian_Direct_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
binary.LittleEndian.Uint64(empty)
}
}
func Benchmark_Inlined_Function_LittleEndian_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decodeInlined(empty)
}
}
func Benchmark_Function_LittleEndian_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decode(empty)
}
}
func Benchmark_ByteOrder_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decodeWithInterface(binary.LittleEndian, empty)
}
}
func Benchmark_ByteOrder_Function_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decodeWithFunction(binary.LittleEndian.Uint64, empty)
}
}
func Benchmark_ByteOrder_ManualDereference_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decodeWithInterfaceManualDereference(binary.LittleEndian, empty)
}
}
func Benchmark_ByteOrder_ManualDereference_Integer_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decodeWithInterfaceManualDereferenceInteger(0, empty)
}
}
func Benchmark_ByteOrder_ManualDereference_Boolean_Uint64(b *testing.B) {
for n := 0; n < b.N; n++ {
decodeWithInterfaceManualDereferenceBoolean(false, empty)
}
}
This PR refactors encoding/wkb/* to remove the custom type byteOrder int and replace it with binary.ByteOrder. It also removes the constants bigEndian and littleEndian.

This actually simplifies the code a fair bit since there is no need to write if conditions based on the desired endianness.

Let me know if this is something you'd consider merging and if there's anything else which needs to be done.
I noticed a fair amount of code duplication between byteOrderType() and readByteOrderType(); I considered refactoring those functions as well but didn't go that far in this PR.