Performance benchmarks and alternatives to map lookups

mishak87 commented 7 months ago

This is my follow up from r/golang Reddit I did some benchmarking.

Each unmarshal benchmark iteration parses all 5 values (4x 4-6 characters long and one <empty>) and one bad value (4 characters long) ie. abcd, bcdef, cdefg, defg, <empty> and zabc. Similar for each marshal benchmark iteration parses all 5 values.

Map lookup is significantly slower than loop for 5 items. Converting bytes to string and comparing is faster than comparing bytes using bytes.Equal because bytes.Equal converts both arguments to string anyway.

Array index is 30x faster than map lookup. Map lookup in MarshalText() and String() functions are even 40+% slower than UnmarshalText.

func BenchmarkMarshalText(b *testing.B) {
    for i := 0; i < b.N; i++ {
        _, _ = Abcd.MarshalText()
        _, _ = Bcdef.MarshalText()
        _, _ = Cdefg.MarshalText()
        _, _ = Defg.MarshalText()
        _, _ = UnknownType.MarshalText()
    }
}

func BenchmarkUnmarshalText(b *testing.B) {
    x := MarketType{}
    for i := 0; i < b.N; i++ {
        x.UnmarshalText([]byte("abcd"))
        x.UnmarshalText([]byte("bcdef"))
        x.UnmarshalText([]byte("cdefg"))
        x.UnmarshalText([]byte("defg"))
        x.UnmarshalText([]byte(""))
        x.UnmarshalText([]byte("zabc"))
    }
}

cpu: AMD Ryzen 9 7950X3D 16-Core Processor          
BenchmarkMarshalText-32                 19269684            62.31 ns/op        0 B/op          0 allocs/op
BenchmarkMarshalTextLookup-32           502871172            2.200 ns/op           0 B/op          0 allocs/op
BenchmarkMarshalTextLookupString-32     541060243            2.202 ns/op           0 B/op          0 allocs/op
BenchmarkString-32                      20310391            57.20 ns/op        0 B/op          0 allocs/op
BenchmarkStringLookupString-32          439474542            2.646 ns/op           0 B/op          0 allocs/op
BenchmarkStringLookupBytes-32           100000000           10.97 ns/op        0 B/op          0 allocs/op
BenchmarkStringSwitch-32                372506850            3.955 ns/op           0 B/op          0 allocs/op
BenchmarkStringSwitchValue-32           262102232            3.888 ns/op           0 B/op          0 allocs/op

BenchmarkUnmarshalText-32                           33958742            35.89 ns/op        0 B/op          0 allocs/op
BenchmarkUnmarshalTextLookup-32                     38068413            31.15 ns/op        0 B/op          0 allocs/op
BenchmarkUnmarshalTextLookupString-32               52166524            23.68 ns/op        0 B/op          0 allocs/op
BenchmarkUnmarshalTextLookupStringAndConvert-32     31179402            38.46 ns/op        0 B/op          0 allocs/op

nikolaydubina commented 7 months ago

hey, thanks!

I was able to repro 5x improvement in encoding.

however, I still don't get it how you speed up decoding with arrays. Can you share code/PR/Gist/repo how you do it?

so, far I don't see good way to switch to arrays based encoding/decoding without: 1) requiring users to specify values contigiously and starting from 0; and 2) adding reading numeric value from var/const declaration AST statement. So looks like this is no go.

Feel free to commander or add your PR like this, or leave comments in PR by 2024-04-09.

https://github.com/nikolaydubina/go-enum-encoding/pull/20

mishak87 commented 7 months ago

I was thinking about how I use enums. And all the time I use them to optimize two things: memory footprint and speed of comparisons.

In my code I needed only uint8 and fast encoding/decoding to database/json and text. So I am thinking about extending stringer into an enumer that adds that functionality and wraps in type Enum struct { v uint8 }.

Source code for stringer is interesting. Author decided arbitrary cutoff of more than 10 contiguous intervals when it uses map else it is comparing integer and using appropriate slice of string per interval. One string per interval optimization is used to cut down number of compiled strings to one

however, I still don't get it how you speed up decoding with arrays. Can you share code/PR/Gist/repo how you do it?


# pseudo code
var A, B, C = enum{}, enum{1}, enum{2}
var enumStrings = []string{"a", "b", "c"}
var enumAll = []enum{A, B, C}
// stringer uses single string and indexes; performance might be slightly slower due to double lookup
var enumStringerString = "ABC"
var enumStringerIndexes = []int{0, 1, 2, 3}

func (e enum) UnmarshalText(value []byte) error { // TODO optimize len(value) == 0 => set Undefined if enabled // TODO fast path len(value) > longest enum value => ErrBadValue // TODO optimize skip Undefined value and start from 1 // NOTE s := string(value); and for (...) { if allValues[i] == s ... } is slower than type casting on every iteration for (i := 0; i < len(allValues); i++) { // if enumStringerString[enumStringerIndexes[i], enumStringerIndexes[i+1]] == string(value) { if allValues[i] == string(value) { e = enumAll[i] return nil } } return ErrBadValue }



~~I have not tested how contiguous integers would perform against the map but there will be cutoff where hashing the key and getting it will be faster than N range checks and one for loop for that interval.~~
EDIT: There is no benefit of checking multiple ranges and would be better to always generate array of all values and array of all strings.

EDIT: I am not sure if it could be made significantly faster than `for` loop for small number of enum values. Maybe writing tree of if statements or switches checking specific bytes before comparing to a most likely value.

nikolaydubina commented 7 months ago

I could not reproduce this. in my benchmarks this solution (loop over array) leads to slightly faster encoding, but slower decoding.

name                    old time/op    new time/op    delta
MarshalText_Color-16      10.3ns ± 0%     7.5ns ± 0%   ~     (p=1.000 n=1+1)
UnmarshalText_Color-16    11.5ns ± 0%    13.8ns ± 0%   ~     (p=1.000 n=1+1)

name                    old alloc/op   new alloc/op   delta
MarshalText_Color-16       0.00B          0.00B        ~     (all equal)
UnmarshalText_Color-16     0.00B          0.00B        ~     (all equal)

name                    old allocs/op  new allocs/op  delta
MarshalText_Color-16        0.00           0.00        ~     (all equal)
UnmarshalText_Color-16      0.00           0.00        ~     (all equal)

from my previous experience dealing with encoding/decoding enums, for large enum sets (256 values) map is significantly better than loop. it literally becomes O(1) vs O(N).

nikolaydubina commented 7 months ago

let's keep it map for now. but of course feel free to open PR that beats benchmarks. ideally keep in mind that code should be minimal.

mishak87 commented 7 months ago

Just to clarify my point.

I could not reproduce this. in my benchmarks this solution (loop over array) leads to slightly faster encoding, but slower decoding.

The fastest by a 8-9ns margin for 5 valid values without other optimizations is BenchmarkUnmarshalTextLookupString-32. Using array []string not [][]byte and checking in for (x = 0; i < len(allStrings); i++) { if string(value) == allStrings[i] { return allEnums[i] } } .

Using for range compared to for (;;) is slow because it allocates a variable and copies value into it on every iteration. Internal map hash function can have few loops to compute hash so for small data it makes sense single array lookup on strings (constant type unlike []byte) is faster.

nikolaydubina / go-enum-encoding

Performance benchmarks and alternatives to map lookups #19