ugorji / go

idiomatic codec and rpc lib for msgpack, cbor, json, etc. msgpack.org[Go]
MIT License

Slower than encoding/json #107

Closed: atombender closed this issue 9 years ago

atombender commented 9 years ago

Here's my test:

// b is the JSON input, read earlier (e.g. via ioutil.ReadFile).
n := 5

log.Print("encoding/json")
for i := 0; i < n; i++ {
  t := time.Now()
  var v interface{}
  for j := 0; j < 20; j++ {
    if err := json.Unmarshal(b, &v); err != nil {
      panic(err)
    }
  }
  log.Print(time.Since(t))
}

log.Print("go-codec")
for i := 0; i < n; i++ {
  t := time.Now()
  for j := 0; j < 20; j++ {
    dec := codec.NewDecoderBytes(b, new(codec.JsonHandle))
    var v interface{}
    if err := dec.Decode(&v); err != nil {
      panic(err)
    }
  }
  log.Print(time.Since(t))
}

For a large, minified file with long keys (4.7MB), go-codec is ~31% slower than encoding/json:

encoding/json
1.617601942s
1.576523967s
1.610622356s
1.608594081s
1.647141938s
go-codec
2.129585437s
2.166288614s
2.1520013s
2.061147463s
2.102718168s

For a file containing the same values, but with every key shortened to just 2 characters (970KB), it's 80% slower:

encoding/json
474.933239ms
451.410297ms
452.822768ms
462.508272ms
456.032676ms
go-codec
852.508337ms
849.04073ms
830.897711ms
852.40816ms
826.848954ms

This is decoding from byte slices into interface{} values. Is there anything I can tweak to make go-codec faster? Is this by design, somehow?

Go 1.5.1 on OS X.

ugorji commented 9 years ago

Please attach the JSON file and the full source code for running it, so I can run it with a "go run file.go" style command and investigate.

Thanks.

atombender commented 9 years ago

Here.

ugorji commented 9 years ago

encoding/json will decode a JSON object into a map[string]interface{}.

You need to tell the JsonHandle that you want that too. This will also give better performance, as decoding into a typed value is much faster than decoding into an interface{} value.
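
A minimal sketch of that configuration (this needs the reflect import; b is the JSON input from the snippet above):

var h codec.JsonHandle
// Decode schema-less JSON objects into map[string]interface{},
// matching encoding/json's default.
h.MapType = reflect.TypeOf(map[string]interface{}(nil))

var v interface{}
if err := codec.NewDecoderBytes(b, &h).Decode(&v); err != nil {
  panic(err)
}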

Also, encoding/json does not retrieve the previous value that a key is mapped to and decode into it. By default, codec does. Consequently, if you have a structure with a lot of state and the JSON file only has 2 fields, codec will "update" the value previously in the map rather than replace it.
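
To illustrate the difference, a small sketch of the behaviour described (semantics as discussed here; they may differ in later versions):

prev := map[string]interface{}{
  "a": map[string]interface{}{"x": 1, "y": 2},
}
dec := codec.NewDecoderBytes([]byte(`{"a":{"x":9}}`), new(codec.JsonHandle))
if err := dec.Decode(&prev); err != nil {
  panic(err)
}
// With codec's default, the existing value under "a" is decoded into,
// so prev["a"] ends up as map[x:9 y:2]; encoding/json would build a
// fresh value for "a", giving map[x:9].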

Understandably, folks may want it to just decode into a "blank" value like encoding/json does. I will introduce a flag to handle that.

In addition, codec's JsonHandle was not fully re-using a slice for all string and number decoding. It re-used a fixed-size array limited to 64 bytes, so whenever a string exceeded 64 bytes we did a fresh allocation. I will fix this to share a single slice, which may grow as needed but is bound to a Decoder instance.
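
The fix amounts to the usual growable scratch-buffer pattern; a rough sketch of the idea (my illustration, not the library's actual internals):

type scratch struct {
  buf []byte // grown on demand, then reused for every string/number read
}

func (s *scratch) next(n int) []byte {
  if cap(s.buf) < n {
    s.buf = make([]byte, n) // grow once; the larger slice is retained
  }
  return s.buf[:n]
}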

Taken together, these changes reduce allocations and improve performance, giving you close to parity with encoding/json in this use-case.

Note that this is a simple use-case, and it kinda sidesteps all the performance work that go-codec does. go-codec shines when many types come into play, e.g. encoding/decoding a struct, or a collection containing pointers to structs, and when many different collections are included in the type. For this sample, you are basically doing blind "naked" decoding into interface{}, so everything is dynamic and there's not much to optimize. For pure "naked" interface conversions containing only built-in types, performance may skew towards encoding/json for smaller files. However, it will skew towards go-codec as the file gets bigger, and will skew tremendously when using specific types.
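
For comparison, the typed path that last paragraph refers to looks like this (Record is a hypothetical type for illustration; codec reads codec:"..." struct tags):

type Record struct {
  Name  string `codec:"name"`
  Count int    `codec:"count"`
}

var recs []Record
if err := codec.NewDecoderBytes(b, new(codec.JsonHandle)).Decode(&recs); err != nil {
  panic(err)
}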

atombender commented 9 years ago

Thanks. Your patch brings codec a bit closer to encoding/json.

I'm not seeing any performance difference between using interface{} and map[string]interface{}. I am also seeing about 10% worse performance when reusing the same out variable across all parse invocations.

Anyway, my actual use case would only benefit from reusing the data if codec could intersect (as opposed to union) the values, e.g. {"a":1, "b":2} + {"a":3} should result in {"a":3}.

An aside: it looks to me (I haven't studied the code closely) like neither encoding/json nor codec interns keys or values. If you have a file containing an array of one million {"somekey": "blah"} objects, then "somekey" and "blah" will each be allocated one million times. Even when streaming, this might be more costly than simply reusing strings, especially long ones. Any thoughts?

ugorji commented 9 years ago

I just uploaded the fix.

On my machine, I ran both my own benchmarks and yours.

https://gist.github.com/ugorji/0972bb21444609fec896 This is your benchmark. I put it in a directory called github_107, put test1.json and test2.json in there, and ran the command:

    GOGC=off GOMAXPROCS=1 go test -bench '.'  -benchmem -benchtime=8s

The results are below:

Benchmark__GoCodec1      100      87976099 ns/op    15500403 B/op     310415 allocs/op
Benchmark__StdJson1      100      92563416 ns/op    15689660 B/op     275697 allocs/op
Benchmark__GoCodec2      300      32957248 ns/op     6275992 B/op     155052 allocs/op
Benchmark__StdJson2      500      25489879 ns/op     5873664 B/op     130051 allocs/op

The performance difference here can be attributed to the fact that go-codec supports multiple formats and uses interfaces extensively to do so. Interfaces mean that some opportunities for inlining, etc. are lost, and calls may go through an extra indirection.

When I ran my full extensive benchmarks, I got the results below:

./bench.sh -ig -x '_Json'
Benchmark__Json_______Encode       20000         41135 ns/op        4032 B/op         35 allocs/op
Benchmark__Std_Json___Encode       20000         48981 ns/op       13848 B/op         97 allocs/op
Benchmark__Json_______Decode       10000         92372 ns/op       17048 B/op        410 allocs/op
Benchmark__Std_Json___Decode        5000        160112 ns/op       16120 B/op        485 allocs/op

You can see the tremendous performance improvement when decoding into typed structures.

Couple this with codecgen (code generation support), extensions, multiple formats, etc.
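
For reference, codecgen generates static encode/decode methods for the types in a file; a typical invocation might look like this (an assumption on my part; check codecgen -h for the exact flags):

    codecgen -o values_codecgen.go values.go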

atombender commented 9 years ago

Yes, codec seems very nice for typed structures. In my use case, the structure is only known at runtime, as the format is defined by a runtime-editable schema. Would it be possible to have codec generate a parser at runtime from a programmatically built schema? I could have my app invoke the Go compiler to build a shared library, but I'd rather keep it a bit simpler than that.

ugorji commented 9 years ago

Got you.

Since we don't "generate" a parser per se, codec cannot do this without major architectural changes.

BTW, the interning idea is interesting. It will have to be yet another option, as interning will generally slow down parsing (due to the map lookups required on string keys), but it is possible.
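
A minimal sketch of what such an option does (my illustration, not codec's actual code): cache each distinct key and return the cached string on later hits.

type interner map[string]string

func (m interner) intern(b []byte) string {
  if s, ok := m[string(b)]; ok { // the map lookup cost mentioned above
    return s
  }
  s := string(b) // one allocation per distinct key
  m[s] = s
  return s
}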

ugorji commented 9 years ago

I just prototyped interning, and tested it using your code.

The performance and allocation got much worse, as you have 80K and 30K keys respectively.

However, I could see a use-case which has a set number of keys, and it would get performance improvements due to the reduced amount of allocation.

The cost is an extra "if map is not nil" check.

atombender commented 9 years ago

The data I sent you is actually scrambled randomly, since I couldn't send you the (sensitive) original data. I can send you a file containing consistent keys to try out on.

ugorji commented 9 years ago

@atombender

4373325df6aa9e75b5d0ee6ef96115cf5c2759d1 contains the interning support.

atombender commented 9 years ago

Very cool. Interning doesn't seem to add any overhead here on my files, actually.

ugorji commented 9 years ago

GOMAXPROCS=1 go test -bench '.*dec1' -benchmem -benchtime=8s

With InternString=false:
Benchmark__GoCodec1       30     303965811 ns/op    15501412 B/op     310420 allocs/op

With InternString=true:
Benchmark__GoCodec1       20     429471588 ns/op    25833486 B/op     314382 allocs/op

Number of entries in intern map: 87892

The performance drop (running against test1.json) is significant: from 303ms to 429ms. It is caused strictly by the map lookup and insertion.

atombender commented 9 years ago

Try these files. I have made the keys consistently randomized instead of completely scrambled.

ugorji commented 9 years ago

You sent me the same files ;)

atombender commented 9 years ago

Look more closely. :) It’s not quite the same contents.

ugorji commented 9 years ago

I'm tripping ;) I unpacked the old file after downloading.

New results. With interning, there's a clear reduction in allocations, but a slight increase in CPU time. Users who like that tradeoff can set the flag, as the behaviour will depend on the number of distinct strings, etc.

InternString=true:
Benchmark__GoCodec1       30     319176829 ns/op    12747389 B/op     221582 allocs/op

InternString=false:
Benchmark__GoCodec1       30     300276244 ns/op    15489700 B/op     310379 allocs/op

Nice call. I like the feature.

atombender commented 9 years ago

Superb!