minio / simdjson-go

Golang port of simdjson: parsing gigabytes of JSON per second
Apache License 2.0

Benchmarks are misleading #38

Closed mhr3 closed 3 years ago

mhr3 commented 3 years ago

The benchmarks in benchmarks_test.go aren't really comparing apples to apples: the simdjson benchmark only parses the JSON, while the other benchmarks parse it and also build an interface{}-typed object to represent the data.
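To make the distinction concrete, here is a minimal sketch that uses only the standard library. It is not the repository's benchmark: `sample`, `allocsValidate`, and `allocsBuild` are hypothetical names, the document is made up, and `json.Valid` merely stands in for a pass that scans the input without producing Go values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// sample is a small made-up document, not one of the repository's test files.
var sample = []byte(`{"builds":[{"name":"a","ok":true,"ms":12.5},{"name":"b","ok":false,"ms":7}]}`)

// allocsValidate reports the average allocations per run when the input
// is only scanned for validity and no Go values are produced.
func allocsValidate(data []byte) float64 {
	return testing.AllocsPerRun(100, func() {
		_ = json.Valid(data)
	})
}

// allocsBuild reports the average allocations per run when a full
// interface{} tree (maps, slices, strings) is built from the input.
func allocsBuild(data []byte) float64 {
	return testing.AllocsPerRun(100, func() {
		var v interface{}
		_ = json.Unmarshal(data, &v)
	})
}

func main() {
	fmt.Printf("scan only:         %.0f allocs/op\n", allocsValidate(sample))
	fmt.Printf("build interface{}: %.0f allocs/op\n", allocsBuild(sample))
}
```

The exact counts depend on the Go version and the input, but the second figure should be clearly larger, which is the cost the parse-only benchmark does not pay.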

harshavardhana commented 3 years ago

You need to first understand the purpose of this library: it is not a drop-in replacement for JSON parsing.

It is meant to introduce a newer technique for JSON parsing, aimed at specific streaming use cases.

The parsed JSON is represented as a generic struct that allows a streaming approach to handling large JSON blobs.

This approach is then SIMD-optimized, so it may not work for all use cases.

The benchmarks we kept show the exact difference in parsing. Providing an interface{} target means json.Unmarshal does not do any reflection, so in essence both are doing the same thing.

klauspost commented 3 years ago

@mhr3 As stated in the README:

> Though simdjson provides different output than traditional unmarshal functions this can give an overview of the expected performance for reading specific data in JSON.

> Below is a performance comparison to Golang's standard package encoding/json based on the same set of JSON test files, unmarshal to interface{}.

mhr3 commented 3 years ago

> You need to first understand the purpose of this library: it is not a drop-in replacement for JSON parsing.

That's exactly my point: the current benchmarks make it seem like simdjson is magic that does everything 10x faster with literally 20 memory allocations (even though any string in Go will cause an alloc).

> Below is a performance comparison to Golang's standard package encoding/json based on the same set of JSON test files, unmarshal to interface{}.

Yet the unmarshal to interface{} is pretty expensive, and the simdjson benchmark isn't doing it. The benchmarks would be completely accurate if the simdjson code also constructed the interface{}, which is literally two lines of code:

iter := parsedJson.Iter()
iter.Interface()

klauspost commented 3 years ago

@mhr3 This is what is needed to unmarshal arbitrary JSON and inspect values. Creating an interface representation is pointless if you need to look up specific values, say claims in a JWT (token).
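As a rough illustration of looking up one value without materializing the whole document, here is a sketch using the standard library's token stream. It does not use simdjson-go's API; `lookupString` is a hypothetical helper and the payload is a made-up JWT-style claims object.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// lookupString scans the top-level object in data and returns the string
// stored under key, without building a map for the whole document.
// (Hypothetical helper for illustration; not part of simdjson-go.)
func lookupString(data []byte, key string) (string, bool, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	tok, err := dec.Token()
	if err != nil {
		return "", false, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return "", false, errors.New("top-level value is not an object")
	}
	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return "", false, err
		}
		if name, _ := keyTok.(string); name == key {
			var s string
			if err := dec.Decode(&s); err != nil {
				return "", false, err
			}
			return s, true, nil
		}
		// Not the key we want: consume the value without interpreting it.
		var skipped json.RawMessage
		if err := dec.Decode(&skipped); err != nil {
			return "", false, err
		}
	}
	return "", false, nil
}

func main() {
	payload := []byte(`{"iss":"example","aud":["a","b"],"sub":"user-42"}`)
	sub, found, err := lookupString(payload, "sub")
	fmt.Println(sub, found, err)
}
```

Only the requested value is turned into a Go string; every other value is skipped as raw bytes, which is the kind of access pattern a benchmark that builds an interface{} tree does not measure.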

It may not cover your specific needs, but you are welcome to do your own benchmarks.

ernado commented 2 years ago

I've implemented a more realistic benchmark for jsoniter, simulating two cases: validation and recursive parsing of arbitrary JSON:

BenchmarkJsoniterApache_builds/Recursive-32  2908   376577 ns/op   337.98 MB/s   88739 B/op  5297 allocs/op
BenchmarkJsoniterApache_builds/Validate-32   6484   181069 ns/op   702.91 MB/s   15680 B/op  2658 allocs/op
BenchmarkApache_builds/copy-32               8259   147774 ns/op   861.28 MB/s   1009 B/op   23 allocs/op
BenchmarkApache_builds/nocopy-32             9049   141519 ns/op   899.35 MB/s   985 B/op    23 allocs/op

So jsoniter should land somewhere between Validate and Recursive when looking up specific values. Most allocations are due to using String() instead of the allocation-friendly slice version:

// iterRecursive simulates recursive read of object by jsoniter.
func iterRecursive(i *jsoniter.Iterator) bool {
    switch i.WhatIsNext() {
    case jsoniter.ObjectValue:
        return i.ReadMapCB(func(i *jsoniter.Iterator, s string) bool {
            _ = s
            return iterRecursive(i)
        })
    case jsoniter.ArrayValue:
        return i.ReadArrayCB(func(i *jsoniter.Iterator) bool {
            return iterRecursive(i)
        })
    case jsoniter.NumberValue:
        _ = i.ReadNumber()
        return true
    case jsoniter.StringValue:
        // ReadStringAsSlice cannot be used due to required escaping.
        _ = i.ReadString()
        return true
    case jsoniter.NilValue:
        return i.ReadNil()
    default:
        i.Skip()
        return true
    }
}

So jsoniter is pretty fast and could probably be even faster with a proper implementation of ReadStringAsSlice. The 10x/30x speed improvement claims are not very accurate.
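The ratios implied by the numbers above can be checked with a few lines of arithmetic. This sketch assumes the BenchmarkApache_builds/copy line is the simdjson-go run; the constant names and the `speedup` helper are made up for illustration, and the values are copied from the ns/op column.

```go
package main

import "fmt"

// Timings in ns/op, copied from the benchmark output quoted above.
const (
	jsoniterRecursive = 376577.0
	jsoniterValidate  = 181069.0
	simdjsonCopy      = 147774.0 // assumed to be the simdjson-go "copy" run
)

// speedup returns how many times faster the candidate is than the baseline.
func speedup(baselineNs, candidateNs float64) float64 {
	return baselineNs / candidateNs
}

func main() {
	fmt.Printf("vs jsoniter Recursive: %.2fx\n", speedup(jsoniterRecursive, simdjsonCopy))
	fmt.Printf("vs jsoniter Validate:  %.2fx\n", speedup(jsoniterValidate, simdjsonCopy))
}
```

On this one file the gap works out to roughly 2.5x against the recursive walk and roughly 1.2x against validation, well short of 10x.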

klauspost commented 2 years ago

@ernado You are very welcome to send in a PR with the comparable benchmarks so we can adjust the docs.