simdjson / simdjson-java

A Java version of simdjson, a high-performance JSON parser utilizing SIMD instructions
Apache License 2.0

The performance of DOM Parser and Schema-Based Parser. #52

Open ZhaiMo15 opened 1 month ago

ZhaiMo15 commented 1 month ago

I've been testing the performance of simdjson recently. The basic test is similar to the default benchmark and uses twitter.json, as shown below:

    @Benchmark
    public int recordSimdjson() {
        Set<String> defaultUsers = new HashSet<>();
        TwitterRecord twitter = simdJsonParser.parse(buffer, buffer.length, TwitterRecord.class);
        for (StatusRecord status : twitter.statuses()) {
            UserRecord user = status.user();
            if (user.default_profile()) {
                defaultUsers.add(user.screen_name());
            }
        }
        return defaultUsers.size();
    }

    @Benchmark
    public int JsonValueSimdjson() {
        JsonValue simdJsonValue = simdJsonParser.parse(buffer, buffer.length);
        Set<String> defaultUsers = new HashSet<>();
        Iterator<JsonValue> tweets = simdJsonValue.get("statuses").arrayIterator();
        while (tweets.hasNext()) {
            JsonValue tweet = tweets.next();
            JsonValue user = tweet.get("user");
            if (user.get("default_profile").asBoolean()) {
                defaultUsers.add(user.get("screen_name").asString());
            }
        }
        return defaultUsers.size();
    }

    @Benchmark
    public int recordJackson() throws IOException {
        Set<String> defaultUsers = new HashSet<>();
        TwitterRecord twitter = objectMapper.readValue(buffer, TwitterRecord.class);
        for (StatusRecord status : twitter.statuses()) {
            UserRecord user = status.user();
            if (user.default_profile()) {
                defaultUsers.add(user.screen_name());
            }
        }
        return defaultUsers.size();
    }

    record UserRecord(boolean default_profile, String screen_name) {
    }

    record StatusRecord(UserRecord user) {
    }

    record TwitterRecord(List<StatusRecord> statuses) {
    }

The difference is that I shrank the statuses array. The default size is 101; I tested sizes 101, 51, and 1. The results are below.

size 101:

image

size 51:

image

size 1:

image

What's more, I changed the depth of test, the default is 3 and I changed it to 2, as below:

    @Benchmark
    public int recordSimdjson() {
        Set<Object> defaultUsers = new HashSet<>();
        TwitterRecord twitter = simdJsonParser.parse(buffer, buffer.length, TwitterRecord.class);
        for (StatusRecord status : twitter.statuses()) {
            long id = status.id();
            String text = status.text();
            defaultUsers.add(id);
            defaultUsers.add(text);
        }
        return defaultUsers.size();
    }

    @Benchmark
    public int JsonValueSimdjson() {
        JsonValue simdJsonValue = simdJsonParser.parse(buffer, buffer.length);
        Set<Object> defaultUsers = new HashSet<>();
        Iterator<JsonValue> tweets = simdJsonValue.get("statuses").arrayIterator();
        while (tweets.hasNext()) {
            JsonValue tweet = tweets.next();
            JsonValue id = tweet.get("id");
            JsonValue text = tweet.get("text");
            defaultUsers.add(id.asLong());
            defaultUsers.add(text.asString());
        }
        return defaultUsers.size();
    }

    @Benchmark
    public int recordJackson() throws IOException {
        Set<Object> defaultUsers = new HashSet<>();
        TwitterRecord twitter = objectMapper.readValue(buffer, TwitterRecord.class);
        for (StatusRecord status : twitter.statuses()) {
            long id = status.id();
            String text = status.text();
            defaultUsers.add(id);
            defaultUsers.add(text);
        }
        return defaultUsers.size();
    }

    record StatusRecord(long id, String text) {
    }

    record TwitterRecord(List<StatusRecord> statuses) {
    }

The results are:

size 101:

image

size 51:

image

size 1:

image

Here are my questions:

  1. Is simdjson not always faster than Jackson? The shorter the JSON, the worse simdjson performs? If my JSON is short, should I avoid simdjson?
  2. DOM parser vs. schema-based parser: does the performance also depend on the size of the JSON? My first thought was that the schema-based parser would be faster.

piotrrzysko commented 1 month ago

Would you mind sharing the hardware and Java version you used to run these benchmarks? The output from lscpu and java -version should be sufficient. Also, it'd be great if you could share the benchmarks on a branch (or as a GitHub repo). This would help me run them myself and profile the code.

piotrrzysko commented 1 month ago

Also, what do you mean by:

What's more, I changed the depth of test, the default is 3 and I changed it to 2, as below: ?

Both attached snippets look the same.

ZhaiMo15 commented 4 weeks ago

Think of the JSON as a tree; by depth I mean the depth of a node in that tree. For example:

// depth 1
{
   "a": 1,
   "b": 2
}

// depth 2
{
   "a": {
      "b": 1,
      "c": 2
   }
}

The V2 benchmark reads "statuses.user.screen_name" (depth 3), while V3 reads "statuses.id" (depth 2).
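As a minimal sketch of this notion of depth, assuming a dotted path whose keys contain no dots themselves and that array levels (such as the statuses list) are not counted, the depth is just the number of path segments. The class and method names here are hypothetical and not part of simdjson-java:

```java
class PathDepthSketch {
    // Depth of a dotted field path, counted as the number of segments.
    // Assumption: keys never contain '.', and arrays do not add a level.
    static int depth(String dottedPath) {
        return dottedPath.split("\\.").length;
    }

    public static void main(String[] args) {
        System.out.println(depth("statuses.user.screen_name")); // 3
        System.out.println(depth("statuses.id"));               // 2
    }
}
```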

Both attached snippets look the same.

I'm sorry, I pasted the wrong snippets.

ZhaiMo15 commented 4 weeks ago

I also re-ran the benchmarks in a more stable environment, and the results are similar. My hardware:

image

And the java version:

image

The benchmarks I ran are in: https://github.com/ZhaiMo15/simdjson-java/tree/performanceTest

piotrrzysko commented 3 weeks ago

Thanks for the update. I ran your benchmarks on my desktop using two versions of Java (18 and 21). I got the following results:

JDK 21.0.1, OpenJDK 64-Bit Server VM, 21.0.1+12-LTS

Benchmark                                  (fileName)   Mode  Cnt       Score      Error  Units
TwitterBenchmarkV2.JsonValueSimdjson    /twitter.json  thrpt    5    1272.131 ±   77.097  ops/s
TwitterBenchmarkV2.recordJackson        /twitter.json  thrpt    5     577.435 ±   14.169  ops/s
TwitterBenchmarkV2.recordSimdjson       /twitter.json  thrpt    5    1963.116 ±   44.895  ops/s
TwitterBenchmarkV3.JsonValueSimdjson    /twitter.json  thrpt    5    1198.611 ±   63.058  ops/s
TwitterBenchmarkV3.recordJackson        /twitter.json  thrpt    5     748.944 ±    6.980  ops/s
TwitterBenchmarkV3.recordSimdjson       /twitter.json  thrpt    5    1976.876 ±  241.907  ops/s

TwitterBenchmarkV2.JsonValueSimdjson  /twitter50.json  thrpt    5    2435.258 ±  194.432  ops/s
TwitterBenchmarkV2.recordJackson      /twitter50.json  thrpt    5    1125.062 ±    2.477  ops/s
TwitterBenchmarkV2.recordSimdjson     /twitter50.json  thrpt    5    3787.566 ±   51.370  ops/s
TwitterBenchmarkV3.JsonValueSimdjson  /twitter50.json  thrpt    5    2429.744 ±  153.990  ops/s
TwitterBenchmarkV3.recordJackson      /twitter50.json  thrpt    5    1427.326 ±    6.044  ops/s
TwitterBenchmarkV3.recordSimdjson     /twitter50.json  thrpt    5    3883.897 ±   34.568  ops/s

TwitterBenchmarkV2.JsonValueSimdjson   /twitter1.json  thrpt    5  175359.845 ± 1522.490  ops/s
TwitterBenchmarkV2.recordJackson       /twitter1.json  thrpt    5   69225.339 ±  628.405  ops/s
TwitterBenchmarkV2.recordSimdjson      /twitter1.json  thrpt    5   83146.423 ±  654.256  ops/s
TwitterBenchmarkV3.JsonValueSimdjson   /twitter1.json  thrpt    5  181205.520 ±  269.705  ops/s
TwitterBenchmarkV3.recordJackson       /twitter1.json  thrpt    5   96834.366 ±  268.782  ops/s
TwitterBenchmarkV3.recordSimdjson      /twitter1.json  thrpt    5  102403.918 ±  625.203  ops/s

JDK 18.0.2.1, OpenJDK 64-Bit Server VM, 18.0.2.1+1

Benchmark                                  (fileName)   Mode  Cnt       Score      Error  Units
TwitterBenchmarkV2.JsonValueSimdjson    /twitter.json  thrpt    5    1152.353 ±  258.483  ops/s
TwitterBenchmarkV2.recordJackson        /twitter.json  thrpt    5     565.322 ±   17.139  ops/s
TwitterBenchmarkV2.recordSimdjson       /twitter.json  thrpt    5    1759.905 ±   41.637  ops/s
TwitterBenchmarkV3.JsonValueSimdjson    /twitter.json  thrpt    5    1122.889 ±  316.755  ops/s
TwitterBenchmarkV3.recordJackson        /twitter.json  thrpt    5     716.739 ±    6.083  ops/s
TwitterBenchmarkV3.recordSimdjson       /twitter.json  thrpt    5    1824.830 ±   19.503  ops/s

TwitterBenchmarkV2.JsonValueSimdjson  /twitter50.json  thrpt    5    2338.579 ±   58.298  ops/s
TwitterBenchmarkV2.recordJackson      /twitter50.json  thrpt    5    1094.865 ±    2.898  ops/s
TwitterBenchmarkV2.recordSimdjson     /twitter50.json  thrpt    5    3333.782 ±   55.180  ops/s
TwitterBenchmarkV3.JsonValueSimdjson  /twitter50.json  thrpt    5    2243.374 ±    9.085  ops/s
TwitterBenchmarkV3.recordJackson      /twitter50.json  thrpt    5    1419.183 ±   17.172  ops/s
TwitterBenchmarkV3.recordSimdjson     /twitter50.json  thrpt    5    3475.266 ±  132.370  ops/s

TwitterBenchmarkV2.JsonValueSimdjson   /twitter1.json  thrpt    5  164348.941 ± 9737.617  ops/s
TwitterBenchmarkV2.recordJackson       /twitter1.json  thrpt    5   68143.603 ±  257.766  ops/s
TwitterBenchmarkV2.recordSimdjson      /twitter1.json  thrpt    5   81290.062 ± 1121.079  ops/s
TwitterBenchmarkV3.JsonValueSimdjson   /twitter1.json  thrpt    5  170856.785 ± 1570.948  ops/s
TwitterBenchmarkV3.recordJackson       /twitter1.json  thrpt    5   94823.108 ±  256.773  ops/s
TwitterBenchmarkV3.recordSimdjson      /twitter1.json  thrpt    5  105020.635 ± 1673.175  ops/s

Java 21 usually performs better, which is not surprising. However, the problem you've described is still valid. Let me go through your questions to make sure we are on the same page:

Is simdjson not always faster than Jackson? The shorter the JSON, the worse simdjson performs? If my JSON is short, should I avoid simdjson?

In this question, you are referring to the poor performance of the schema-based parser in comparison to Jackson for shorter JSONs. Overall, in all the above cases, some version of simdjson beats Jackson.

DOM parser vs. schema-based parser: does the performance also depend on the size of the JSON? My first thought was that the schema-based parser would be faster.

This question is strictly related to the previous one, as the schema-based parser again performs unexpectedly poorly.

If my interpretation of the benchmark results and your concerns is correct, then we can narrow down the problem to the performance of the schema-based parser. I've profiled it while running the TwitterBenchmarkV2 for twitter1.json. This is what I got:

image

The flamegraph clearly shows that Java reflection is the culprit of the poor performance. Simdjson clears its internal cache of resolved classes every time the parse method is called. After commenting out the line in which the cache is cleared, I got the following results:

Benchmark                                        (fileName)   Mode  Cnt       Score       Error  Units
TwitterBenchmarkV2.JsonValueSimdjson         /twitter1.json  thrpt    5  175317.529 ±  2304.856  ops/s
TwitterBenchmarkV2.JsonValueSimdjson:·async  /twitter1.json  thrpt              NaN                ---
TwitterBenchmarkV2.recordJackson             /twitter1.json  thrpt    5   67562.644 ±  1220.035  ops/s
TwitterBenchmarkV2.recordJackson:·async      /twitter1.json  thrpt              NaN                ---
TwitterBenchmarkV2.recordSimdjson            /twitter1.json  thrpt    5  265694.392 ± 11438.461  ops/s
TwitterBenchmarkV2.recordSimdjson:·async     /twitter1.json  thrpt              NaN                ---

I suggest that you comment out this line and rerun the benchmarks in your environment. This is not the ultimate solution, of course. I just want to make sure that we are on the same page and that you don't see any other unexpected disparities between the parsers in terms of performance.
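To put the two recordSimdjson scores for twitter1.json side by side, the throughput numbers can be converted to time per operation. This is only a rough sketch: it assumes the JDK 21 table and the table after commenting out the reset are comparable (same machine and JDK, which is not stated explicitly), and the class and method names are made up for illustration. Under that assumption, roughly 69% of the per-parse time disappears once the cache is no longer cleared:

```java
class ReflectionShareSketch {
    // Convert JMH throughput (ops/s) into microseconds per operation.
    static double microsPerOp(double opsPerSecond) {
        return 1_000_000.0 / opsPerSecond;
    }

    // Fraction of the original per-operation time that was saved.
    static double shareSaved(double opsBefore, double opsAfter) {
        double before = microsPerOp(opsBefore);
        double after = microsPerOp(opsAfter);
        return (before - after) / before;
    }

    public static void main(String[] args) {
        double withReset = 83146.423;     // recordSimdjson, twitter1.json, cache cleared on every parse
        double withoutReset = 265694.392; // recordSimdjson, twitter1.json, reset commented out
        System.out.printf("%.2f us/op -> %.2f us/op (%.0f%% of per-op time saved)%n",
                microsPerOp(withReset), microsPerOp(withoutReset),
                100.0 * shareSaved(withReset, withoutReset));
    }
}
```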

ZhaiMo15 commented 3 weeks ago

I suggest that you comment out this line and rerun the benchmarks in your environment.

We are now on the same page! I reran the benchmarks with classResolver.reset() commented out; the results are similar to yours, and so is the flamegraph.

This is not the ultimate solution, of course.

BTW, would it be a problem to comment out classResolver.reset() for good? Maybe clear the cache in a different place?

piotrrzysko commented 3 weeks ago

If we comment it out without changing anything else, there can be a problem because the cache will grow without bound. In some cases, this is acceptable because the cache can contain at most as many entries as there are classes in the application.

I'll need to think about it. Perhaps the cache needs to have a more sophisticated eviction policy (LRU?).

ZhaiMo15 commented 3 weeks ago

Additionally, consider a real situation instead of a benchmark. I believe the classResolver is different when parsing different JSON. So comment out the classResolver.reset() can increase the score of benchmark, but if each JSON would be parsed once, simdjson is still not good enough when JSON is short?

ZhaiMo15 commented 3 weeks ago

Perhaps the cache needs to have a more sophisticated eviction policy (LRU?).

Perhaps the cache needs a fixed (configurable) size, regardless of the eviction policy?
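A minimal sketch of a cache with a configurable maximum size and LRU eviction, built on the standard java.util.LinkedHashMap in access order. This is not simdjson-java's actual cache; the class name BoundedLruCache and the String values standing in for resolved class metadata are assumptions for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache: keeps at most maxEntries entries and evicts
// the least recently accessed one when the limit is exceeded.
class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order (LRU first)
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedLruCache<Class<?>, String> cache = new BoundedLruCache<>(2);
        cache.put(String.class, "resolved String");
        cache.put(Integer.class, "resolved Integer");
        cache.get(String.class);                // marks String as recently used
        cache.put(Long.class, "resolved Long"); // evicts Integer, the least recently used
        System.out.println(cache.keySet());
    }
}
```

LinkedHashMap is not thread-safe, so a parser shared between threads would need external synchronization or a different structure.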

piotrrzysko commented 3 weeks ago

So comment out the classResolver.reset() can increase the score of benchmark, but if each JSON would be parsed once, simdjson is still not good enough when JSON is short?

This is the penalty for using reflection, so you would need to pay it regardless of which parser you use if the parser relies on it. I have an idea on how to replace reflection with an alternative approach, but it requires more research.

Also, I wonder how realistic this problem is. How many different schemas can you have in your system?

ZhaiMo15 commented 3 weeks ago

I have an idea on how to replace reflection with an alternative approach, but it requires more research.

Great! Looking forward to it.

I wonder how realistic this problem is.

TBH, I've no idea. IDK the JSON size distribution in real world. In my case, I do have some small JSON.

piotrrzysko commented 3 weeks ago

TBH, I've no idea. IDK the JSON size distribution in real world. In my case, I do have some small JSON.

This is understandable, but I wasn't asking about the size of the JSON. Your question was:

if each JSON would be parsed once, simdjson is still not good enough when JSON is short?

So, I assumed that you have a situation where there are many different types of JSON schemas, and every time you parse a JSON, you use a different schema. In such a scenario, the cache is useless because the parser cannot reuse the classes that are already in the cache. However, I cannot think of a scenario where you have, say, a million different schemas and use a different one each time. Is this your case?

ZhaiMo15 commented 3 weeks ago

I get your point. The number of different schemas should not be large. However,

<T> T walkDocument(byte[] padded, int len, Class<T> expectedType) {
        jsonIterator.init(padded, len);
        classResolver.reset();

I think that in the current code, the cache is cleared even if the expectedType is always the same?

if each JSON would be parsed once, simdjson is still not good enough when JSON is short?

By JSON I mean JSON strings (i.e., byte[] buffers). Suppose I have many JSON strings (say, 100 different ones) but few schemas (say, only 1). When I parse the 100 JSON strings in sequence, the cache would still be cleared 100 times even though the schema is the same? That would incur the performance penalty described above.
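The scenario above (100 documents, 1 schema) can be sketched with a toy resolver that counts how often it has to fall back to reflection. CountingResolver is a hypothetical stand-in, not simdjson-java's ClassResolver; only the pattern of calling reset() before each parse mirrors the walkDocument snippet:

```java
import java.util.HashMap;
import java.util.Map;

class ResetPerParseSketch {
    record StatusRecord(long id, String text) {}

    // Hypothetical cache of per-class metadata obtained via reflection.
    static final class CountingResolver {
        private final Map<Class<?>, Object> cache = new HashMap<>();
        int reflectiveLookups = 0;

        Object resolve(Class<?> type) {
            return cache.computeIfAbsent(type, t -> {
                reflectiveLookups++;             // cache miss: pay for reflection
                return t.getRecordComponents();  // real reflective call on the record class
            });
        }

        void reset() {
            cache.clear();
        }
    }

    // Simulates parsing `documents` inputs that all map to the same schema.
    static int lookupsFor(int documents, boolean resetEachParse) {
        CountingResolver resolver = new CountingResolver();
        for (int i = 0; i < documents; i++) {
            if (resetEachParse) {
                resolver.reset();
            }
            resolver.resolve(StatusRecord.class);
        }
        return resolver.reflectiveLookups;
    }

    public static void main(String[] args) {
        System.out.println(lookupsFor(100, true));  // 100: every parse repeats the reflection
        System.out.println(lookupsFor(100, false)); // 1: resolved once, then reused
    }
}
```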

piotrrzysko commented 3 weeks ago

Yes, exactly. This is why I mentioned that replacing this simple eviction policy with a more sophisticated one could be a good improvement. The new policy would keep already resolved schemas between parse method calls.

ZhaiMo15 commented 3 weeks ago

Just out of interest,

The flamegraph clearly shows that Java reflection is the culprit of the poor performance.

Jackson also uses reflection, right? Why doesn't it show the same poor performance?

piotrrzysko commented 3 weeks ago

But, is there any benchmark in which Jackson beats simdjson? I thought that for smaller inputs they are on a par.

ZhaiMo15 commented 3 weeks ago

Before commenting out, yes (https://github.com/ZhaiMo15/simdjson-java/blob/performanceTest/src/jmh/java/org/performance/TwitterBenchmarkV4.java); otherwise, no. So Jackson also uses reflection and a cache, but its cache is better than simdjson's current one (as we discussed above)?

I thought that for smaller inputs they are on a par.

Do you mean that for smaller inputs, parsing accounts for a small share of the time (compared to reflection), so even if simdjson speeds up parsing, the overall performance changes only slightly?

The flamegraph of Jackson:

image

IDK which part is the reflection.