simdjson / simdjson-java

A Java version of simdjson, a high-performance JSON parser utilizing SIMD instructions
Apache License 2.0

Question about SimdJsonParser and JsonValue. #36

Open ZhaiMo15 opened 9 months ago

ZhaiMo15 commented 9 months ago

I'm running the code below:

private final SimdJsonParser simdJsonParser = new SimdJsonParser();
String str1 = "{\"a\": \"1\", \"b\": \"11\", \"c\": \"111\"}";
byte[] buffer1 = str1.getBytes();
JsonValue simdJsonValue1 = simdJsonParser.parse(buffer1, buffer1.length);
System.out.println("a = " + simdJsonValue1.get("a").toString() + 
                   ", b = " + simdJsonValue1.get("b").toString() + 
                   ", c = " + simdJsonValue1.get("c").toString());

String str2 = "{\"a\": \"2\", \"b\": \"22\", \"c\": \"222\"}";
byte[] buffer2 = str2.getBytes();
JsonValue simdJsonValue2 = simdJsonParser.parse(buffer2, buffer2.length);
System.out.println("a = " + simdJsonValue2.get("a").toString() + 
                   ", b = " + simdJsonValue2.get("b").toString() + 
                   ", c = " + simdJsonValue2.get("c").toString());

System.out.println("a = " + simdJsonValue1.get("a").toString() + 
                   ", b = " + simdJsonValue1.get("b").toString() + 
                   ", c = " + simdJsonValue1.get("c").toString());

And the output is:

a = 1, b = 11, c = 111
a = 2, b = 22, c = 222
a = 2, b = 22, c = 222

It looks like all JsonValue instances share the same buffer if they are parsed by the same parser. What can I do to keep an independent JsonValue for each buffer (JSON string)?

Plus, I don't think creating a new SimdJsonParser for each JSON string is a good idea, because it costs performance and memory.

piotrrzysko commented 9 months ago

> It looks like all JsonValue instances share the same buffer if they are parsed by the same parser. What can I do to keep an independent JsonValue for each buffer (JSON string)?

This is a limitation/property of simdjson. Currently, the only option is to extract the necessary data from JsonValue and store it in some external data structure. Perhaps a solution for this would be what you described in #35. By the way, what is your use case? Why do you need to keep JsonValues between runs of parse?
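
A minimal sketch of the "extract and store externally" approach follows. It does not use the simdjson-java API: `ReusingParser` and its `View` are hypothetical stand-ins that only mimic a parser whose returned values read through to storage that the next parse call overwrites. The part that matters is `snapshot`, which copies the needed fields into a plain `Map` before the parser is used again.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, NOT the simdjson-java API: every View reads through
// to storage that the parser overwrites on each parse() call.
final class ReusingParser {
    private final Map<String, String> storage = new HashMap<>();

    // A value handle that owns no data of its own.
    final class View {
        String get(String key) {
            return storage.get(key);
        }
    }

    // Takes already-split fields instead of JSON text to keep the sketch short.
    View parse(Map<String, String> fields) {
        storage.clear();
        storage.putAll(fields);
        return new View();
    }
}

final class SnapshotDemo {
    // Copies the requested fields out of the view into an independent, read-only map.
    static Map<String, String> snapshot(ReusingParser.View view, String... keys) {
        Map<String, String> copy = new HashMap<>();
        for (String key : keys) {
            copy.put(key, view.get(key));
        }
        return Collections.unmodifiableMap(copy);
    }

    public static void main(String[] args) {
        ReusingParser parser = new ReusingParser();

        ReusingParser.View first = parser.parse(Map.of("a", "1", "b", "11"));
        Map<String, String> saved = snapshot(first, "a", "b");

        // The second parse overwrites the storage that 'first' points at.
        parser.parse(Map.of("a", "2", "b", "22"));

        System.out.println("view:     a = " + first.get("a"));
        System.out.println("snapshot: a = " + saved.get("a"));
    }
}
```

After the second parse, the view reflects the new input while the snapshot keeps the values from the first one. With the real library, the equivalent step would be reading each needed field from the JsonValue and copying it into your own objects before calling parse again.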

> Plus, I don't think creating a new SimdJsonParser for each JSON string is a good idea, because it costs performance and memory.

Definitely not a good idea. An instance of SimdJsonParser is meant to be reused within a single thread.
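
One common way to get "one instance per thread, reused across calls" in Java is a `ThreadLocal`. The sketch below uses a hypothetical `StubParser` class in place of `SimdJsonParser` so that it runs without the library; using the real parser type in its place is an assumption based only on the statement above.

```java
// Hypothetical placeholder for an expensive, non-thread-safe parser.
final class StubParser {
}

final class PerThreadParser {
    // Each thread lazily creates its own instance on first use, then keeps reusing it.
    private static final ThreadLocal<StubParser> PARSER =
            ThreadLocal.withInitial(StubParser::new);

    static StubParser get() {
        return PARSER.get();
    }

    public static void main(String[] args) throws InterruptedException {
        StubParser a = get();
        StubParser b = get();
        System.out.println("same thread reuses instance: " + (a == b));

        // A second thread asking for a parser gets a separate instance.
        StubParser[] other = new StubParser[1];
        Thread worker = new Thread(() -> other[0] = get());
        worker.start();
        worker.join();
        System.out.println("other thread gets its own: " + (a != other[0]));
    }
}
```

This avoids both allocating a parser per document and sharing one parser between threads.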

ZhaiMo15 commented 9 months ago

> Why do you need to keep JsonValues between runs of parse?

For example, I have a lot of JSON documents to parse. They are passed to me by others, so I cannot know their content until I receive them. I want to speed up the whole parsing process, so I created a HashMap. With Jackson, the key is the JSON string itself and the value is the parsed Object. When the same JSON is passed to me a second time, I don't need to parse it again; I just look it up in the HashMap and use get to read what I want.

With simdjson, I want to do the same thing. Unfortunately, if I use JsonValue as the HashMap value, then because every JSON document shares a single parser, I get wrong data from the HashMap the second time I receive the same JSON.

I don't know if I have described my case clearly, so here's an example. Say I receive three JSON documents in order: json_a, json_b, and json_a, and I use the JSON string as the key and the JsonValue as the value of the HashMap. Parsing the first json_a and json_b works fine, but when I look up the value for the second json_a, the JsonValue I get back holds the data of json_b.
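
The caching scheme described here can still work if the map stores data copied out of the parse result instead of the JsonValue itself. A minimal sketch, assuming the caller supplies a function that parses one JSON string and returns an independent snapshot; `SnapshotCache` is a hypothetical name, and the stand-in extractor in `main` does no real JSON parsing:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: caches one independent snapshot per JSON string.
// Not thread-safe, matching the single-threaded use of one parser.
final class SnapshotCache<T> {
    private final Map<String, T> cache = new HashMap<>();
    private final Function<String, T> parseAndExtract;

    // parseAndExtract must return data that does not alias parser-owned buffers.
    SnapshotCache(Function<String, T> parseAndExtract) {
        this.parseAndExtract = parseAndExtract;
    }

    // Parses on the first request for a given JSON string, then serves the stored copy.
    T get(String json) {
        return cache.computeIfAbsent(json, parseAndExtract);
    }

    public static void main(String[] args) {
        int[] parseCalls = {0};
        // Stand-in for "parse with the shared parser, then copy out the needed fields".
        SnapshotCache<String> cache = new SnapshotCache<>(json -> {
            parseCalls[0]++;
            return "snapshot of " + json;
        });

        String jsonA = "{\"a\": \"1\"}";
        String jsonB = "{\"a\": \"2\"}";
        cache.get(jsonA);
        cache.get(jsonB);
        // The second request for jsonA is served from the map without parsing.
        System.out.println(cache.get(jsonA));
        System.out.println("parse calls: " + parseCalls[0]);
    }
}
```

For the json_a, json_b, json_a sequence, the extractor runs only for the first two requests, and the third returns the snapshot taken from json_a because nothing in the map points back into the parser.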

> Perhaps a solution for this would be what you described in https://github.com/simdjson/simdjson-java/issues/35

Indeed, that would be a solution. I just want to know whether there is an easier way to solve it without creating a new API.

piotrrzysko commented 5 months ago

Additional conversation regarding JsonValue: https://github.com/simdjson/simdjson-java/issues/35#issuecomment-1880463656.