jjenkov / parsers-in-java

A JSON parser implemented in Java, to show how to implement high performance parsers in Java.

Turn DataBuffer into an interface, have an impl that takes a String #2

Open slandelle opened 10 years ago

slandelle commented 10 years ago

I've tried to optimize things (smaller inlinable methods, switches instead of if/elseif, cached parser instances).

So far, the only thing that performed better was turning DataCharBuffer into an interface and adding an implementation that wraps a String instead of a char array, thus saving an array copy.

This improved performance on big JSON messages by ~15%.
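A minimal sketch of that idea, for readers following along. The names `CharSource`, `ArrayCharSource` and `StringCharSource` are hypothetical and are not the repository's actual DataCharBuffer API; the point is only that a String-backed implementation can skip the `String.toCharArray()` copy that a char[]-backed one needs.

```java
// Hypothetical interface; NOT the repository's DataCharBuffer.
interface CharSource {
    int length();
    char charAt(int index);
}

// Backed by a char[]. A caller that starts from a String has to copy it
// first (e.g. with String.toCharArray()).
final class ArrayCharSource implements CharSource {
    private final char[] data;
    private final int length;

    ArrayCharSource(char[] data, int length) {
        this.data = data;
        this.length = length;
    }

    @Override public int length() { return length; }
    @Override public char charAt(int index) { return data[index]; }
}

// Wraps the String directly, so no array copy is made.
final class StringCharSource implements CharSource {
    private final String data;

    StringCharSource(String data) {
        this.data = data;
    }

    @Override public int length() { return data.length(); }
    @Override public char charAt(int index) { return data.charAt(index); }
}
```

Whether the extra interface dispatch costs less than the saved copy depends on the JIT and the input size, so this is something to measure rather than assume.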

I'm still puzzled about your implementation being beaten by gson and json-smart on smaller messages. I suspect IndexBuffer's array allocations.

jjenkov commented 10 years ago

That could actually be a reason - because an IndexBuffer allocates 3 arrays internally. With 2 IndexBuffers in use, that means 8 objects allocated (2 x IndexBuffer + 6 arrays). That might be a big overhead on small JSON strings.

There is another small performance optimization that can be done. I think the parseNumberToken() method starts scanning from the first character in the number. But we already know that the first character is a number character, otherwise parseNumberToken() would not have been called. That means that tokenLength could be initialized to 1 instead of 0 inside parseNumberToken, thus saving 1 character comparison per number (unless I am mistaken). It's not a lot, but if you want to fine tune the parser, all stones must be turned :-)
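As a rough illustration of the tokenLength idea (not the repository's actual parseNumberToken() code), the scan can start at offset 1 because the caller has already classified the first character. The class `NumberScan`, the method names, and the set of characters treated as number characters are assumptions made for this sketch.

```java
// Hypothetical helper class; NOT the repository's tokenizer.
final class NumberScan {

    // Precondition: data[start] was already recognized as a number
    // character by the caller, so it is not checked again here.
    static int numberTokenLength(char[] data, int start, int end) {
        int tokenLength = 1; // start at 1 instead of 0: first char is known
        while (start + tokenLength < end
                && isNumberChar(data[start + tokenLength])) {
            tokenLength++;
        }
        return tokenLength;
    }

    // Assumed character set for this sketch; it does not validate that the
    // token is a well-formed JSON number.
    static boolean isNumberChar(char c) {
        return (c >= '0' && c <= '9')
                || c == '-' || c == '+' || c == '.'
                || c == 'e' || c == 'E';
    }
}
```

The saving is one comparison per number token, so it only shows up on inputs that contain many numbers.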

jjenkov commented 10 years ago

Just added the tokenLength optimization in the parseNumber() method. I will play around with a few other optimizations this weekend.

slandelle commented 10 years ago

Yeah, I already added that one on my copy.

I realized that my test on my biggest sample was flawed because your parser doesn't support booleans for now. I think having growable arrays will mostly improve performance for big JSON payloads.

Still not sure about why performance for small ones is not that good.
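For reference, a growable array along the lines mentioned above might look like the following. `GrowableIntArray` is a hypothetical class, not part of the repository, and the doubling growth policy is an assumption of this sketch.

```java
// Hypothetical class; NOT the repository's IndexBuffer.
final class GrowableIntArray {
    private int[] values;
    private int size;

    GrowableIntArray(int initialCapacity) {
        // Guarantee at least one slot so doubling always makes progress.
        values = new int[Math.max(1, initialCapacity)];
    }

    void add(int value) {
        if (size == values.length) {
            // Double the capacity and copy the existing elements over.
            values = java.util.Arrays.copyOf(values, values.length * 2);
        }
        values[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return values[index];
    }

    int size() { return size; }

    int capacity() { return values.length; }
}
```

A small initial capacity keeps the allocation cheap for short messages, while large payloads pay for a few resize copies instead of one large up-front array.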

jjenkov commented 10 years ago

I've got support for booleans added now.

jjenkov commented 10 years ago

I've added several performance optimizations to the code now. I have also implemented a JsonParser2 which merges the tokenization and parsing phases into a single phase and class. That is 25% faster than when the two phases are split in two. The JsonParser2 beats GSON streaming by a factor of 2.5 to 5 depending on the file size (smaller files are faster).