Support closing of underlying parser (important performance aspect for Jackson backend) automatically or explicitly

cowtowncoder commented 3 years ago

Looking at benchmark I noticed that Jackson-backed surfer seemed to be little bit slower than GSON or FastJSON one. While this could happen in some cases (different libraries have different trade-offs) I was bit surprised compared to my own tests of parsing speed, and decided to try bit of benchmarking using Java async profiler (https://github.com/jvm-profiling-tools/async-profiler).

What I found out strongly suggests that the underlying Jackson JsonParser does not get close()d after being used, nor is all the content read (when reaching end of content, parser is auto-closed, basically, working as equivalent to close() call). This unfortunately has significant performance overhead -- Jackson uses aggressive recycling of buffers and symbol table. As a result, I think performance of Jackson-backed version may be as much as 50% lower as otherwise.

I hope to have time to see if there was a way to augment internal APIs to call close() on at least some of the cases. I realize that due to life-cycle it is possible that this may not be trivial to do, especially on all cases. But I hope to at least show the performance difference if this could be done.

It is also possible that other parser backend could have similar optimizations but I am not as familiar with them.

wanglingsong commented 3 years ago

Thanks so much for the finding. The SurfingContext does have a stopped status. I can try to add handling to call close() when SurfingContext stopped.

wanglingsong commented 3 years ago

I created new branch for this change. Please feel free to test it in your benchmark

wanglingsong commented 3 years ago

I ran my benchmark test. It looks like working.

cowtowncoder commented 3 years ago

Thank you, this was REALLY fast! I hope to re-run my tests, and should be able to see a change.

cowtowncoder commented 3 years ago

Interesting. I can see performance benefits for both Jackson and fast-json for "collect single value" case (no change for gson); that makes sense. For Jackson almost 2x as high, for fast-json 10-20%.

I did not see much change for "large json" but that might be since overhead of not closing is relatively smaller.

I hope to have time to run async-profiler for both cases too, to see if large json cases is working as expected. Since input is 10 meg json document overhead should matter way less than for smaller docs. But I am bit surprised that gson would be faster than jackson or fast-json there.

This seems like a really good progress!

cowtowncoder commented 3 years ago

Ok, I can see that cpu and allocation has improved a lot for Jackson, "collect single case".

For CPU we have

        ns  percent  samples  top
  ----------  -------  -------  ---
  6030000000   21.29%      603  com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken
  2260000000    7.98%      226  java.util.AbstractCollection.isEmpty
  1960000000    6.92%      196  SpinPause
  1860000000    6.57%      186  org.jsfr.json.path.PathOperator.match
  1760000000    6.21%      176  com.fasterxml.jackson.core.JsonFactory._createContext
  1370000000    4.84%      137  org.jsfr.json.SurfingContext.startJSON
  1290000000    4.56%      129  org.jsfr.json.JsonPosition.stepIntoArray
  1250000000    4.41%      125  com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipComma
  1020000000    3.60%      102  com.fasterxml.jackson.databind.node.BaseJsonNode.<init>
   980000000    3.46%       98  __psynch_cvwait
   870000000    3.07%       87  org.jsfr.json.JacksonParser$JacksonResumableParser.doPare
   800000000    2.82%       80  com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd
   720000000    2.54%       72  jshort_disjoint_arraycopy

of which everything looks ok, except entry #2 -- why would collection.isEmpty() be there? Could be just some profiling oddity. It gets called like so, for what that is worth:

--- 1080000000 ns (3.81%), 108 samples
  [ 0] java.util.AbstractCollection.isEmpty
  [ 1] org.jsfr.json.ContentDispatcher.startObjectEntry
  [ 2] org.jsfr.json.SurfingContext.startObjectEntry
  [ 3] org.jsfr.json.JacksonParser$JacksonResumableParser.doPare
  [ 4] org.jsfr.json.JacksonParser$JacksonResumableParser.parse
  [ 5] org.jsfr.json.JacksonParser.parse
 [ 6] org.jsfr.json.JsonSurfer.surf
  [ 7] org.jsfr.json.BenchmarkCollectSingleValue.benchmarkJacksonWithJsonSurfer

And for allocation Jackson one shows

      bytes  percent  samples  top
  ----------  -------  -------  ---
 15307819040   23.45%     7047  com.fasterxml.jackson.core.json.ReaderBasedJsonParser
 13359998744   20.47%     6252  com.fasterxml.jackson.core.json.JsonReadContext
  4190948160    6.42%     1953  com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer
  3896346976    5.97%     1821  org.jsfr.json.path.PathOperator[]
  3525110328    5.40%     1657  com.fasterxml.jackson.core.io.IOContext
  3435440920    5.26%     1603  org.jsfr.json.JacksonParser$JacksonResumableParser$2
  2397000176    3.67%     1111  com.fasterxml.jackson.core.util.TextBuffer
  2346428840    3.60%     1075  org.jsfr.json.path.ChildNode
  1949030488    2.99%      909  org.jsfr.json.JacksonParser$JacksonResumableParser$1
  1794081760    2.75%      851  org.jsfr.json.JacksonParser$JacksonResumableParser
  1720317856    2.64%      788  org.jsfr.json.JacksonParser$JacksonResumableParser$3
  1665854816    2.55%      777  char[]
  1663302360    2.55%      767  java.lang.String
  1628522496    2.50%      725  org.jsfr.json.SurfingContext
  1566206304    2.40%      758  java.util.LinkedList
  1441881712    2.21%      661  org.jsfr.json.JsonPosition
  1014749728    1.55%      501  org.jsfr.json.ContentDispatcher
   897226688    1.37%      412  org.jsfr.json.path.ArrayIndex
   776587752    1.19%      367  com.fasterxml.jackson.databind.node.TextNode
   688733480    1.06%      334  org.jsfr.json.StaticPrimitiveHolder

which makes sense to me, no big surprises.

cowtowncoder commented 3 years ago

Ok I figured out the problem with BenchmarkParseLargeJson wrt Jackson, fast-json. Document size itself (10 megabytes) is fine, but unfortunately it has tens of thousands of distinct property names (one of the field names is description, hence unique). In such cases use of symbol table can be less efficient: as table must be bounded to avoid running out of memory, there will not be much possibility of reuse and with enough symbols a lot of time is used in managing symbol table itself.

For Jackson, it is possible to disable name canonicaliation, with

 jsonFactory.disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES);

and doing that for the specific test will actually increase performance to be same as Gson. Same may well be true with fast-json.

However -- for most json content using canonicalization is a performance win so I would not recommend changing the defaults. Canonicalization is most useful (in case of Jackson) with byte based input (InputStream, byte[]), and somewhat less so for char based input (Reader, String). So for high-performance reads via InputStream, for example, one rarely wants to disable it. For huge documents with distinct names, reading from String, there can be benefit. As long as users have a way for customizing underlying factories -- which is true for JsonSurfer where user can construct a custom JacksonProvider -- it is possible to work around issues.

But I am not sure if there was any good way to document this.

Anyway: aside from properly closing underlying parser -- which appears to work well! -- I don't see immediate other improvements.

wanglingsong commented 3 years ago

Ok, I can see that cpu and allocation has improved a lot for Jackson, "collect single case".

For CPU we have

        ns  percent  samples  top
  ----------  -------  -------  ---
  6030000000   21.29%      603  com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken
  2260000000    7.98%      226  java.util.AbstractCollection.isEmpty
  1960000000    6.92%      196  SpinPause
  1860000000    6.57%      186  org.jsfr.json.path.PathOperator.match
  1760000000    6.21%      176  com.fasterxml.jackson.core.JsonFactory._createContext
  1370000000    4.84%      137  org.jsfr.json.SurfingContext.startJSON
  1290000000    4.56%      129  org.jsfr.json.JsonPosition.stepIntoArray
  1250000000    4.41%      125  com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipComma
  1020000000    3.60%      102  com.fasterxml.jackson.databind.node.BaseJsonNode.<init>
   980000000    3.46%       98  __psynch_cvwait
   870000000    3.07%       87  org.jsfr.json.JacksonParser$JacksonResumableParser.doPare
   800000000    2.82%       80  com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd
   720000000    2.54%       72  jshort_disjoint_arraycopy

of which everything looks ok, except entry #2 -- why would collection.isEmpty() be there? Could be just some profiling oddity. It gets called like so, for what that is worth:

--- 1080000000 ns (3.81%), 108 samples
  [ 0] java.util.AbstractCollection.isEmpty
  [ 1] org.jsfr.json.ContentDispatcher.startObjectEntry
  [ 2] org.jsfr.json.SurfingContext.startObjectEntry
  [ 3] org.jsfr.json.JacksonParser$JacksonResumableParser.doPare
  [ 4] org.jsfr.json.JacksonParser$JacksonResumableParser.parse
  [ 5] org.jsfr.json.JacksonParser.parse
 [ 6] org.jsfr.json.JsonSurfer.surf
  [ 7] org.jsfr.json.BenchmarkCollectSingleValue.benchmarkJacksonWithJsonSurfer

And for allocation Jackson one shows

      bytes  percent  samples  top
  ----------  -------  -------  ---
 15307819040   23.45%     7047  com.fasterxml.jackson.core.json.ReaderBasedJsonParser
 13359998744   20.47%     6252  com.fasterxml.jackson.core.json.JsonReadContext
  4190948160    6.42%     1953  com.fasterxml.jackson.core.sym.CharsToNameCanonicalizer
  3896346976    5.97%     1821  org.jsfr.json.path.PathOperator[]
  3525110328    5.40%     1657  com.fasterxml.jackson.core.io.IOContext
  3435440920    5.26%     1603  org.jsfr.json.JacksonParser$JacksonResumableParser$2
  2397000176    3.67%     1111  com.fasterxml.jackson.core.util.TextBuffer
  2346428840    3.60%     1075  org.jsfr.json.path.ChildNode
  1949030488    2.99%      909  org.jsfr.json.JacksonParser$JacksonResumableParser$1
  1794081760    2.75%      851  org.jsfr.json.JacksonParser$JacksonResumableParser
  1720317856    2.64%      788  org.jsfr.json.JacksonParser$JacksonResumableParser$3
  1665854816    2.55%      777  char[]
  1663302360    2.55%      767  java.lang.String
  1628522496    2.50%      725  org.jsfr.json.SurfingContext
  1566206304    2.40%      758  java.util.LinkedList
  1441881712    2.21%      661  org.jsfr.json.JsonPosition
  1014749728    1.55%      501  org.jsfr.json.ContentDispatcher
   897226688    1.37%      412  org.jsfr.json.path.ArrayIndex
   776587752    1.19%      367  com.fasterxml.jackson.databind.node.TextNode
   688733480    1.06%      334  org.jsfr.json.StaticPrimitiveHolder

which makes sense to me, no big surprises.

It is a necessary checking for the '''collection.isEmpty()''', so nothing else to fix here.

wanglingsong commented 3 years ago

Ok I figured out the problem with BenchmarkParseLargeJson wrt Jackson, fast-json. Document size itself (10 megabytes) is fine, but unfortunately it has tens of thousands of distinct property names (one of the field names is description, hence unique). In such cases use of symbol table can be less efficient: as table must be bounded to avoid running out of memory, there will not be much possibility of reuse and with enough symbols a lot of time is used in managing symbol table itself.

For Jackson, it is possible to disable name canonicaliation, with
 jsonFactory.disable(JsonFactory.Feature.CANONICALIZE_FIELD_NAMES);
and doing that for the specific test will actually increase performance to be same as Gson. Same may well be true with fast-json.

However -- for most json content using canonicalization is a performance win so I would not recommend changing the defaults. Canonicalization is most useful (in case of Jackson) with byte based input (InputStream, byte[]), and somewhat less so for char based input (Reader, String). So for high-performance reads via InputStream, for example, one rarely wants to disable it. For huge documents with distinct names, reading from String, there can be benefit. As long as users have a way for customizing underlying factories -- which is true for JsonSurfer where user can construct a custom JacksonProvider -- it is possible to work around issues.

But I am not sure if there was any good way to document this.

Anyway: aside from properly closing underlying parser -- which appears to work well! -- I don't see immediate other improvements.

Maybe a separate page named like "JsonSurfer Jackson Tips" can be added and linked from the main readme. Contribution is always welcomed.

wanglingsong commented 3 years ago

Anyway I will merge this change to master branch and make it available in next release

cowtowncoder commented 3 years ago

@wanglingsong sounds good. Looking forward to the next release -- thank you for the very quick follow up here!

cowtowncoder commented 3 years ago

Hi @wanglingsong! I think this is solved and can be closed. Would it be possible to get a new version released now? I would like to upgrade our usage of this great library.

wanglingsong commented 3 years ago

Sure. I just made a release and uploaded the artifacts onto maven central. It should be available several hours later. Btw, may I know in what project do you use JsonSurfer? @cowtowncoder

cowtowncoder commented 3 years ago

@wanglingsong Thank you!

Project is "Stargate":

https://github.com/stargate/stargate

wanglingsong / JsonSurfer

Support closing of underlying parser (important performance aspect for Jackson backend) automatically or explicitly #70