jakartaee / jsonp-api

Jakarta JSON Processing
https://eclipse.org/ee4j/jsonp
Other
141 stars 60 forks source link

[api] Add support for "pre-encoded keys" ala Jacksons SerializedString #340

Open rbygrave opened 2 years ago

rbygrave commented 2 years ago

Related PR: https://github.com/eclipse-ee4j/jsonp/issues/344 Related PR in Parsson - https://github.com/eclipse-ee4j/parsson/pull/47

Summary

Create a JsonGenerator.Key

JsonProvider provider = ...;

JsonGenerator.Key firstNameKey = provider.createGeneratorKey("firstName");

Use a JsonGenerator.Key

JsonGenerator generator = ...;

// we can use the key rather than just a string key for performance reasons
// ... it's already escaped 
generator.writeKey(firstNameKey).write("foo");

// rather than string key 
generator.writeKey("lastName").write("foo");

I'd like to discuss adding the ability for JsonGenerator to support "pre-encoded" keys as a performance optimisation along the lines of Jacksons SerializedString .

That is, Jackson JsonGenerator has writeFieldName(SerializableString key) ... and I'd love to see that added to the jakarta json api.

SerializedString is the implementation of SerializableString which has pre-encoded the String value into bytes.

This allows the underlying generators to effectively do array copy of the already encoded bytes. https://github.com/FasterXML/jackson-core/blob/2.14/src/main/java/com/fasterxml/jackson/core/json/UTF8JsonGenerator.java#L282

Background:

I have been doing JMH performance benchmarks around this area and with jackson core, the use of SerializableString for the "field names" aka "keys" is significant - how significant depends on the relative width of the keys vs values. For relatively wide keys and narrow values it is really significant.

More background, I'm working on avaje-jsonb which uses APT source code generation to do json binding. avaje-jsonb currently abstracts over jackson code and now jakarta json (so I'm doing direct comparison between parsson and jackson-core in this manor). Currently this is an important performance feature that I see with the jackson core adapter.

rbygrave commented 2 years ago

The following is a JMH benchmark writing to json comparing:

  1. avaje-jsonb using jackson core 2.12.5
  2. avaje-jsonb using jakarta.json / parrson (org.eclipse.parsson:jakarta.json:1.0.0)
  3. Jackson ObjectMapper 2.12.5
PropertyStrTenTest.toJson_jsonb_jackson                                             thrpt       1714.593          ops/ms
PropertyStrTenTest.toJson_jsonb_jakarta                                             thrpt        992.326          ops/ms
PropertyStrTenTest.toJson_objectMapper                                              thrpt       1602.596          ops/ms

I believe this benchmark is showing me that there are 2 performance techniques that jackson core makes available that are not available via json-p api. These techniques are used by ObjectMapper and also by avaje-jsonb (my library) when it uses jackson core.

The first technique is this issue, that the binding library (jackson objectmapper and avaje-jsonb) can provide a Jackson SerialisedString as the key instead of a plain String.

Adding this could help move the through put of PropertyStrTenTest.toJson_jsonb_jakarta (parrson) towards PropertyStrTenTest.toJson_jsonb_jackson(jackson-core). The comparison with PropertyStrTenTest.toJson_objectMapper helps confirm where this sits relative to Jackson ObjectMapper.

The results with gc profiling:

PropertyStrTenTest.toJson_jsonb_jackson                                             thrpt       1714.593          ops/ms
PropertyStrTenTest.toJson_jsonb_jackson:·gc.alloc.rate                              thrpt       1071.445          MB/sec
PropertyStrTenTest.toJson_jsonb_jackson:·gc.alloc.rate.norm                         thrpt        688.054            B/op
PropertyStrTenTest.toJson_jsonb_jackson:·gc.churn.G1_Eden_Space                     thrpt       1071.163          MB/sec
PropertyStrTenTest.toJson_jsonb_jackson:·gc.churn.G1_Eden_Space.norm                thrpt        687.873            B/op
PropertyStrTenTest.toJson_jsonb_jackson:·gc.churn.G1_Survivor_Space                 thrpt          0.002          MB/sec
PropertyStrTenTest.toJson_jsonb_jackson:·gc.churn.G1_Survivor_Space.norm            thrpt          0.001            B/op
PropertyStrTenTest.toJson_jsonb_jackson:·gc.count                                   thrpt         38.000          counts
PropertyStrTenTest.toJson_jsonb_jackson:·gc.time                                    thrpt         19.000              ms
PropertyStrTenTest.toJson_jsonb_jakarta                                             thrpt        992.326          ops/ms
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.alloc.rate                              thrpt       1391.629          MB/sec
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.alloc.rate.norm                         thrpt       1544.121            B/op
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.churn.G1_Eden_Space                     thrpt       1409.426          MB/sec
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.churn.G1_Eden_Space.norm                thrpt       1563.869            B/op
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.churn.G1_Survivor_Space                 thrpt          0.004          MB/sec
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.churn.G1_Survivor_Space.norm            thrpt          0.004            B/op
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.count                                   thrpt         50.000          counts
PropertyStrTenTest.toJson_jsonb_jakarta:·gc.time                                    thrpt         34.000              ms
PropertyStrTenTest.toJson_objectMapper                                              thrpt       1602.596          ops/ms
PropertyStrTenTest.toJson_objectMapper:·gc.alloc.rate                               thrpt       1013.111          MB/sec
PropertyStrTenTest.toJson_objectMapper:·gc.alloc.rate.norm                          thrpt        696.055            B/op
PropertyStrTenTest.toJson_objectMapper:·gc.churn.G1_Eden_Space                      thrpt       1014.789          MB/sec
PropertyStrTenTest.toJson_objectMapper:·gc.churn.G1_Eden_Space.norm                 thrpt        697.208            B/op
PropertyStrTenTest.toJson_objectMapper:·gc.churn.G1_Survivor_Space                  thrpt          0.002          MB/sec
PropertyStrTenTest.toJson_objectMapper:·gc.churn.G1_Survivor_Space.norm             thrpt          0.001            B/op
PropertyStrTenTest.toJson_objectMapper:·gc.count                                    thrpt         36.000          counts
PropertyStrTenTest.toJson_objectMapper:·gc.time                                     thrpt         18.000              ms
rbygrave commented 2 years ago

Wider keys -> more benefit

Given the 2 models:

package org.example.jmh.model;

import io.avaje.jsonb.Json;

@Json
public record NarrowNamesRecord(
  String a,
  String b,
  String c,
  String d,
  String e
) {
}
package org.example.jmh.model;

import io.avaje.jsonb.Json;

@Json
public record WideNamesRecord(
  String firstNameProperty1,
  String lastNameProperty2,
  String anotherSimilarProperty3,
  String moreOrLessProperty4,
  String lastButNotLeastProperty5
) {
}

The following benchmark compares the relative throughput when comparing models with relatively narrow keys vs wider keys. We expect the benefit from the use of SerializedString to be more pronounced when the keys are relatively wider as the generator is able to skip more encoding of keys.

Benchmark                                          Mode  Cnt     Score   Error   Units
RecordBasicTest.toJson_narrowNames_jsonb_jackson  thrpt       2327.835          ops/ms
RecordBasicTest.toJson_narrowNames_jsonb_jakarta  thrpt       1425.564          ops/ms
RecordBasicTest.toJson_narrowNames_objectMapper   thrpt       2268.823          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jackson    thrpt       3585.764          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta    thrpt       1749.537          ops/ms
RecordBasicTest.toJson_wideNames_objectMapper     thrpt       3069.284          ops/ms

Note the test data for this is:

    testDataWideNames = new WideNamesRecord("1", "2", "3", "4", "5");
    testDataNarrowNames = new NarrowNamesRecord("property1property1property1property1", "property2property2property2property2", "property3property3property3property3", "property4property4property4property4", "property5property5property5property5");
rbygrave commented 2 years ago

With a local prototype I can see a 17% through put improvement with the "prepared keys" in the wide key case. We see that in the 1800 to 2112 values below.

RecordBasicTest.toJson_wideNames_jsonb_jackson                thrpt       3694.105          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_normalKey      thrpt       1800.453          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_preparedKey    thrpt       2112.591          ops/ms
RecordBasicTest.toJson_wideNames_objectMapper                 thrpt       3352.952          ops/ms

With the narrow key case we expect very little difference, we want to make sure we don't get degraded performance and we see that:

RecordBasicTest.toJson_narrowNames_jsonb_jackson              thrpt       2420.908          ops/ms
RecordBasicTest.toJson_narrowNames_jsonb_jakarta_normalKey    thrpt       1443.192          ops/ms
RecordBasicTest.toJson_narrowNames_jsonb_jakarta_preparedKey  thrpt       1447.498          ops/ms
RecordBasicTest.toJson_narrowNames_objectMapper               thrpt       2263.234          ops/ms

This is still short of what avaje-jsonb is getting via using jackson-core, and short of jackson objectmapper.

rbygrave commented 2 years ago

With a local prototype, getting closer to the performance of jackson core. Note this has some additional performance adjustments to parsson internals as well (like removing Scope.IN_FIELD to reduce GC pressure).

Benchmark                                                     Mode  Cnt     Score   Error   Units
RecordBasicTest.toJson_wideNames_jsonb_jackson               thrpt       3568.656          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_normalKey     thrpt       2334.342          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_preparedKey   thrpt       2884.539          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_preparedKey2  thrpt       3274.152          ops/ms
RecordBasicTest.toJson_wideNames_objectMapper                thrpt       3551.022          ops/ms
lukasj commented 2 years ago

hm, I'm a bit confused as where does this belong to. Based on the prefix [api] I would think this belongs here but as you are referring to parsson, that would belong to the implementation. Does this need some change in jsonp-api? If not, I would move this back to parsson project...

rbygrave commented 2 years ago

Does this need some change in jsonp-api?

Yes it does and yes I think it is fine that the issue is here. Ultimately as I see it we will need to change parsson / the reference implementation as well but it's all good having this here.

I have made this a bit confusing. This is pretty much a performance orientated feature. As such I am looking to prove the concept somewhat before I push a PR for the api change (hence the bench marking etc). So yes, I'm making implementation changes at the moment to parsson so that I can:

What we probably could do at some point is create a related issue in parsson but I am happy to progress my work a little more first.

rbygrave commented 2 years ago

Ok, I think I've done enough initial bench marking to push for this API change.

With respect to this API change we are looking at comparing the _jakartanormalKey and _jakartapreparedKey throughput. We see better throughput can be achieved by using keys that are "prepared" (already escaped and encoded).

These benchmarks include a baseline "jsonb_jackson" which is avaje-jsonb using Jackson core 12.13.0, and a baseline "objectMapper" which is Jackson ObjectMapper.

Note: Using a modified version of parsson for the jakarta implementation. Changes include using ThreadLocal based buffer recycling


4 threads, 

Benchmark                                                    Mode  Cnt      Score   Error   Units
RecordBasicTest.toJson_wideNames_jsonb_jackson              thrpt    2  12847.776          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_normalKey    thrpt    2  12063.079          ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_preparedKey  thrpt    2  18269.082          ops/ms
RecordBasicTest.toJson_wideNames_objectMapper               thrpt    2  11494.361          ops/ms

2 threads

Benchmark                                                    Mode  Cnt      Score      Error   Units
RecordBasicTest.toJson_wideNames_jsonb_jackson              thrpt    4   6683.975 ± 3575.546  ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_normalKey    thrpt    4   6882.481 ±  827.704  ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_preparedKey  thrpt    4  10105.276 ± 1312.702  ops/ms
RecordBasicTest.toJson_wideNames_objectMapper               thrpt    4   6577.431 ±  784.614  ops/ms

1 thread

Benchmark                                                    Mode  Cnt     Score     Error   Units
RecordBasicTest.toJson_wideNames_jsonb_jackson              thrpt    4  3608.207 ± 525.817  ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_normalKey    thrpt    4  3471.075 ± 396.187  ops/ms
RecordBasicTest.toJson_wideNames_jsonb_jakarta_preparedKey  thrpt    4  5248.490 ± 682.141  ops/ms
RecordBasicTest.toJson_wideNames_objectMapper               thrpt    4  3261.331 ± 272.124  ops/ms