quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.58k stars 2.63k forks source link

Jakarta JSON-P `Json.***` methods are crazily inefficient #42748

Closed gsmet closed 3 weeks ago

gsmet commented 3 weeks ago

The low-level static method hosted in the Jakarta JSON-P Json object are very inefficient: they are resolving a JsonProvider.provider() for each call which in turn calls the ServiceLoader (among other things...) and allocate a new JsonProvider.

See:

Meaning that when you are calling Json.createValue("string") to allocate a simple String value, you are actually making a call to the service loader and instantiating a new JsonProvider.

I stumbled upon that independently this week-end and after experimenting with it and writing a quick patch, I was like "it's crazy nobody complained about it" so I went to the JSON-P website and stumbled upon:

Meaning that I don't expect it to be fixed any time soon.

To give an idea of how things go, I did a simple JMH benchmark.

Json.createValue("test") i.e. what people (including us) use:

Iteration   1: 153.907 ops/ms
Iteration   2: 155.308 ops/ms
Iteration   3: 154.542 ops/ms
Iteration   4: 155.963 ops/ms
Iteration   5: 156.462 ops/ms

Using a static provider with JSON_PROVIDER.createValue("test"):

Iteration   1: 1112305.967 ops/ms
Iteration   2: 1114183.052 ops/ms
Iteration   3: 1113783.793 ops/ms
Iteration   4: 1111395.048 ops/ms
Iteration   5: 1112449.325 ops/ms

That's 7200+ times faster. The createValue() methods are the ones for which it's the most inefficient as they should be extremely lightweight as called quite often.

While we are mostly using Jackson in Quarkus itself and recommending Jackson, we have several components using JSON-P and JSON-B in the SmallRye world, for instance SmallRye GraphQL and SmallRye Health. And we have people using JSON-P/JSON-B out there because they want to stay in the MicroProfile or Jakarta world.

I know SmallRye GraphQL is affected by the inefficiencies as I have seen some direct calls to the Json methods in some runtime code (for instance here: https://github.com/smallrye/smallrye-graphql/blob/main/client/implementation/src/main/java/io/smallrye/graphql/client/impl/RequestImpl.java#L75). From what I can see SmallRye Health is not affected as it creates a unique JsonProvider.

Note that I haven't done any benchmarks in SmallRye GraphQL, I have just verified that we were instantiating too many JsonProviders.

As for the SmallRye components, I think we should discuss this subject in the context of Quarkus as the solution we want to put in place for Quarkus might be different from the ones we want to use in the general case (for instance in WildFly).

quarkus-bot[bot] commented 3 weeks ago

/cc @EricWittmann (openapi), @Ladicek (fault-tolerance,smallrye), @MikeEdgar (openapi), @brunobat (tracing), @cescoffier (mutiny,reactive-messaging), @ebullient (metrics), @jmartisk (graphql,health,metrics,smallrye), @jponge (mutiny), @ozangunalp (reactive-messaging), @phillip-kruger (graphql,openapi,smallrye), @radcortez (smallrye,tracing), @sberyozkin (jwt), @xstefank (health)

gsmet commented 3 weeks ago

Hmmm, the bot shouldn't have mentioned so many people, sorry about that. We probably need to tweak this area/smallrye label.

gsmet commented 3 weeks ago

A quick workaround would be to initialize one static JsonProvider somewhere (or at least to have only one for the client and one for the server if we want to support multiple deployments and tweaking the JsonProvider but that means having the means to propagate it wherever it's needed) in the code and ban the use of Json..

Now, no idea if JSON-B uses Json so that might just be a band aid.

jmartisk commented 3 weeks ago

Regarding the WildFly/EAP case and potential multiple providers: I generally gave up on supporting multiple deployments of SmallRye GraphQL in a single WF instance. It generally works, but there are lots of potential problems where the apps are not fully isolated. In the past, I made some efforts to make it work, but it was unwieldy, so I kinda gave up. This is in line with the official support policies of the EAP XP product (see https://access.redhat.com/articles/2026253) where multiple deployments are explicitly unsupported. So I guess we could start with the quick fix of caching the JSON provider and then look into bringing in support for Jackson...

gsmet commented 3 weeks ago

@jmartisk if we are following this path, I wonder if we should make it static in Health too: https://github.com/smallrye/smallrye-health/blob/main/implementation/src/main/java/io/smallrye/health/SmallRyeHealthReporter.java#L125 .

AFAICS SmallRyeHealthReporter is instantiated twice in Quarkus due to our CDI infra so we are initializing two JsonProvider.

Probably worth discussing it with @xstefank and @radcortez .

jmartisk commented 3 weeks ago

Possibly, but given that it's an app-scoped CDI bean, it doesn't seem necessary (the problem would be obtaining the provider on each request), and having static fields in CDI beans is... weird

gsmet commented 3 weeks ago

So I kinda agree but it's still one ServiceLoader call we could avoid. I will discuss it with Martin Kouba hopefully soon.

xstefank commented 3 weeks ago

I was thinking about providing Jackson API in SR Health, but we need to be aligned with JSON-P for spec. But I personally don't see what's so wrong with a static field for this since it's really just utility for marshaling. @mkouba mentioning you so you chime in.

mkouba commented 3 weeks ago

AFAICS SmallRyeHealthReporter is instantiated twice in Quarkus due to our CDI infra so we are initializing two JsonProvider.

It's instantiated twice because it's an @ApplicationScoped bean so one instance is the client proxy which is a subclass of SmallRyeHealthReporter. And the same applies to all those collections and maps, such as SmallRyeHealthReporter#livenessUnis. We could initialize the state in the @PostConstruct callback. For JsonProvider I think it should be ok to use a static final field instead. However, we need to make sure the constant is initialized at the right time, i.e. when the correct provider is available.

radcortez commented 3 weeks ago

Yes, the problem is that in the SR world, we have to support JSON-B / JSON-P because the MP platform requires it.

We could provide our own version of the JSON-B jar, with a different implementation. JBoss did that before, and Apache still does. It shouldn't require changes to the method signatures, so it should pass those. I'm not sure if there is a TCK test that checks if you get a different version of the provider on each call, but I find that unlikely.

gsmet commented 3 weeks ago

So I'm not sure JSON-B itself is affected by this issue. I would expect them to have avoided it given it's well known on their side. We need to be extremely cautious about using Json in our code.

Also, while we know you need to support JSON-P/JSON-B, what we would like to see is an SPI to be able to push Jackson there and avoid having two JSON serialization implementation around in most cases.

Now, what I don't know is if it's feasible or if you have JSON-P/-B publicly exposed. We could probably be OK with JSON-P but it would really be nice if we could avoid having JSON-B in the classpath when using Jackson.

Now is it a priority? Is it too complex. I don't know :).

radcortez commented 3 weeks ago

Also, while we know you need to support JSON-P/JSON-B, what we would like to see is an SPI to be able to push Jackson there and avoid having two JSON serialization implementation around in most cases.

Agree. Unfortunately, that also means that we would need to create yet another JSON Provider entry point so you could delegate to JSON-P or Jackson. It's not difficult, but we have to remember to use that one instead, and most likely, users would still call the original APIs.

The best option, at least for Quarkus, is to provide our own API jar with performance improvements since users need to reference the extension that they want to use directly, and we can easily control which API dependency to add. In traditional Jakarta Apps servers, it's trickier because users add the Jakarta API dependency from the official project.

gsmet commented 3 weeks ago

Unfortunately, that also means that we would need to create yet another JSON Provider entry point so you could delegate to JSON-P or Jackson.

But are the users of SmallRye GraphQL/Health concerned by the JSON provider? That's why I was asking if it was leaking in the APIs?

radcortez commented 3 weeks ago

I don't believe we have leaks in SR itself, but the Jakarta JSON APIs may be included transitively by the MP APIs. We need to check each one individually.

gsmet commented 3 weeks ago

I wouldn't be too worried about having the jars around. There's a difference between having a jar around and initializing the library. We could improve on the latter and see how we can handle the former a bit later.

Anyway, for now, I would focus on making sure we don't instantiate too many providers as it's an easy win. The rest should be prioritized at an higher level probably.

radcortez commented 3 weeks ago

I wouldn't be too worried about having the jars around.

I was thinking about the user calling that API with terrible performance.

gsmet commented 3 weeks ago

Ah yes, that's their responsibility, really. I posted a message in the original issue there explaining how bad it really was with numbers. Because there's bad and there's really bad :).

geoand commented 3 weeks ago

we might want to have an SPI to push Jackson there. I discussed it a few times with @jmartisk already but it wasn't considered top priority

I am personally in favor of this being done sooner rather than later and willing to help make it happen.

jmartisk commented 3 weeks ago

I am personally in favor of this being done sooner rather than later and willing to help make it happen.

That's the spirit! You're certainly welcome to do it :)

t1 commented 3 weeks ago

Maybe this is one of the reasons why YASSON is so slow. It would be wrong to blame JSON-B for the bad performance of YASSON. JSON-B itself doesn't rely on the static methods in jakarta.json.Json; you can pass an instance of a jakarta.json.spi.JsonProvider into your jakarta.json.bind.JsonbBuilder; otherwise it looks it up... once!

I'd like to hint at the QSON project by @patriot1burke and this discussion for this quarkus.io blog post by @mariofusco. They make Jackson even faster by moving the reflection part into the build process... which is really the Quarkus mindset. A big applause for that! QSON seems to not be very active ATM, but it could just require a little push (I love the name!)

I've started to experiment with a similar idea, but as an annotation processor and based on JSON-B APIs. (I haven't published it, yet, because it depends on a SNAPSHOT version of an annotation processing lib I develop in parallel, so it doesn't build OOTB.)

So, we actually always have three parts:

  1. API (i.e. Jackson vs. JSON-B; both seem to be quite feature complete, don't they?)
  2. code scanning (i.e. reflection, annotation processor, or Quarkus build-time meta programming)
  3. (de)serializing

If I'm not mistaken, I think that way back when JSON-B was still young, there even was an initiative to make Jackson support the JSON-B APIs. But I haven't heard anything about that ever since. I have no idea about how much effort that would be; the Jackson code I've yet seen looks.... hmmm... highly optimized.

I would really love it, if application developers could stop worrying about such things. They should just pick an API and always get the best performance.

geoand commented 3 weeks ago

@t1 thanks for the input!

I would like to clarify a couple more reasons why myself (and I assume @gsmet) as much in favor of using Jackson everywhere possible:

t1 commented 3 weeks ago

My previous comment may have sounded too negative; actually I have nothing against Jackson! If I understand the article correctly, Jackson already allows to nicely separate the scanning from the actual (de)serializing. It would be nice, if it would also separate its API from the implementation. Then we could relatively simply add support for the JSON-B APIs for everything MP... or as an alternative, if a Quarkus developer chooses so. Under the hood, it could always be Jackson.

gsmet commented 3 weeks ago

I think we can close this one now.