spring-projects / spring-framework

Spring Framework
https://spring.io/projects/spring-framework
Apache License 2.0

Avoid repeated resolving of singleton beans through `@Lazy` proxy #33841

Closed - dreis2211 closed this issue 1 week ago

dreis2211 commented 3 weeks ago

Hi 👋 ,

we have discovered that one of our applications spends a considerable share of its CPU cycles (10-20%) in `DefaultListableBeanFactory.doResolveDependency` at runtime. We have found that this is caused by repeated resolving of beans through e.g. `ObjectProvider.getObject` (`@Lazy`-annotated fields suffer from the same problem).

[profiler screenshot]

I've spent a few minutes creating a reproducer example here: https://github.com/dreis2211/objectprovider-example But essentially it boils down to this:

```java
@RestController
public class TestController {

    private final ObjectProvider<List<BeanInterface>> objectProvider;

    private List<BeanInterface> beansCache;

    public TestController(ObjectProvider<List<BeanInterface>> objectProvider) {
        this.objectProvider = objectProvider;
    }

    @GetMapping("/beans")
    public String beans() {
        List<BeanInterface> beans = objectProvider.getIfAvailable();
        return "Hello you beautiful " + beans.size() + " beans";
    }

    @GetMapping("/cached-beans")
    public String cachedBeans() {
        List<BeanInterface> beans = getCachedBeans();
        return "Hello you beautiful " + beans.size() + " beans";
    }

    private List<BeanInterface> getCachedBeans() {
        if (beansCache == null) {
            beansCache = objectProvider.getIfAvailable();
        }
        return beansCache;
    }

}
```
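As a side note, a hand-rolled cache like `getCachedBeans()` above is fine for a benchmark but is not thread-safe: two concurrent requests can both see `beansCache == null` and resolve twice, and the plain field write is not guaranteed to be safely published across request threads. For a singleton target the race is usually benign, but a safer sketch could look like the following (a plain `Supplier` is used here as a hypothetical stand-in for the `ObjectProvider` lookup; the class name is made up for illustration):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper; Supplier stands in for ObjectProvider.getIfAvailable().
final class CachedLookup<T> {

    private final Supplier<T> delegate;

    private final AtomicReference<T> cache = new AtomicReference<>();

    CachedLookup(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    T get() {
        T value = cache.get();
        if (value == null) {
            value = delegate.get();
            // compareAndSet publishes the result safely; if a concurrent
            // caller won the race, keep its result so that all callers
            // observe the same instance.
            if (!cache.compareAndSet(null, value)) {
                value = cache.get();
            }
        }
        return value;
    }
}
```

Unlike the `if (beansCache == null)` check in the controller, the `AtomicReference` guarantees that at most one resolved value wins and is visible to all threads.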

A little wrk benchmark shows the following results:

```
wrk -t12 -c400 -d30s http://localhost:8080/beans

Running 30s test @ http://localhost:8080/beans
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    58.57ms   64.48ms 532.24ms   80.97%
    Req/Sec   372.77    159.75     1.04k    65.66%
  133853 requests in 30.10s, 18.28MB read
  Socket errors: connect 157, read 150, write 7, timeout 0
Requests/sec:   4447.18
Transfer/sec:    621.79KB
```

Caching those beans, e.g. in a class-local field (also provided in the example already), yields the following results:

```
wrk -t12 -c400 -d30s http://localhost:8080/cached-beans

Running 30s test @ http://localhost:8080/cached-beans
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.78ms    1.26ms  69.73ms   72.44%
    Req/Sec     4.75k     1.40k    9.92k    64.14%
  1638000 requests in 30.11s, 223.68MB read
  Socket errors: connect 157, read 153, write 0, timeout 0
Requests/sec:  54405.84
Transfer/sec:      7.43MB
```

As you can see, the latter is considerably better in terms of throughput.

As the application at hand makes use of `@Lazy` or `ObjectProvider` in many places, a colleague of mine spent some time writing a `CustomAutowireConfigurer` that avoids the repeated resolving for us, so we don't have to clutter our code with class-local caches. That, however, feels like overkill to me. Is there any specific reason why the result of deferred bean resolving is not cached by default? I have the feeling the current behaviour is a little counter-intuitive and might not be known to everybody. A quick look into the docs also doesn't turn up any note about the runtime performance characteristics of such approaches.

In case there is nothing to be done about the performance in general, I'd at least vote for an additional hint in the documentation about this.

Thank you :)

Cheers, Christoph

dreis2211 commented 3 weeks ago

I should mention that my colleague's wish would be a `@CachedLazy` annotation for those cases.

Maybe we could extend `@Lazy` with a boolean `cached` property, or offer similar functionality through the `ObjectProvider` API in case the default behaviour is not changed. Overall, we didn't expect beans to be repeatedly resolved in the first place, but if there is no way of changing the default, this might be a path forward.

I haven't really looked into this deeply as I don't have time to provide PRs these days. But I wanted to at least forward this idea :)

jhoeller commented 3 weeks ago

Generally speaking, the target bean can be of any scope, hence the fresh retrieval. However, it certainly is very common for the target to be a lazily resolved singleton bean, so we should be able to add a local cache specifically for the `getObject()` result in that common scenario (guarded by a `BeanFactory.isSingleton` check or the like).
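The singleton-guarded cache described here could be sketched roughly as follows. The interfaces below are deliberately minimal stand-ins invented for illustration (they mirror the roles of `ObjectProvider.getObject()` and a `BeanFactory.isSingleton` check, not Spring's actual internals):

```java
// Minimal stand-ins for the Spring types involved; names and shapes
// are assumptions for illustration, not the real framework API.
interface BeanLookup<T> {

    T getObject();          // plays the role of ObjectProvider.getObject()

    boolean isSingleton();  // plays the role of a BeanFactory.isSingleton check
}

// Decorator that caches the resolved object only when the target is a
// singleton; prototypes and other scopes still get a fresh retrieval.
final class SingletonCachingLookup<T> implements BeanLookup<T> {

    private final BeanLookup<T> delegate;

    private volatile T cached;

    SingletonCachingLookup(BeanLookup<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T getObject() {
        if (!delegate.isSingleton()) {
            return delegate.getObject();  // non-singleton: always re-resolve
        }
        T value = cached;
        if (value == null) {
            value = delegate.getObject();
            cached = value;  // benign race: any winner is the same singleton
        }
        return value;
    }

    @Override
    public boolean isSingleton() {
        return delegate.isSingleton();
    }
}
```

The key property is that the scope check gates the cache, so only targets known to be singletons skip the repeated `doResolveDependency`-style work.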

dreis2211 commented 3 weeks ago

I don't think that would cover the `List` scenario above, would it, @jhoeller?

jhoeller commented 3 weeks ago

Good point, a collection/array consisting entirely of singleton beans qualifies as well.

jhoeller commented 3 weeks ago

FWIW, a related mechanism is in place within `AutowiredAnnotationBeanPostProcessor` where we turn a single bean match into a shortcut descriptor for a direct bean lookup by name, applying when the containing bean is being instantiated again. That's the minimum that we can certainly do within an injected `ObjectProvider` as well.

There is an additional concern here with the mutability of collections and arrays. Strictly speaking, we'd have to clone the collection/array of singleton beans before we return it, since it could have been mutated after a previous retrieval. Is that overhead something you would be concerned about? Should we rather trust the local `ObjectProvider` usage code that it either does not mutate at all (the common case) or that it won't get surprised by its own collection mutations from a previous lookup attempt? The effect would fortunately be exclusively local, since `ObjectProvider` instances are specifically created for each injection point. Nevertheless, it's a potential case for subtle regressions... which we might be able to get away with in 6.2.

That said, the idiomatic way of dealing with multiple beans in the `ObjectProvider` case is actually to iterate/stream over an `ObjectProvider<MyBean>` handle, avoiding a nested `List` declaration to begin with. This won't be faster on re-retrieval either, though; we'd have to implement some specific caching for `ObjectProvider.stream()` there. It's just less involved in terms of mutability after caching, in comparison to a direct collection or array declaration.
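The mutability trade-off discussed above, caching the collection as-is versus handing each caller a clone, can be illustrated with a small sketch. This is purely illustrative (the class name and the `Supplier` standing in for dependency resolution are assumptions), not Spring's actual `DependencyObjectProvider` logic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustrative comparison of two caching strategies for a resolved
// collection of singleton beans; not the framework's implementation.
final class CollectionCache<T> {

    private final Supplier<List<T>> resolver;

    private volatile List<T> snapshot;

    CollectionCache(Supplier<List<T>> resolver) {
        this.resolver = resolver;
    }

    // Strategy A: cache one unmodifiable snapshot; callers that try to
    // mutate it fail fast instead of corrupting later retrievals.
    List<T> sharedSnapshot() {
        List<T> result = snapshot;
        if (result == null) {
            result = List.copyOf(resolver.get());
            snapshot = result;
        }
        return result;
    }

    // Strategy B: hand every caller its own mutable copy, at the cost of
    // one allocation per retrieval (the cloning overhead in question).
    List<T> freshCopy() {
        return new ArrayList<>(sharedSnapshot());
    }
}
```

Strategy A avoids per-call allocation but changes behavior for callers that mutate the returned collection; Strategy B preserves mutability at the price of copying, which is exactly the overhead worth measuring.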

dreis2211 commented 3 weeks ago

I'm not sure if you want input from me here, or if you're just thinking out loud? :)

Whether the cloning of the collection is something I would be concerned about is not something I can answer offhand; that would need measuring. If the additional allocations drove up GC activity, this might outweigh the other CPU gains. Hard to imagine (there are other allocations, like constructing `ResolvableType` instances, that we would save), but this is something worth measuring.

jhoeller commented 3 weeks ago

That was a bit of both, I guess :-)

The effect would be similar to the above code doing `return new ArrayList<>(beansCache)` in the `getCachedBeans()` implementation. I'd actually prefer to avoid that, since my current cacheability algorithm in `DependencyObjectProvider` does not need to check for collections or arrays at all; it rather just checks `autowiredBeanNames` and caches the `doResolveDependency` result as-is (independent of a single bean / collection / array distinction). It would nevertheless be interesting to know whether such copying has a noticeable impact in your hot code path - or whether it gets largely optimized away by HotSpot anyway.

dreis2211 commented 3 weeks ago

To the second (idiomatic) part: if that's the preferred way of doing things, I would be fine with optimizing retrieval of singleton beans through stream/iterate. So if that solves the mutability problem, go ahead. :-)

As part of this ticket, I would also see extending the documentation with notes about the performance characteristics, the pros and cons, and what the preferred approach should be.

I'm not sure if your experiments involve the `@Lazy` annotation, or if that eventually goes through the same code paths? The annotation suffers from the same characteristics.

jhoeller commented 3 weeks ago

With `@Lazy`, it's actually easier in terms of semantics, since there is a single proxy object anyway. We can quite easily cache the target object within the `TargetSource` there; I'm less concerned about mutability in such a scenario.
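The shape of this idea, a lazy proxy that resolves its target once and then keeps serving the cached instance, can be sketched with a plain JDK dynamic proxy. Everything below is a hypothetical illustration (the `BeanService` interface and `LazyProxies` helper are made up), loosely analogous to caching inside the proxy's `TargetSource` rather than Spring's actual code:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

// Hypothetical target interface, standing in for any @Lazy-injected type.
interface BeanService {
    String greet();
}

final class LazyProxies {

    // Builds a JDK dynamic proxy that resolves its target on first use
    // and serves the cached instance on every subsequent invocation.
    @SuppressWarnings("unchecked")
    static <T> T lazyProxy(Class<T> iface, Supplier<T> resolver) {
        InvocationHandler handler = new InvocationHandler() {

            private volatile T target;

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                T t = target;
                if (t == null) {
                    t = resolver.get();  // resolved on the first method call only
                    target = t;
                }
                return method.invoke(t, args);
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

Before the cache, each proxy invocation would trigger a fresh resolution; with it, only the very first call pays the resolution cost, which is the behavior change this ticket settles on for singleton targets.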

jhoeller commented 1 week ago

I'm reducing this ticket to caching for `@Lazy` proxies, leaving `ObjectProvider` as it is for the time being. `ObjectProvider` is arguably an alternative API for dynamically accessing the `BeanFactory`, useful for many purposes, whereas `@Lazy` has clearer semantics in terms of expected behavior.

That said, I'm also tightening `@Lazy` to always expose an unmodifiable collection as a target, rejecting any kind of modification call (which would traditionally have executed on a subsequently lost target instance anyway, since we re-resolved the target collection on every call).
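The semantics described here, reads pass through while modification calls are rejected, match what `java.util.Collections.unmodifiableList` provides. A minimal demonstration (the wrapper class is a made-up name, not the framework change itself):

```java
import java.util.Collections;
import java.util.List;

// Demonstrates exposing an unmodifiable view over a resolved target
// collection: callers can read it but any mutation attempt throws.
final class UnmodifiableTargetDemo {

    static List<String> exposeTarget(List<String> resolved) {
        // The caller sees the live contents but cannot mutate through it.
        return Collections.unmodifiableList(resolved);
    }
}
```

Any `add`, `remove`, or `set` on the exposed view throws `UnsupportedOperationException`, making the previously silent lost-update behavior a fail-fast error instead.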

dreis2211 commented 1 week ago

Sounds fair - could we nonetheless add a note in the docs about the performance characteristics of `ObjectProvider` then?