Closed · jasonjkeller closed this 2 days ago
I believe this project uses a caller-runs executor on its cache, whereas Caffeine defaults to the ForkJoinPool. I think this was due to a bug in some JDK versions that broke the executor when running in a single-CPU Docker container.
In some cases Caffeine has thresholds on its work to amortize the cost across threads. While we could do that more liberally, it seems odd to have such a high count that it’s a bottleneck. I’d expect either that should be fixed or a profiler should confirm the actual problem.
Yes, we added that workaround due to a bug in Java 17 where, when running on a machine with 1 CPU, the ForkJoinPool would have 0 threads. We should probably revert it because, as it currently stands, the cleanup happens in a thread that is processing a request, making that request take longer than it should.
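The caller-runs workaround described above can be sketched in plain Java. This is illustrative only (the `chooseExecutor` helper and its guard condition are hypothetical, not the agent's actual code): on a single-CPU host, where the JDK bug could leave the common pool without worker threads, fall back to running maintenance on the calling thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;

public class ExecutorChoice {
    // Hypothetical guard: on a single-CPU host (where the Java 17 bug
    // could leave the common pool with zero threads), fall back to a
    // caller-runs executor; otherwise use the default ForkJoinPool.
    static Executor chooseExecutor() {
        return Runtime.getRuntime().availableProcessors() > 1
            ? ForkJoinPool.commonPool()
            : Runnable::run; // caller-runs: work executes on the writing thread
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        chooseExecutor().execute(done::countDown);
        done.await(); // wait for the (possibly async) maintenance task
        System.out.println("maintenance task completed");
    }
}
```

The trade-off mentioned in the comment is visible here: with `Runnable::run`, cleanup cost lands on whichever request thread triggered it.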
But I don't think reverting to the ForkJoinPool would solve the problem in this case, because a lock held during the cleanup seems to prevent other threads from reading/writing.
We are still investigating.
While the lock won't block reads, it will eventually block writers. A write buffer is used to try to absorb a burst of writes to avoid serializing those threads, and a writer will acquire the eviction lock to assist if the write buffer is full. You should see this error in the logs if an eviction appears stuck:

> The cache is experiencing excessive wait times for acquiring the eviction lock. This may indicate that a long-running computation has halted eviction when trying to remove the victim entry. Consider using AsyncCache to decouple the computation from the map operation.
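The write-buffer-with-assist behavior described above can be sketched in plain Java. This is a simplified model, not Caffeine's internals (the class, buffer size, and lock are illustrative): writes are buffered cheaply, and only when the buffer fills does a writer acquire the eviction lock and drain pending work itself.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

public class WriteBufferSketch {
    // Tiny buffer so the assist path is easy to trigger in this demo.
    static final ArrayBlockingQueue<Runnable> writeBuffer = new ArrayBlockingQueue<>(4);
    static final ReentrantLock evictionLock = new ReentrantLock();

    static void afterWrite(Runnable task) {
        if (!writeBuffer.offer(task)) {
            // Buffer full: the writer assists by taking the eviction lock
            // and draining the backlog itself -- this is where a burst of
            // writes starts serializing threads behind the lock.
            evictionLock.lock();
            try {
                drain();
                task.run();
            } finally {
                evictionLock.unlock();
            }
        }
    }

    static void drain() {
        Runnable pending;
        while ((pending = writeBuffer.poll()) != null) {
            pending.run();
        }
    }

    public static void main(String[] args) {
        int[] ran = {0};
        for (int i = 0; i < 10; i++) {
            afterWrite(() -> ran[0]++);
        }
        drain(); // flush whatever is still buffered
        System.out.println("tasks run: " + ran[0]);
    }
}
```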
The stacktrace is odd because the `ConcurrentHashMap.get` is very cheap. That loop dequeues from a `ReferenceQueue`, obtains the entry, and tries to evict it if present by performing a compute-based removal. If there are millions of pending weak references to evict then I suppose that stack trace might show up, but even then that is the fast part. Usually the compute is slow because it collided with an expensive load in the same hash bin. In those cases it is encouraged to have the value loader compute outside of the map's lock, e.g. by storing a memoizing supplier or future value instead (the FAQ might help).
Original GTSE: https://new-relic.atlassian.net/browse/NR-219675
We reproduced the issue with an app that added hundreds of millions of entries into the cache.
We looked extensively at tweaking various aspects of our cache. This included using an async cache as suggested in the Caffeine FAQ, reverting the executor to the default, bumping the initial capacity to mitigate resizing issues, and trying manual instead of atomic writes. None of these tweaks had a significant impact on performance.
Using an unbounded cache is not possible because we require weak key references and an eviction policy to prevent massive memory overhead (we did try this, and as expected, the increased memory use overwhelmed the system).
Ultimately, we determined that the blocking behavior is unavoidable for the agent in situations with an extremely high number of writes into the cache. Changing this behavior would entail a lower-level overhaul of the agent. Marking as a won't-fix.
If there are any details to feedback up to the Caffeine project, either public or private, please feel welcome to share.
There was a hint in the thread that millions of objects might be weakly evictable at once and that this was causing the blocking behavior, as they were cleaned up in a single pass. If that is the case and the work needs to be amortized over multiple cache calls, like we do in other cases, then that could be added to the library. Presently I haven't heard that reported as a problem, and reverting to the default executor implies it's not the case, since the eviction would be async. Thus, I'm not familiar enough with the performance issue observed to provide any further support. If it does become something worth revisiting, please feel welcome to reach out.
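The amortization idea mentioned above can be sketched with the JDK's reference machinery: cap how many cleared references one caller drains per pass, so no single thread pays for millions of pending evictions at once. (`AmortizedDrain` and its `drain` helper are hypothetical; the demo calls `Reference.enqueue()` manually so it runs deterministically without depending on GC timing.)

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class AmortizedDrain {
    // Drain at most 'limit' cleared references per call, spreading the
    // cleanup cost across many cache operations instead of one big pass.
    static <K, V> int drain(ReferenceQueue<K> queue, Map<Reference<K>, V> map, int limit) {
        int drained = 0;
        Reference<? extends K> ref;
        while (drained < limit && (ref = queue.poll()) != null) {
            map.remove(ref); // entries are keyed by the Reference itself
            drained++;
        }
        return drained;
    }

    public static void main(String[] args) {
        ReferenceQueue<String> queue = new ReferenceQueue<>();
        Map<Reference<String>, Integer> entries = new HashMap<>();
        for (int i = 0; i < 5; i++) {
            WeakReference<String> ref = new WeakReference<>("key" + i, queue);
            entries.put(ref, i);
            ref.enqueue(); // simulate the referent having been collected
        }
        System.out.println("first pass drained: " + drain(queue, entries, 3));
        System.out.println("remaining entries: " + entries.size());
    }
}
```

With a limit of 3, the first pass evicts only three of the five stale entries; subsequent passes would finish the rest, which is the amortized behavior described in the comment.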
About
Investigate if a Caffeine `BoundedLocalCache` can be made non-blocking, or whether using an `UnboundedLocalCache` has non-blocking behavior, so that the `ExtensionHolderImpl` doesn't potentially lead to thread contention under heavy usage.

https://github.com/ben-manes/caffeine/blob/2bcb7ca5260cfc266ed81bc31939cf3cadaadc67/caffeine/src/main/java/com/github/benmanes/caffeine/cache/Caffeine.java#L1016-L1024
Problem
When instrumentation modules add an `@NewField` to a class, the new fields don't actually get woven into the class but rather are stored in a cache keyed by the object instance. These new fields are obtained by calling `ExtensionHolderImpl.getExtension`, which uses a `com.github.benmanes.caffeine.cache.BoundedLocalCache` under the hood. Periodically `BoundedLocalCache.performCleanUp` is called to clean up cached entries for key references that have gone out of scope. In cases where a large number of object instances are being created and cached, this cleanup can cause blocking.

The stack trace below shows one such scenario related to a large number of `PreparedStatement`s being instrumented, each of which generates a `preparedSql` new field.

https://github.com/newrelic/newrelic-java-agent/blob/0eeef2826a7acd0144b19be22732cab430d43de7/instrumentation/jdbc-generic/src/main/java/java/sql/PreparedStatement_Weaved.java#L28-L29
Related