Closed jmreardon-yoppworks closed 2 months ago
Looking a bit further, excluding specifically slick/util/AsyncExecutor.*
is enough. This class was rewritten in Slick 3.5.0
@jmreardon-yoppworks Thanks for the report. To confirm- were you running successfully with agent 8.13 and Slick 3.3.x, and it was the upgrade to slick 3.5.0 that broke things for you?
@kanderson250 , that's correct.
@jmreardon-yoppworks okay, thank you.
To help us repro and debug this, it would be great to get a little more information about the slick queries that are problematic. Are the stalling queries consistent, ie are they stalling every time? Or are they running sometimes... and then getting hung up? Is the stall affecting some queries, and not others?
If you're able to provide a pared-down code sample, that would be very helpful.
@kanderson250 I tried testing New Relic + some Slick demo queries and am not seeing any failures. The actually queries are being made by the Akka Persistence JDBC library, so it's a bit beneath the level of code I actually control.
@jmreardon-yoppworks Okay, thanks for the update. I also wasn't seeing failures with straight slick queries. Will continue to investigate.
@kanderson250 , I was able to get a reproducer working! The issue happens when there are more calls being made than database connections: https://github.com/jmreardon-yoppworks/new-relic-bug-1996
I suspect this is related to the PrioritizedRunnable
added in 3.5.0 (https://github.com/slick/slick/blob/main/slick/src/main/scala/slick/util/AsyncExecutor.scala#L29), conflicting with the New Relic Runnable wrapper. The implementation does a type check which I think the NR wrapper is defeating becasue it wraps that runnable in a NewRelicRunnable (https://github.com/slick/slick/blob/main/slick/src/main/scala/slick/util/AsyncExecutor.scala#L280)
@jmreardon-yoppworks Awesome!! Thank you for taking the time to continue sleuthing and figure out this repro! Will take a look and let you know how it goes.
We found the bug - leaving some comments here in case this resurfaces at a later time.
The cause of the stall was our NewRelicAsyncExecutor
wrapper. Internally (as of 3.5.0), Slick always implements AsyncExecutor
as AsyncExecutorImpl
. Our wrapper was effectively upcasting AsyncExecutorImpl
back to AsyncExecutor
, which led to numerous concurrency issues.
In more detail, this problem manifested because of ManagedArrayBlockingQueue
. The queue relies on the AsyncExecutorImpl
implementation of PrioritizedRunnable
to correctly increment and decrement its number of in-progress tasks. Our NewRelicAsyncExecutor
wrapper ignored this implementation, instead falling through to the default PrioritizedRunnables
that incremented the queue's in-progress counter without decrementing the counter (though the runnables themselves did complete). Eventually, the in-progress counter exceeded the number of max connections and locked the queue into a paused state.
The previously mentioned NewRelicRunnable
wrapper was also a problem (just not, it turns out, the biggest problem).
@kanderson250 , any idea when a fix for these issues will land?
@jmreardon-yoppworks Yep, we have a fix developed. We weren't able to get it in to our next release 8.14.0 (slated for this week) so that means it will be going into 8.15.0 several weeks from now.
If you're interested, here is a snapshot build with the fix (scroll to Artifacts at the bottom of the Summary page for the custom-newrelic-jar
download link): https://github.com/newrelic/newrelic-java-agent/actions/runs/10494664569
You're welcome to try out this snapshot in a test env and see if it works for you.
@kanderson250 snapshot jar works in my test env! Will keep an eye out for when the fix lands. Thanks!
Description
I am trying to upgrade a service to Slick 3.5.0. When I run this service with the updated Slick & New Relic Java agent 8.13.0, some Slick queries just…give no response. I believe the instrumentation needs an update.
Expected Behavior
Previously, the service depended on Slick 3.3.3 and worked with the agent enabled.
Troubleshooting or NR Diag results
Steps to Reproduce
So far I have only observed the issue in a large Akka service. Adding
to my configuration results in the service behaving as expected.
Your Environment
JVM 11, Akka 2.9.3, Slick 3.5.0, MySQL or H2 database.