newrelic / newrelic-java-agent

The New Relic Java agent
Apache License 2.0
199 stars 143 forks source link

Cats effect integration blocks io-compute execution pool. #1424

Closed slovvik closed 2 months ago

slovvik commented 1 year ago

Description

When migrating to cats-effect 3 we notice a lot of logs that says

  at app//com.newrelic.cats3.api.Util$.$anonfun$cleanupTxnAndTokenRefCount$1(Util.scala:72)
  at app//com.newrelic.cats3.api.Util$$$Lambda$2374/0x00000008019ffcc8.apply$mcV$sp(Unknown Source)
  at app//scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
  at app//cats.effect.IOFiber.runLoop(IOFiber.scala:396)
  at app//cats.effect.IOFiber.asyncContinueSuccessfulR(IOFiber.scala:1366)
  at app//cats.effect.IOFiber.run(IOFiber.scala:113)
  at app//cats.effect.unsafe.WorkerThread.run(WorkerThread.scala:632)

or

[WARNING] A Cats Effect worker thread was detected to be in a blocked state (WAITING)
  at java.base@17.0.3/jdk.internal.misc.Unsafe.unpark(Native Method)
  at java.base@17.0.3/java.util.concurrent.locks.LockSupport.unpark(Unknown Source)
  at java.base@17.0.3/java.util.concurrent.locks.AbstractQueuedSynchronizer.signalNext(Unknown Source)
  at java.base@17.0.3/java.util.concurrent.locks.AbstractQueuedSynchronizer.release(Unknown Source)
  at java.base@17.0.3/java.util.concurrent.locks.ReentrantLock.unlock(Unknown Source)
  at java.base@17.0.3/java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(Unknown Source)
  at java.base@17.0.3/java.util.concurrent.LinkedBlockingQueue.offer(Unknown Source)
  at java.base@17.0.3/java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
  at java.base@17.0.3/java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
  at java.base@17.0.3/java.util.concurrent.Executors$DelegatedExecutorService.submit(Unknown Source)
  at com.newrelic.agent.ExpirationService.expireToken(ExpirationService.java:51)
  at com.newrelic.agent.TimedTokenSet$1.onRemoval(TimedTokenSet.java:70)
  at com.newrelic.agent.TimedTokenSet$1.onRemoval(TimedTokenSet.java:43)
  at com.newrelic.agent.deps.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$notifyRemoval$1(BoundedLocalCache.java:327)
  at com.newrelic.agent.deps.com.github.benmanes.caffeine.cache.BoundedLocalCache$$Lambda$3339/0x0000000801b2c990.run(Unknown Source)
  at com.newrelic.agent.TimedTokenSet$$Lambda$3095/0x000000080199f528.execute(Unknown Source)
  at com.newrelic.agent.deps.com.github.benmanes.caffeine.cache.BoundedLocalCache.notifyRemoval(BoundedLocalCache.java:333)
  at com.newrelic.agent.deps.com.github.benmanes.caffeine.cache.BoundedLocalCache.removeNode(BoundedLocalCache.java:1882)
  at com.newrelic.agent.deps.com.github.benmanes.caffeine.cache.BoundedLocalCache.clear(BoundedLocalCache.java:1819)
  at com.newrelic.agent.deps.com.github.benmanes.caffeine.cache.LocalManualCache.invalidateAll(LocalManualCache.java:150)
  at com.newrelic.agent.TimedTokenSet.removeAll(TimedTokenSet.java:105)
  at com.newrelic.agent.Transaction.expireAllTokensForCurrentTransaction(Transaction.java:2236)
  at com.newrelic.agent.TransactionApiImpl.expireAllTokens(TransactionApiImpl.java:470)
  at app//com.newrelic.cats3.api.Util$.$anonfun$cleanupTxnAndTokenRefCount$1(Util.scala:76)
  at app//com.newrelic.cats3.api.Util$$$Lambda$3111/0x0000000801b324f0.apply$mcV$sp(Unknown Source)
  at app//scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
  at app//cats.effect.IOFiber.runLoop(IOFiber.scala:396)
  at app//cats.effect.IOFiber.asyncContinueSuccessfulR(IOFiber.scala:1366)
  at app//cats.effect.IOFiber.run(IOFiber.scala:113)
  at app//cats.effect.unsafe.WorkerThread.run(WorkerThread.scala:632)

Expected Behavior

Cats-effect integration should not block io-compute execution pool

Workaround

Turn off cats effect instrumentation

  class_transformer:
    com.newrelic.instrumentation.cats-effect-3.5.1:
      enabled: false
    com.newrelic.instrumentation.cats-effect-3:
      enabled: false

Your Environment

Cats-effect 3.5.1 Newrelic: 7.11.1

workato-integration[bot] commented 1 year ago

https://new-relic.atlassian.net/browse/NR-147037

kford-newrelic commented 8 months ago

@slovvik thanks for alerting us to this issue. Other than the additional warning messages in the logs, have you noticed any other impact?

obenkenobi commented 5 months ago

This problem with Cats Effect 3 is caused by this issue

kanderson250 commented 3 months ago

@slovvik We're looking into this issue currently.

Are you able to reproduce the error? If not, any additional info you're able to provide would be very helpful for us.

kanderson250 commented 2 months ago

We looked extensively at our use of the caffeine cache for related issue #1711 . We determined that we can't mitigate the blocking behavior in situations that trigger an extremely high number of writes to the cache. Like in 1711, changing the behavior here would require a lower-level overhaul of the agent.

Closing and marking as won't do.