newrelic / newrelic-java-agent

The New Relic Java agent
Apache License 2.0
202 stars 143 forks source link

Add support for Spring Webflux 6.1.0+ #1619

Closed jtduffy closed 8 months ago

jtduffy commented 11 months ago
Execution failed for task ':instrumentation:spring-webflux-6.0.0:verifyPass_org.springframework_spring-webflux_6.1.0'.
> A failure occurred while executing com.newrelic.agent.instrumentation.verify.VerifyWorkAction
   > Verification FAILED. Instrumentation module spring-webflux-6.0.0-1.0.jar SHOULD HAVE applied to org.springframework:spring-webflux:6.1.0 and did not. You may need to adjust the range "org.springframework:spring-webflux:[6.0.0,)".
     Verifier output:
     ...
     WeaveViolation{type=METHOD_BASE_CONCRETE_WEAVE, clazz=org/springframework/web/reactive/function/server/RequestPredicates$PathPatternPredicate, method=test(Lorg/springframework/web/reactive/function/server/ServerRequest;)Z}
     WeaveViolation{type=METHOD_BASE_CONCRETE_WEAVE, clazz=org/springframework/web/reactive/function/server/RequestPredicates$AndRequestPredicate, method=test(Lorg/springframework/web/reactive/function/server/ServerRequest;)Z}
workato-integration[bot] commented 11 months ago

https://new-relic.atlassian.net/browse/NR-185584

jtduffy commented 10 months ago

The weave violations above apply for both 6.1.0 and 6.1.1.

ilya40umov commented 9 months ago

Just ran into this after migrating from Spring Boot 3.1.5 to Spring Boot 3.2.1. Looks like we will need to hold off with the Spring Boot upgrade.

neowin35 commented 9 months ago

any chance this could come earlier than end of march? currently it is rendering our newrelic integration pretty useless :(

tha2015 commented 9 months ago

While it is a feature, instead of a bug, it is still very important to my team because it is blocking us from migrating to newer Spring Boot versions. And staying with old Spring Boot versions is causing troubles because we are using security scanning tools in our development process.

jtduffy commented 8 months ago

We have generated a snapshot jar that contains a new instrumentation module to support webflux 6.1.0+. (From this PR)

We'd love feedback if anyone is able to run some tests in a lower environment.

imdobr commented 8 months ago

Hello all! Thank you for sharing the SNAPSHOT jar. We checked the new version. We observe the following results:

After the first 5 minutes, the Transaction overview shows info "We couldn't find any data during this time window". We did the test 3 times with the same results. It seems to be a blocker issue.

Result after the first 5 minutes:

Typical WARN messages from newrelic_agent.log.

com.newrelic.agent.TransactionService WARN: Error recording transaction "WebTransaction/Custom/[SL] /[my_component1]/api/[custom_endpoint1]"
com.newrelic.agent.TransactionService WARN: Error recording transaction "WebTransaction/Custom/[my_component2]/api/[custom_endpoint2]"
com.newrelic.agent.TransactionService WARN: Error recording transaction "WebTransaction/Custom/[my_component2]/api/[custom_endpoint3]"
com.newrelic.agent.TransactionService WARN: Error recording transaction "WebTransaction/SpringController/ReadOperationHandler/handle"
com.newrelic.agent.instrumentation.PointCutClassTransformer WARN: Skipping transformation of class org/springframework/web/servlet/DispatcherServlet because an error occurred: java.lang.RuntimeException: unable to access resource: jakarta/servlet/ServletException.class

Please check the full stack trace for ReadOperationHandler/handle attached. newrelic_agent-20240216.log

jtduffy commented 8 months ago

I've had 2 apps running this morning for around 2 hours each, and I'm not seeing that behavior.

From the logs it looks to be something with the maxSampleSize value being used when trying to construct our BlockingQueue for transaction logs (basically unrelated to the webflux 6.1 module). It's possible there might be some other change in the snapshot that could be causing this. We'll discuss as a team and I'll respond here with more info.

tomhyland commented 8 months ago

Hi there, I have also given the SNAPSHOT Jar a go got the same behaviour with logs full of the same stack trace as @imdobr did.

In our list of transactions the only one now listed is called ReadOperationHandler/handle and also our agent metrics stopped after just under 5 minutes of activity.

This is with Spring Boot 3.2.1 and so containing spring-webflux 6.1.2.

jtduffy commented 8 months ago

Would you happen to have an example controller method in your app that seems to create this scenario? Or a small repro app?

jtduffy commented 8 months ago

Also, to verify if it is in fact the new module, can you add the following to your yml config to disable it?

  class_transformer:
    com.newrelic.instrumentation.spring-webflux-6.1.0:
      enabled: false
tomhyland commented 8 months ago

Some additional information, all of the ReadOperationHandler/handle named transactions look to be related to actuator with the URI: /actuator/health/readiness which is being called by Kubernetes, so are not related directly related to our code. These normally get grouped under NettyDispatcher in our earlier version of the Java agent.

We did not get any transaction output for our REST API that I tested (org.springframework.web.bind.annotation.RestController)

tomhyland commented 8 months ago

Also, to verify if it is in fact the new module, can you add the following to your yml config to disable it?

  class_transformer:
    com.newrelic.instrumentation.spring-webflux-6.1.0:
      enabled: false

With this in place I am still getting issues although a slightly different stack trace.


2024-02-16T15:33:14,596+0000 [1 88] com.newrelic.agent.TransactionService WARN: Error recording transaction "WebTransaction/SpringController/ReadOperationHandler/handle"
java.lang.IllegalArgumentException: null
    at java.util.concurrent.LinkedBlockingQueue.<init>(LinkedBlockingQueue.java:256) ~[?:?]
    at com.newrelic.agent.service.logging.LogSenderServiceImpl$TransactionLogs.<init>(LogSenderServiceImpl.java:593) ~[newrelic.jar:8.10.0-SNAPSHOT]
    at com.newrelic.agent.service.logging.LogSenderServiceImpl.getTransactionLogs(LogSenderServiceImpl.java:582) ~[newrelic.jar:8.10.0-SNAPSHOT]
    at com.newrelic.agent.Transaction.getLogEventData(Transaction.java:623) ~[newrelic.jar:8.10.0-SNAPSHOT]
    at com.newrelic.agent.TransactionData.getLogEventData(TransactionData.java:53) ~[newrelic.jar:8.10.0-SNAPSHOT]
    at com.newrelic.agent.service.logging.LogSenderServiceImpl$1.dispatcherTransactionFinished(LogSenderServiceImpl.java:87) ~[newrelic.jar:8.10.0-SNAPSHOT]
    at com.newrelic.agent.TransactionService.doProcessTransaction(TransactionService.java:150) ~[newrelic.jar:8.10.0-SNAPSHOT]
    at com.newrelic.agent.TransactionService.transactionFinished(TransactionService.java:103) ~[newrelic.jar:8.10.0-SNAPSHOT]
    at com.newrelic.agent.Transaction.finishTransaction(Transaction.java:1141) ~[newrelic.jar:8.10.0-SNAPSHOT]
    at com.newrelic.agent.Transaction.checkFinishTransaction(Transaction.java:2456) ~[newrelic.jar:8.10.0-SNAPSHOT]
    at com.newrelic.agent.Transaction.checkFinishTransactionFromToken(Transaction.java:2434) ~[newrelic.jar:8.10.0-SNAPSHOT]
    at com.newrelic.agent.Transaction.onRemoval(Transaction.java:2273) ~[newrelic.jar:8.10.0-SNAPSHOT]
    at com.newrelic.agent.TokenImpl.markExpired(TokenImpl.java:149) ~[newrelic.jar:8.10.0-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
    at java.lang.Thread.run(Thread.java:1583) [?:?]
imdobr commented 8 months ago

Also, to verify if it is in fact the new module, can you add the following to your yml config to disable it?

Done. Result:

Config file:

common:
  error_collector:
    expected_status_codes: '400-499'
    expected_classes:
      - 'org.springframework.web.util.NestedServletException'
  class_transformer:
    com.newrelic.instrumentation.java.completable-future-jdk8u40:
      enabled: false
    com.newrelic.instrumentation.spring-webflux-6.1.0:
      enabled: false

Misc info:

2024-02-16T16:10:04,160+0100 [1 54] com.newrelic.agent.TransactionService WARN: Error recording transaction "WebTransaction/NettyDispatcher/NettyDispatcher"
java.lang.IllegalArgumentException: null
        at java.util.concurrent.LinkedBlockingQueue.<init>(Unknown Source) ~[?:?]
        at com.newrelic.agent.service.logging.LogSenderServiceImpl$TransactionLogs.<init>(LogSenderServiceImpl.java:593) ~[newrelic.jar:8.10.0-SNAPSHOT]
        at com.newrelic.agent.service.logging.LogSenderServiceImpl.getTransactionLogs(LogSenderServiceImpl.java:582) ~[newrelic.jar:8.10.0-SNAPSHOT]
        at com.newrelic.agent.Transaction.getLogEventData(Transaction.java:623) ~[newrelic.jar:8.10.0-SNAPSHOT]
        at com.newrelic.agent.TransactionData.getLogEventData(TransactionData.java:53) ~[newrelic.jar:8.10.0-SNAPSHOT]
        at com.newrelic.agent.service.logging.LogSenderServiceImpl$1.dispatcherTransactionFinished(LogSenderServiceImpl.java:87) ~[newrelic.jar:8.10.0-SNAPSHOT]
        at com.newrelic.agent.TransactionService.doProcessTransaction(TransactionService.java:150) ~[newrelic.jar:8.10.0-SNAPSHOT]
        at com.newrelic.agent.TransactionService.transactionFinished(TransactionService.java:103) ~[newrelic.jar:8.10.0-SNAPSHOT]
        at com.newrelic.agent.Transaction.finishTransaction(Transaction.java:1141) ~[newrelic.jar:8.10.0-SNAPSHOT]
        at com.newrelic.agent.Transaction.checkFinishTransaction(Transaction.java:2456) ~[newrelic.jar:8.10.0-SNAPSHOT]
        at com.newrelic.agent.Transaction.checkFinishTransactionFromToken(Transaction.java:2434) ~[newrelic.jar:8.10.0-SNAPSHOT]
        at com.newrelic.agent.Transaction.onRemoval(Transaction.java:2273) ~[newrelic.jar:8.10.0-SNAPSHOT]
        at com.newrelic.agent.TokenImpl.markExpired(TokenImpl.java:149) ~[newrelic.jar:8.10.0-SNAPSHOT]
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
        at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]`
jtduffy commented 8 months ago

Thanks, this is really good info. @imdobr I assume you also have the Spring actuator enabled in your app? Are you calling the same URL that Tom mentioned? /actuator/health/readiness

Also, the only actuator related dependency in your build is org.springframework.boot:spring-boot-starter-actuator correct?

tomhyland commented 8 months ago

Yes 'org.springframework.boot:spring-boot-starter-actuator:3.2.1'

imdobr commented 8 months ago

I assume you also have the Spring actuator enabled in your app? Are you calling the same URL that Tom mentioned? /actuator/health/readiness

Yes and yes.

the only actuator related dependency in your build is org.springframework.boot:spring-boot-starter-actuator correct?

In our case, for some reason there are 3 related dependencies: spring-boot-actuator-3.2.2.jar spring-boot-actuator-autoconfigure-3.2.2.jar spring-boot-autoconfigure-3.2.2.jar

jtduffy commented 8 months ago

@tomhyland @imdobr We figured out the issue. It was a log service change unrelated to the webflux 6.1 module. We're going to make a fix and then we'll publish a new snapshot for you to try. I'll post a comment here when the snapshot is available. It might be sometime next week.

Thanks again for taking the time to run the snapshot jar and provide feedback!

jtduffy commented 8 months ago

@tomhyland @imdobr An updated snapshot is available with the problematic change removed.

apoguy commented 8 months ago

I had the same issue, and with this snapshot it got solved. Any ETA for release?

Thanks!

tomhyland commented 8 months ago

Hi, I have rerun my tests with this version of the agent, deploying it to 3 interconnected services and all working fine with this snapshot 👍
I can see database time spent and distributed tracing spanning multiple entities again.

jtduffy commented 8 months ago

We are targeting a release for this week or the next. Thanks everyone for the feedback.

imdobr commented 8 months ago

@imdobr An updated snapshot is available with the problematic change removed.

Yes, we confirm that the issue is also resolved for us. Thank you for the fix and the timeline for the release.

jtduffy commented 8 months ago

Everyone -- I misspoke earlier on the release date. It looks like our next release is targeted for 3/28. My apologies for the mistake.