newrelic / newrelic-java-agent

The New Relic Java agent
Apache License 2.0
192 stars 140 forks source link

Investigate increase in response time and CPU being attributed to external calls using Jersey starting with the 8.10.0 agent release #1838

Closed jasonjkeller closed 1 month ago

jasonjkeller commented 1 month ago

A customer reported seeing a massive spike on response time and CPU across a number of their services after upgrading the Java agent from 8.9.1 to 8.10.0.

We observed that the large increase in response time was being attributed to Web External time and seemed to be associated with external spans like External/my.awesome.host.com/Jersey-Client/GET.

The Jersey client utilizes httpurlconnection under the hood and it seems as though disabling the httpurlconnection instrumentation resolved the issue.

NEW_RELIC_CLASS_TRANSFORMER_COM_NEWRELIC_INSTRUMENTATION_HTTPURLCONNECTION_ENABLED=false

The extent of the changes to the httpurlconnection instrumentation between 8.9.1 and 8.10.0 were extremely minor and don't see like they should cause the increased overhead, yet it seems as though they must be the culprit and tests with the changes reverted back this up. https://github.com/newrelic/newrelic-java-agent/compare/main...revert-httpurlconnection-change-8-10-0

These changes were reverted and will be in the 8.11.0 agent release. We still want to investigate why the changes had such an effect though so that we are more informed when making similar changes in the future.

The associated Jira has test results and a repro app linked on it.

workato-integration[bot] commented 1 month ago

https://new-relic.atlassian.net/browse/NR-255558

jasonjkeller commented 1 month ago

This thread-stack.log seems to illustrate the problem with the httpurlconnection instrumentation caused by the changes in the 8.10.0 release (reverted in 8.11.0). It looks like the call to HttpURLConnection.getHeaderFields that was added when we wrap the inbound connection object triggers a call to HttpURLConnection.getInputStream which triggers an infinite loop through our instrumentation.

  sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1593)
  sun.net.www.protocol.http.HttpURLConnection.getHeaderFields(HttpURLConnection.java:3244)
  com.nr.agent.instrumentation.httpurlconnection.InboundWrapper.<init>(InboundWrapper.java:22)
  com.nr.agent.instrumentation.httpurlconnection.MetricState.reportExternalCall(MetricState.java:125)
  com.nr.agent.instrumentation.httpurlconnection.MetricState.inboundPostamble(MetricState.java:107)