newrelic / newrelic-java-agent

The New Relic Java agent
Apache License 2.0
202 stars 143 forks source link

Investigate possible null Segment in Ning Instrumentation #1895

Closed jasonjkeller closed 5 months ago

jasonjkeller commented 5 months ago

A memory leak was reported by users of the Ning async HTTP client: https://github.com/newrelic/newrelic-java-agent/issues/783

The changes on the ning-fix branch were reported to resolve the memory leak.

The change was to simply get the statusCode and reasonMessage from the responseStatus and null the reference to the responseStatus earlier, before we ever checked if the segment was null or not.

Given that the changes seem to fix the leak, it seems to indicate that there must be some condition where the onThrowable/onCompleted ning methods are called and the segment is null.

The only place where a value is assigned for the segment is in the AsyncHttpProvider#execute weave class/method. It appears that the only scenarios where the segment would be null is if there is no active transaction or if the URL from the com.ning.http.client.Request has a scheme that is not null and does not equal http or https: https://github.com/newrelic/newrelic-java-agent/blob/edf5bea7e5ba0304a8804920e86cde5381c1b87c/instrumentation/ning-async-http-client-1.6.1/src/main/java/com/ning/http/client/AsyncHttpProvider.java#L35-L44

The Ning async HTTP client instrumentation does not start its own transaction, so the most likely scenario is that the client is being called by uninstrumented code when the memory leak occurs. This would mean that each com.ning.http.client.AsyncHandler instance would have a NewField weaved into it which would assign a reference to a HttpResponseStatus every time onStatusReceived is called but the reference would never be freed since there is no active segment. If this is the only issue then it is effectively solved by the ning-fix changes. It should be easy to repro a scenario where the async http client is called but no transaction is created to see if that manifests a memory leak.

That said, it might be worth investigating if there are other scheme types that should be supported. This seems unlikely, as we typically just support http/s, but perhaps there's a slightly different scheme identifier for http2.

workato-integration[bot] commented 5 months ago

https://new-relic.atlassian.net/browse/NR-270723

jasonjkeller commented 5 months ago

Ran a simple test app that just makes a bunch of external calls using the Ning async HTTP client.

    implementation 'com.ning:async-http-client:1.9.30'
    implementation 'com.newrelic.agent.java:newrelic-api:8.11.0'
package org.example;

import com.newrelic.api.agent.NewRelic;
import com.newrelic.api.agent.Trace;
import com.newrelic.api.agent.Transaction;
import com.ning.http.client.AsyncHttpClient;
import com.ning.http.client.Response;

import java.util.concurrent.Future;

public class Main {
    public static void main(String[] args) {
        while (true) {
            makeNingExternalCall();
        }
    }

//    @Trace(dispatcher = true)
    public static void makeNingExternalCall() {
        try (AsyncHttpClient asyncHttpClient = new AsyncHttpClient()) {
            Future<Response> f = asyncHttpClient.prepareGet("https://example.com/").execute();
            Response r = f.get();
            Transaction transaction = NewRelic.getAgent().getTransaction();
            System.out.println("txn: " + transaction);
        } catch (Exception ignored) {

        }
    }
}

When the external calls were made with no active transaction (using 8.11.0 agent), the old gen heap continuously grew indicating that the memory leak condition was present as hypothesized.

no-txn-leak

When the external calls were made with an active transaction (using 8.11.0 agent), the old gen heap exhibited a saw tooth pattern indicating that the memory was being reclaimed and that there was no memory leak occurring.

txn-no-leak

When the external calls were made with no active transaction (using 8.12.0-SNAPSHOT agent with memory leak fix), the old gen heap exhibited a saw tooth pattern indicating that the memory was being reclaimed and that there was no memory leak occurring.

no-txn-no-leak