open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0
1.95k stars 856 forks source link

Cassandra integration creates too many sessions #3092

Open necrolyte2 opened 3 years ago

necrolyte2 commented 3 years ago

Describe the bug

Currently we are unable to enable the Cassandra integration for the OTEL agent due to the following error

{"thread":"ratpack-blocking-3-1","level":"WARN","loggerName":"com.datastax.oss.driver.internal.core.session.DefaultSession","message":"You have too many session instances: 143 active, expected less than 1 (see 'advanced.session-leak.threshold' in the configuration)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","instant":{"epochSecond":1622035734,"nanoOfSecond":246582000},"contextMap":{"span_id":"f6a9043d82c4ca6a","trace_flags":"01","trace_id":"e9a3c7eb1ecf6c713cdc5cfe13ffc37f"},"threadId":400,"threadPriority":5}

This error will basically just keep being emitted(incrementing the # of sessions each time)

Setting OTEL_INSTRUMENTATION_CASSANDRA_ENABLED=false prevents the session issue, but may be causing a loss in context propegation

See this for more information on that propegation loss

Steps to reproduce

OTEL_INSTRUMENTATION_CASSANDRA_ENABLED=true

What did you expect to see? cassandra integration should emit spans correctly and not open so many sessions

What did you see instead?

{"thread":"ratpack-blocking-3-1","level":"WARN","loggerName":"com.datastax.oss.driver.internal.core.session.DefaultSession","message":"You have too many session instances: 143 active, expected less than 1 (see 'advanced.session-leak.threshold' in the configuration)","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","instant":{"epochSecond":1622035734,"nanoOfSecond":246582000},"contextMap":{"span_id":"f6a9043d82c4ca6a","trace_flags":"01","trace_id":"e9a3c7eb1ecf6c713cdc5cfe13ffc37f"},"threadId":400,"threadPriority":5}

What version are you using? Doesn't matter which version of the agent, the issue is the same I have used the latest release(v1.2.0) as well as 1.1.0, 1.0.1 and 0.17.0

Environment Compiler: AdoptOpenJDK 11.0.6

Additional context

trask commented 3 years ago

hey @necrolyte2! can you tell if the javaagent is causing more sessions than normal to be created, or if it is causing sessions to be retained (leaked) when they would normally be GC'd and closed on their own?

necrolyte2 commented 3 years ago

I'm not really familiar with Cassandra or the datastax libraries. Any tips on how I would tell the difference?

trask commented 3 years ago

any chance you can create a repro that we can analyze?

necrolyte2 commented 3 years ago

I can try. Some proprietary code that might also be the cause so I'll try to build a repo that enables cassandra without that code and see if it still happens.

necrolyte2 commented 3 years ago

Based on #3358, I'm going to guess something with the codebase we have is causing the session issue. Since the example repo for that bug report is a pretty close approximation to our private code base(minus some internal libraries which may be causing the issue) but doesn't have any issues.

If you want to close this issue for now that is fine. It will likely be quite a bit of time before I am able to dig in and reproduce the issue in a public repo and for now we should be able to just turn off the Cassandra integration if needed.

Tyghe commented 3 years ago

I believe this is the error that is causing the issue for us. The code base essentially has a wrapper around the CqlSession object that does promises instead of raw CqlSessions for you.

java.lang.ClassCastException: class io.opentelemetry.javaagent.instrumentation.cassandra.v4_0.TracingCqlSession cannot be cast to class com.tgt.oss.cassandra.ratpack.api.PromiseSession (io.opentelemetry.javaagent.instrumentation.cassandra.v4_0.TracingCqlSession and com.tgt.oss.cassandra.ratpack.api.PromiseSession are in unnamed module of loader 'app')\n  while locating com.tgt.oss.cassandra.ratpack.internal.provider.PromiseSessionProvider\n  at com.tgt.oss.cassandra.ratpack.CassandraModule.configure(CassandraModule.java:16)

Here is the header of the PromiseSession class that the code has

public interface PromiseSession extends CqlSession {

And the PromiseSession is being injected as

bind(PromiseSession.class).toProvider(PromiseSessionProvider.class).in(Scopes.SINGLETON);
necrolyte2 commented 3 years ago

I think I have gotten it down to the bare minimal and have it reproduced finally. Somehow the actual app we have starts up with the same error produced in the code below and then when a CQL session is initiated it generates all those errors like I posted originally.

The example app doesn't even start up, but does generate the same error at startup time

REPRO: https://github.com/necrolyte2/ratpack-kotlin-otel/releases/tag/5.0.0