Cassandra 3.4 provides pluggable tracing. Adding three jar files to the Cassandra classpath and setting one JVM option replaces Cassandra's tracing with Zipkin. It can even identify incoming Zipkin traces and add Cassandra's own internal tracing onto them.
mvn install
cp target/*.jar $CASSANDRA_HOME/lib/
Then start Cassandra with
JVM_OPTS="-Dcassandra.custom_tracing_class=com.thelastpickle.cassandra.tracing.ZipkinTracing" cassandra
Or edit the jvm.options file.
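As a rough sketch of the jvm.options route: the file takes one JVM flag per line, so the tracing class can be appended as a line of its own. The helper name `add_jvm_option` is hypothetical, and the `$CASSANDRA_HOME/conf/jvm.options` path used in the usage note is an assumption (package installs often keep the file elsewhere, such as under /etc/cassandra).

```shell
# Hypothetical helper: append a JVM flag to an options file unless it is already there.
add_jvm_option() {
    file="$1"
    option="$2"
    # -x matches the whole line and -F treats the option as a literal string;
    # "--" stops grep from reading the leading "-D" as one of its own flags.
    grep -qxF -- "$option" "$file" 2>/dev/null || printf '%s\n' "$option" >> "$file"
}
```

It could then be called as `add_jvm_option "$CASSANDRA_HOME/conf/jvm.options" "-Dcassandra.custom_tracing_class=com.thelastpickle.cassandra.tracing.ZipkinTracing"`, followed by a restart of the node. Running it twice leaves a single copy of the flag in the file.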
The default SpanCollector sends the tracing messages via HTTP to http://127.0.0.1:9411/. Port 9411 is the default port for the zipkin-java server. The same URL is used for the UI.
To continue existing Zipkin traces from application code through the DataStax CQL driver and into the Cassandra cluster, the Cassandra nodes must also be started with the cassandra.custom_query_handler_class JVM option set to a query handler that accepts incoming payloads over the CQL protocol:
JVM_OPTS="-Dcassandra.custom_tracing_class=com.thelastpickle.cassandra.tracing.ZipkinTracing \
    -Dcassandra.custom_query_handler_class=org.apache.cassandra.cql3.CustomPayloadMirroringQueryHandler" \
    cassandra
(Or edit the jvm.options file.)
Then, in the application code where the DataStax CQL driver is used, put the Zipkin traceId and spanId into the outgoing payload, for example:
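// Start a new client span named after the statement.
SpanId spanId = clientTracer.startNewSpan(statement.toString());
// Attach the serialized ids under the "zipkin" payload key.
// setOutgoingPayload expects a Map<String, ByteBuffer>, so pass the buffer itself.
ByteBuffer traceHeaders = ByteBuffer.wrap(spanId.bytes());
statement.setOutgoingPayload(singletonMap("zipkin", traceHeaders));
clientTracer.setCurrentClientServiceName(serviceName);
// Record client-sent and client-received around the query.
clientTracer.setClientSent();
ResultSet result = session.execute(statement);
clientTracer.setClientReceived();
return result;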
See this presentation.
See CASSANDRA-10392 for the patch to extend Cassandra's tracing that this project plugs into.
When this tracing is used instead of Cassandra's default tracing, any cqlsh statements run after enabling tracing with
TRACING ON;
will eventually time out with the message
Unable to fetch query trace: Trace information was not available within …
This is because cqlsh polls for tracing information in system_traces, which is no longer being written. Zipkin tracing doesn't support this interaction with cqlsh (it is intended more for use with a tracing sampling rate). Improvements in this area are possible, though: for example, Zipkin tracing could be used when the custom payload contains a Zipkin traceId and spanId, falling back to normal tracing otherwise (which would work for cqlsh interaction). In the meantime, an easy workaround for this behaviour in cqlsh is to reduce Session.max_trace_wait to 1 second.
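One way to apply that workaround, sketched under the assumption that your cqlsh version reads `max_trace_wait` from a `[tracing]` section of its cqlshrc file and passes it on to the driver session. The helper name `set_max_trace_wait` is hypothetical, and `~/.cassandra/cqlshrc` in the usage note is the assumed default location.

```shell
# Hypothetical helper: add a [tracing] section with a short max_trace_wait to a cqlshrc file.
set_max_trace_wait() {
    file="$1"
    seconds="$2"
    if grep -q '^\[tracing\]' "$file" 2>/dev/null; then
        # A [tracing] section already exists; edit it by hand rather than adding a second one.
        echo "[tracing] already present in $file; set max_trace_wait there" >&2
        return 1
    fi
    printf '\n[tracing]\nmax_trace_wait = %s\n' "$seconds" >> "$file"
}
```

For example, `set_max_trace_wait "$HOME/.cassandra/cqlshrc" 1.0` would make cqlsh give up on the missing trace after about one second instead of the longer default.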