openzipkin / brave

Java distributed tracing implementation compatible with Zipkin backend services.
Apache License 2.0
2.36k stars 714 forks source link

Propagate trace ID to database queries #881

Open Diagoras opened 5 years ago

Diagoras commented 5 years ago

I initially opened this issue against Spring Sleuth, but after some discussion we decided it belonged over here instead. Let me know if that's wrong, though.

We've found that when troubleshooting slow queries it can sometimes be difficult to definitively link a given query back to the operation that initiated it. This is especially true for DBAs that troubleshoot problems using a database-first perspective, where they hunt down slow queries using the database's profiling tools instead of application logs.

Do you think it makes sense to prepend the trace ID as a comment on queries? That would allow database queries to be tied back to a trace ID, which in turn would link them back to the operation that produced them. For example:

-- X-B3-TraceId: b763b53c5c6ba343
SELECT * FROM foo WHERE baz = true;

Obviously, if there's a better way to associate metadata with a query than a literal comment then I'd advocate going in that direction. Please let me know what you think, and thanks for maintaining this library.

jorgheymans commented 5 years ago

Brave currently relies on https://github.com/p6spy/p6spy to trace database calls (some discussion #824 took place to see if https://github.com/r2dbc/r2dbc-proxy would be a better fit but that's beside the point). I guess such functionality would belong more there, providing a hook for brave to call and set the trace/span id. https://github.com/p6spy/p6spy/tree/master/src/main/java/com/p6spy/engine/wrapper provides a bunch of wrappers, so it seems doable to modify the statement. Have you checked that prepending a comment is portable across databases and will not break any queries ?

Definitely a nice idea !

codefromthecrypt commented 5 years ago

I would probably advise against using X-B3- as that is a bit too specific to a propagation format which might not be in use. If this is using correlation, I'd suggest traceId and spanId like our MDC integration. On that note, I suppose I agree with the implementation suggestion of @jorgheymans (using a database wrapper)

Diagoras commented 5 years ago

Have you checked that prepending a comment is portable across databases and will not break any queries ?

I have not, but that's definitely a good idea! My naive view was that the comment syntax was consistent across databases and couldn't break anything. But you're right, that's something we'd want to make sure of before making this change.

codefromthecrypt commented 5 years ago

one funny thing is that in some areas comments themselves end up as statements (I've seen this in mysql). Which means we have to be careful not to trace the comments :P

At least in the beginning, and possibly forever, I think this should be an opt-in filter based approach so that sites can practice getting it right. It might be small enough to copy/paste until such experience accrues.

jorgheymans commented 5 years ago

Any value in not doing a generic approach like using the comment field, but rather trying to hook into the specific auditing mechanisms of each database type directly ? Just thinking out loud ...

Diagoras commented 5 years ago

@jorgheymans Yeah, that would be awesome! I mentioned in my initial comment that if there was a better way to tie metadata (like a trace ID) to a SQL query then I'm all for going with that approach.

What specific auditing mechanism were you thinking of, here? I'm mostly familiar with Postgres which (AFAIK) doesn't have one, but if others do then that would be a really good integration.

brothertao commented 4 years ago

mysql binlog can log sql comment

codefromthecrypt commented 4 years ago

@brothertao can you give a java example using mysql interceptor? I think this is a neat idea

brothertao commented 3 years ago

@adriancole i use mybatis plugin to interceptor dml sql,add trace id to sql comment u also can intercept sql in jdbc api i think

codefromthecrypt commented 3 years ago

if you have a sample project please link it back. It is possible we can do this, but we'd have to choose what to add. Probably traceId=abcd because this decoupled from propagation. Ex in logs we use the traceId key.

I've found something recent about a system who uses yaml for payload because it permits comments. They apparently use comments for traceIds also. I will try to find it.

In any case this seems a correlation feature and without any special field (JDBC wouldn't know but maybe mysql does? no idea), a practice of opt-in on comments might make sense. Probably as you said a filter as people will likely be careful to not add comments to.. comment statements :D

but stepping back, please post an example of what works for you. I think this is basically special-casing what can already be done with filters.

brothertao commented 3 years ago
   // add traceid to sql comment
    String sql = boundSql.getSql();

    if (mappedStatement.getSqlCommandType() != SqlCommandType.SELECT) {
      sql = binlogTraceService.generateTraceComment(sql);
    }
codefromthecrypt commented 3 years ago

@brothertao what is the literal SQL you would generate.. curious

brothertao commented 3 years ago

like this:

/*__trace_start__{"traceId":"159aad5c8bc089","userId":0}__trace_end__*/ insert into 
 msg_log (`id`,`data`) values (?,?)
codefromthecrypt commented 3 years ago

thanks @brothertao and how do you use this data on the other side? grep? or something different?

brothertao commented 3 years ago

like this:

https://dba.stackexchange.com/questions/101651/show-sql-statements-from-mysql-binlog-gtid

alecl commented 3 years ago

Some database systems create and cache query plans based on the query matching and it's not certain they're sophisticated enough or willing to take the performance hit to do a semantic match skipping comments that are dynamic. Just something to keep in mind.

codefromthecrypt commented 3 years ago

right, if someone did something like this it would have to be opt-in. even our cassandra propagation is opt-in. https://github.com/openzipkin/brave-cassandra/blob/master/cassandra-driver/src/main/java/brave/cassandra/driver/CassandraClientTracing.java#L140

I'm setting this help-wanted as personally won't be doing this feature

brothertao commented 3 years ago

Some database systems create and cache query plans based on the query matching and it's not certain they're sophisticated enough or willing to take the performance hit to do a semantic match skipping comments that are dynamic. Just something to keep in mind.

Just trace update, insert, delete