openzipkin / zipkin-support

repository for support questions raised as issues
4 stars 2 forks source link

Lens ignores serviceName when span is a client and not a leaf #2

Closed rmannibucau closed 4 years ago

rmannibucau commented 4 years ago

Hi,

It seems here https://github.com/openzipkin/zipkin/blame/master/zipkin-lens/src/zipkin/span-row.js#L349 that service name will fallback on "unknown" in the UI if there are multiple span of kind CLIENT before the last span. It can happen tracing databases so it would be good to just use the remote service name anyway?

Thanks, Romain

codefromthecrypt commented 4 years ago

can you elaborate on what's causing the multiple client before your last span? that sounds like an instrumentation bug. maybe paste your trace and what's causing it?

rmannibucau commented 4 years ago

Well it is not a bug, I instrument all messages linked to a connection. The two direct use cases I have are:

  1. Jdbc getConnection > createStatement > execute, all are client
  2. Websocket createClient > onOpen

So clients are not always leaves.

codefromthecrypt commented 4 years ago

@rmannibucau sorry I wasnt' clear.

client is a leave within a local root. when there's a callback, it should be put in the invocation context (the caller of the client, not the client itself).

Maybe you can share a trace where we can better understand what you are getting at.

codefromthecrypt commented 4 years ago

PS also sorry for not replying earlier, my notifications were messed up and now fixed.

rmannibucau commented 4 years ago

@adriancole no worries. Well client is a leaf....or not ;). It is one modelisation but it is not generic enough. Concretely, each time you have a long time connection you can desire to either modelize it regarding the caller or the parent span. It is the case for databases operations done in a single transaction, you want them attached to the transaction/connection but all are clients. Same for websockets, you dont always want to attach them to the caller but to the parent connection in some applications using some pattern close to the one expressed for databases.

So at the end this restriction is a limiting choice for me and technically there is no reason. Client or server is a metada like httpstatus can be in some span, no reason to assume anything for its representation. I'd let the dev/design/architect do that choice (adapted to the app).

codefromthecrypt commented 4 years ago

I wouldn't map connection as a client or share hierarchy with users of the connection. connection is not request/trace scoped, in other words.

https://github.com/openzipkin/brave/blob/4b6576f1a285da903ed13107dd4f5b1599042cd9/brave/src/test/java/brave/features/advanced/CustomScopedClockTracingTest.java

best results will come by treating the request as king

codefromthecrypt commented 4 years ago

it other words, while possible to interpret things otherwise, we (and so many others) treat client as not a long lived reusable connection, rather a request/response thing. You may disagree, but to converge not just us but other systems, vendors etc to the world view that client is larger scope than a request will require a lot of work. If you are up to doing that work, you can take it on, but modeling differently is a lower hanging fruit.

I know you probably don't like this answer, but it is an answer (free on the weekend etc.) best luck!

rmannibucau commented 4 years ago

@adriancole it is request scope, take any jdbc driver, transaction is bound to a thread and the operations are bound to a tx and all are clients so it should be respected. Currently it is not possible to represent properly the hierarchy of the operations which is a big drawback.

codefromthecrypt commented 4 years ago

@rmannibucau if you would like to progress this, possibly you can paste the json of a trace, ideally also with a screenshot annotated with what you wish to see.

Generally at this point in converstation "show not tell" is the best way forward.

If you are able to get concrete data here, I'd love to analyze further, just understand we've been working on this for 5 years and have seen many things especially database things. Changing something can break something else. Doing so blindly is irresponsible to other people. If you can share more concretely what you are doing even better.. how you are doing it (the instrumentation) we can probably figure something out.

After that, it is possible we can have a concrete suggestion for https://github.com/openzipkin/zipkin/issues/2274 but until we get concrete, there's no more action to take

rmannibucau commented 4 years ago

@adriancole ok, i'll try to capture back something but note that it can NOT break anything since it requires a change in the capture so risk is 0 for the 5 years users ;).

Here is a trace:

[
   {
      "annotations":[
         {
            "timestamp":1587887695477000,
            "value":"cr"
         },
         {
            "timestamp":1587887695487000,
            "value":"cs"
         }
      ],
      "duration":10000,
      "id":"2dc4ef28cf11878b",
      "kind":"CLIENT",
      "name":"test:get_connection",
      "remoteEndpoint":{
         "ipv4":"-",
         "port":-1,
         "serviceName":"app"
      },
      "tags":{
         "database":"test",
         "component":"app",
         "source":"datasource",
         "span.kind":"CLIENT"
      },
      "timestamp":1587887695477000,
      "traceId":"1699c0da0fec4db9"
   },
   {
      "annotations":[
         {
            "timestamp":1587887695487000,
            "value":"cr"
         },
         {
            "timestamp":1587887695508000,
            "value":"cs"
         }
      ],
      "duration":21000,
      "id":"dfbd9e14c740f5d9",
      "kind":"CLIENT",
      "name":"test:create_prepared_statement",
      "parentId":"2dc4ef28cf11878b",
      "remoteEndpoint":{
         "ipv4":"-",
         "port":-1,
         "serviceName":"app"
      },
      "tags":{
         "sql.query":"select 1",
         "database":"test",
         "component":"app",
         "source":"connection",
         "span.kind":"CLIENT"
      },
      "timestamp":1587887695487000,
      "traceId":"1699c0da0fec4db9"
   },
   {
      "annotations":[
         {
            "timestamp":1587887695509000,
            "value":"cr"
         },
         {
            "timestamp":1587887695513000,
            "value":"cs"
         }
      ],
      "duration":4000,
      "id":"bcbe52c3fad320cd",
      "kind":"CLIENT",
      "name":"test:prepared_statement:",
      "parentId":"dfbd9e14c740f5d9",
      "remoteEndpoint":{
         "ipv4":"-",
         "port":-1,
         "serviceName":"app"
      },
      "tags":{
         "sql.query":"select 1",
         "database":"test",
         "component":"app",
         "source":"prepared_statement",
         "span.kind":"CLIENT"
      },
      "timestamp":1587887695509000,
      "traceId":"1699c0da0fec4db9"
   },
   {
      "annotations":[
         {
            "timestamp":1587887695513000,
            "value":"cr"
         },
         {
            "timestamp":1587887695514000,
            "value":"cs"
         }
      ],
      "duration":1000,
      "id":"d4f4a1ffc1f58519",
      "kind":"CLIENT",
      "name":"test:prepared_statement:",
      "parentId":"dfbd9e14c740f5d9",
      "remoteEndpoint":{
         "ipv4":"-",
         "port":-1,
         "serviceName":"app"
      },
      "tags":{
         "sql.query":"select 1",
         "database":"test",
         "component":"app",
         "source":"prepared_statement",
         "span.kind":"CLIENT"
      },
      "timestamp":1587887695513000,
      "traceId":"1699c0da0fec4db9"
   },
   {
      "annotations":[
         {
            "timestamp":1587887695514000,
            "value":"cr"
         },
         {
            "timestamp":1587887695514000,
            "value":"cs"
         }
      ],
      "duration":0,
      "id":"d650b3745bbaf5cb",
      "kind":"CLIENT",
      "name":"test:create_prepared_statement",
      "parentId":"2dc4ef28cf11878b",
      "remoteEndpoint":{
         "ipv4":"-",
         "port":-1,
         "serviceName":"app"
      },
      "tags":{
         "sql.query":"select 1",
         "database":"test",
         "component":"app",
         "source":"connection",
         "span.kind":"CLIENT"
      },
      "timestamp":1587887695514000,
      "traceId":"1699c0da0fec4db9"
   },
   {
      "annotations":[
         {
            "timestamp":1587887695514000,
            "value":"cr"
         },
         {
            "timestamp":1587887695514000,
            "value":"cs"
         }
      ],
      "duration":0,
      "id":"249bd016419267cd",
      "kind":"CLIENT",
      "name":"test:prepared_statement:",
      "parentId":"d650b3745bbaf5cb",
      "remoteEndpoint":{
         "ipv4":"-",
         "port":-1,
         "serviceName":"app"
      },
      "tags":{
         "sql.query":"select 1",
         "database":"test",
         "component":"app",
         "source":"prepared_statement",
         "span.kind":"CLIENT"
      },
      "timestamp":1587887695514000,
      "traceId":"1699c0da0fec4db9"
   }
]

Corresponding code is:

try (final Connection connection = dataSource.getConnection()) {
    try (final PreparedStatement ps = connection.prepareStatement("select 1")) {
        ps.execute();
        ps.execute();
    }
    try (final PreparedStatement ps = connection.prepareStatement("select 1")) {
        ps.execute();
    }
} catch (final SQLException e) {
    fail(e);
}

The JDBC connection usage only makes sense in a request (more exactly thread since numerous operations are background operations) context so here the expectation is to see the connection span as a parent of prepareStatement spans which are parents of execute spans. However, all these spans are client of the database. The side note being: the connection can be pooled and the pool is not expected to be the actual representation since it is accross transactions but transactions are expected to be represented since it is exactly what tracing is about.