jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0
20.36k stars 2.43k forks source link

Support optional query param "spanKind" when get operations for given service #1920

Open guo0693 opened 4 years ago

guo0693 commented 4 years ago

Requirement - what kind of business use case are you trying to solve?

We want to filter the operations of a service by span kind, those filtered operations can be used for features e.g. get dependencies graph for service:operation pair from server side (spanKind == "server")

Problem - what in Jaeger blocks you from solving the requirement?

Currently we only save operations indexed by service name and we can only query operations by service name. We will need to add extra info: spanKind when saving operations for a service.

Proposal - what do you suggest to solve the problem or improve the existing situation?

API changes: Restful:

GRPC: rpc GetOperations(GetOperationsRequest) returns (GetOperationsResponse) Add span_kind to the rpc request and response

message GetOperationsRequest {
  string service = 1;
  string span_kind = 2;
}

message Operation{
    string name = 1;
    string span_kind = 2;
}

message GetOperationsResponse {
  repeated OperationMeta operations = 1;
}

Interfaces: spanstore:

...
// Get operations by service and spanKind, empty spanKind means get operations for all kinds of span
GetOperations(ctx context.Context, query *spanstore.OperationQueryParameters) ([]*storage_v1.OperationMeta, error)
..

Span reader will need to take extra parameter to query operations by service name and span kind. All the storage plugins needs to be updated so that:

Task List

burmanm commented 4 years ago

Couple of questions:

And of course tasteful thing, but "spanKind" inside Process doesn't really feel descriptive to me. Also, what is a server and what is a client? Is a request made by the first server to next server a client or a server?

objectiser commented 4 years ago

@guo0693 Agree with @burmanm - why just restrict to spanKind, when can support a list of tags, of which span.kind could be one that is queried?

guo0693 commented 4 years ago

Couple of questions:

  • What happens to old data? It can't be queried this way if the information doesn't exist.
  • What defines the kind? The writing client? I don't see any links to the client changes, so is this something coming from for example OpenTelemetry?
  • Why not just a tag?

And of course tasteful thing, but "spanKind" inside Process doesn't really feel descriptive to me. Also, what is a server and what is a client? Is a request made by the first server to next server a client or a server?

objectiser commented 4 years ago

@guo0693 Yes I think it would be better (more general) if was a tag at least (maybe not list for now) - rather than just implementing something for this specific use case.

yurishkuro commented 4 years ago

There are certain arguments in favor of doing this as a general extension with arbitrary tags, but there are also a lot of arguments against that:

  1. Do we know of any real use cases when tags other than spanKind would be useful? Remember that this specific API is primarily used to populate the list of operations for a given service. We don't have any plans to support any search capabilities over that data set
  2. The operationName corresponds to a class of spans (which is why we always insist on it being low cardinality). The span.kind tag belongs to the same category, but many other tags do not, instead they describe a specific instance of that class of spans (e.g. error=true tag).
  3. span.kind is prescribed by the OpenTracing specification to be always recorded, as a way of classifying the spans. Most other tags are not, or they are specific to certain types of spans, like http.* tags. OpenTelemetry goes even further and requires span kind as a span constructor argument (same as span name).
  4. Because span kind is so standardized, it can be included in the index automatically. For any other tag, we would need user-controlled configuration to tell us which tags to include (error=true would never make sense in that index).
  5. Implementing the storage of this data with a list of tags vs. a top-level spanKind field is a lot more complicated. For example, the writers have a dedicated cache used to avoid writing operation names all the time, since it's a fairly static data set. With a list of tags this cache gets more complicated.

Finally, the way the PR itself is shaping, we're extending the operation name in the index from a simple string to a struct. This will allow adding arbitrary tags functionality in the future if the compelling use case is found. My preference would be to wait for such use case to materialize before paying the cost of higher complexity, and only implement support for span kind now as a dedicated field, because of a special role it plays.

burmanm commented 4 years ago

Based on the support tickets for badger backend, a lot of them have been about tags searching. Sadly most of them have included censored tags, so the actual names are not known. But they have included multi-tags searching. Also, why wouldn't error=true not make sense? I think it sounds like a great filtering when you're interested in traces that end up in failure.

To me this ticket sounds like a limitation of Cassandra backend - and that leaking to the generic API feels weird. All the tags are currently indexed in Badger backend and I would imagine in the ES also. If the idea is to limit the search space to only configured list of tags, why not make that instead?

guo0693 commented 4 years ago

Based on the support tickets for badger backend, a lot of them have been about tags searching. Sadly most of them have included censored tags, so the actual names are not known. But they have included multi-tags searching. Also, why wouldn't error=true not make sense? I think it sounds like a great filtering when you're interested in traces that end up in failure.

To me this ticket sounds like a limitation of Cassandra backend - and that leaking to the generic API feels weird. All the tags are currently indexed in Badger backend and I would imagine in the ES also. If the idea is to limit the search space to only configured list of tags, why not make that instead?

To query traces that end up in failure is a valid query, which I believe currently searching traces by tags is already supported.

But this issue is searching operations based on service and spanKind. We have an ask from UI to only return service-operations from server spans so that when user select a service from the DDG page the operations dropdown list only shows server operations.

burmanm commented 4 years ago

Is this more related to the dependency structure then? Or what's the point of searching only the operations without traces?

On Wed, Nov 13, 2019, 23:10 Jun Guo notifications@github.com wrote:

Based on the support tickets for badger backend, a lot of them have been about tags searching. Sadly most of them have included censored tags, so the actual names are not known. But they have included multi-tags searching. Also, why wouldn't error=true not make sense? I think it sounds like a great filtering when you're interested in traces that end up in failure.

To me this ticket sounds like a limitation of Cassandra backend - and that leaking to the generic API feels weird. All the tags are currently indexed in Badger backend and I would imagine in the ES also. If the idea is to limit the search space to only configured list of tags, why not make that instead?

To query traces that end up in failure is a valid query, which I believe currently searching traces by tags is already supported.

But this issue is searching operations based on service and spanKind. We have an ask from UI to only return service-operations from server spans so that when user select a service from the DDG page the operations dropdown list only shows server operations.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jaegertracing/jaeger/issues/1920?email_source=notifications&email_token=AAEJLDVE6IEZUOLA24LJ7D3QTRULPA5CNFSM4JMLOU22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOED7VEOA#issuecomment-553603640, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEJLDWDGGDYQKHNBDBSIJLQTRULPANCNFSM4JMLOU2Q .

yurishkuro commented 4 years ago

@burmanm I cannot find this ticket now, but it's somewhere in this repo or the UI where people complained

"why do you I see all those weird span names in the Operations dropdown on the Search page? I only want to see the endpoints of my service, not all the internal / client span names".

That's what this feature is fixing - to be able to separate operations into "endpoints" vs. "the rest".

objectiser commented 4 years ago

My main objection was related to API design/consistency and mapping onto the model - service, operation and tags are first class concepts, whereas span.kind is not (i.e. it is a specific type of tag).

However given the transition to OpenTelemetry, and that span kind is a top level span field now (instead of an attribute/tag), I am ok with the approach.

yurishkuro commented 4 years ago

@guo0693 I don't think we addressed one design point, I thought I raised it somewhere in the comments.

Do we really need to support query by kind in the storage? Yes, storage needs to return the spanKind, but why can't it just return ALL operations for a service? If we did not have this requirement, then the gRPC API change could be 100% backwards compatible, the existing storage plugins would return a list of op names in field 1, and after they are upgraded they would start returning the list of Operation structs in a new field 2. And there would be a lot lest implementation changes in the storage engines, only to insert and retrieve an extra field, but not build any additional filtering. The filtering can be done by the QuerySvc.

guo0693 commented 4 years ago

Yes, storage needs to return the spanKind, but why can't it just return ALL operations for a service? If we did no

As of now, cassandra implementation did the filtering in storage layer using cql . If badger & ES does not support filtering in db query, we can put the filtering logic in a util class and retrieve operations of all span.kind from db and apply the filtering

yurishkuro commented 4 years ago

Please add a ticket & coordinate making the UI change to use the new endpoint on the Search screen and display the span kind. I recommend adding it as a prefix, e.g. (server) getUser, and ordering the entries in the dropdown so that server and consumer spans are shown first.

Also add a ticket to remove the old endpoint once the UI is upgraded.