Open guo0693 opened 4 years ago
Couple of questions:
And of course tasteful thing, but "spanKind" inside Process doesn't really feel descriptive to me. Also, what is a server and what is a client? Is a request made by the first server to next server a client or a server?
@guo0693 Agree with @burmanm - why just restrict to spanKind
, when can support a list of tags, of which span.kind
could be one that is queried?
Couple of questions:
- What happens to old data? It can't be queried this way if the information doesn't exist.
- What defines the kind? The writing client? I don't see any links to the client changes, so is this something coming from for example OpenTelemetry?
- Why not just a tag?
And of course tasteful thing, but "spanKind" inside Process doesn't really feel descriptive to me. Also, what is a server and what is a client? Is a request made by the first server to next server a client or a server?
span.kind
tag. Thinking from schema point of view:
service_name | operation_name | span_kind
--------------------------------------------
"serviceA" | "operationA"|"server"
Are you proposing we should have below schema?
service_name | operation_name | tag
--------------------------------------------
"serviceA" | "operationA"|"span.kind=server"
@objectiser currently the UI only need span.kind
to get a list of server
operations to enrich dropdown list in the UI. Not sure if you also suggest a tag
column with possible value to be "{tagkey}={tagvalue}", and in the span writer side maybe define a list of indexableOperationTags, for each of those indexable tag in a span, insert a row in the table.
"serviceA" | "operationA"|"{tag1=val1}"
"serviceA" | "operationA"|"{tag2=val2}"
@guo0693 Yes I think it would be better (more general) if was a tag at least (maybe not list for now) - rather than just implementing something for this specific use case.
There are certain arguments in favor of doing this as a general extension with arbitrary tags, but there are also a lot of arguments against that:
operationName
corresponds to a class of spans (which is why we always insist on it being low cardinality). The span.kind
tag belongs to the same category, but many other tags do not, instead they describe a specific instance of that class of spans (e.g. error=true
tag).span.kind
is prescribed by the OpenTracing specification to be always recorded, as a way of classifying the spans. Most other tags are not, or they are specific to certain types of spans, like http.*
tags. OpenTelemetry goes even further and requires span kind as a span constructor argument (same as span name).error=true
would never make sense in that index).spanKind
field is a lot more complicated. For example, the writers have a dedicated cache used to avoid writing operation names all the time, since it's a fairly static data set. With a list of tags this cache gets more complicated.Finally, the way the PR itself is shaping, we're extending the operation name in the index from a simple string to a struct. This will allow adding arbitrary tags functionality in the future if the compelling use case is found. My preference would be to wait for such use case to materialize before paying the cost of higher complexity, and only implement support for span kind now as a dedicated field, because of a special role it plays.
Based on the support tickets for badger backend, a lot of them have been about tags searching. Sadly most of them have included censored tags, so the actual names are not known. But they have included multi-tags searching. Also, why wouldn't error=true
not make sense? I think it sounds like a great filtering when you're interested in traces that end up in failure.
To me this ticket sounds like a limitation of Cassandra backend - and that leaking to the generic API feels weird. All the tags are currently indexed in Badger backend and I would imagine in the ES also. If the idea is to limit the search space to only configured list of tags, why not make that instead?
Based on the support tickets for badger backend, a lot of them have been about tags searching. Sadly most of them have included censored tags, so the actual names are not known. But they have included multi-tags searching. Also, why wouldn't
error=true
not make sense? I think it sounds like a great filtering when you're interested in traces that end up in failure.To me this ticket sounds like a limitation of Cassandra backend - and that leaking to the generic API feels weird. All the tags are currently indexed in Badger backend and I would imagine in the ES also. If the idea is to limit the search space to only configured list of tags, why not make that instead?
To query traces
that end up in failure is a valid query, which I believe currently searching traces by tags is already supported.
But this issue is searching operations
based on service and spanKind. We have an ask from UI to only return service-operations from server spans so that when user select a service from the DDG page the operations dropdown list only shows server operations.
Is this more related to the dependency structure then? Or what's the point of searching only the operations without traces?
On Wed, Nov 13, 2019, 23:10 Jun Guo notifications@github.com wrote:
Based on the support tickets for badger backend, a lot of them have been about tags searching. Sadly most of them have included censored tags, so the actual names are not known. But they have included multi-tags searching. Also, why wouldn't error=true not make sense? I think it sounds like a great filtering when you're interested in traces that end up in failure.
To me this ticket sounds like a limitation of Cassandra backend - and that leaking to the generic API feels weird. All the tags are currently indexed in Badger backend and I would imagine in the ES also. If the idea is to limit the search space to only configured list of tags, why not make that instead?
To query traces that end up in failure is a valid query, which I believe currently searching traces by tags is already supported.
But this issue is searching operations based on service and spanKind. We have an ask from UI to only return service-operations from server spans so that when user select a service from the DDG page the operations dropdown list only shows server operations.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jaegertracing/jaeger/issues/1920?email_source=notifications&email_token=AAEJLDVE6IEZUOLA24LJ7D3QTRULPA5CNFSM4JMLOU22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOED7VEOA#issuecomment-553603640, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEJLDWDGGDYQKHNBDBSIJLQTRULPANCNFSM4JMLOU2Q .
@burmanm I cannot find this ticket now, but it's somewhere in this repo or the UI where people complained
"why do you I see all those weird span names in the Operations dropdown on the Search page? I only want to see the endpoints of my service, not all the internal / client span names".
That's what this feature is fixing - to be able to separate operations into "endpoints" vs. "the rest".
My main objection was related to API design/consistency and mapping onto the model - service, operation and tags are first class concepts, whereas span.kind
is not (i.e. it is a specific type of tag).
However given the transition to OpenTelemetry, and that span kind is a top level span field now (instead of an attribute/tag), I am ok with the approach.
@guo0693 I don't think we addressed one design point, I thought I raised it somewhere in the comments.
Do we really need to support query by kind in the storage? Yes, storage needs to return the spanKind, but why can't it just return ALL operations for a service? If we did not have this requirement, then the gRPC API change could be 100% backwards compatible, the existing storage plugins would return a list of op names in field 1, and after they are upgraded they would start returning the list of Operation structs in a new field 2. And there would be a lot lest implementation changes in the storage engines, only to insert and retrieve an extra field, but not build any additional filtering. The filtering can be done by the QuerySvc.
Yes, storage needs to return the spanKind, but why can't it just return ALL operations for a service? If we did no
As of now, cassandra implementation did the filtering in storage layer using cql . If badger & ES does not support filtering in db query, we can put the filtering logic in a util class and retrieve operations of all span.kind from db and apply the filtering
Please add a ticket & coordinate making the UI change to use the new endpoint on the Search screen and display the span kind. I recommend adding it as a prefix, e.g. (server) getUser
, and ordering the entries in the dropdown so that server
and consumer
spans are shown first.
Also add a ticket to remove the old endpoint once the UI is upgraded.
Requirement - what kind of business use case are you trying to solve?
We want to filter the operations of a service by span kind, those filtered operations can be used for features e.g. get dependencies graph for service:operation pair from server side (spanKind == "server")
Problem - what in Jaeger blocks you from solving the requirement?
Currently we only save operations indexed by service name and we can only query operations by service name. We will need to add extra info: spanKind when saving operations for a service.
Proposal - what do you suggest to solve the problem or improve the existing situation?
API changes: Restful:
GET /api/services/{serviceName}/operations : We will keep this endpoint unchanged for backward compatibility, it will still return all operations as a list of string for a given service.
GET /api/operations?service={service}&spanKind={spanKind} We will update this endpoint to return list of operation metadata for a given service whose spanKind matched the query parameter. If spanKind is empty, then return all operations of a service. Sample response:
GRPC: rpc GetOperations(GetOperationsRequest) returns (GetOperationsResponse) Add span_kind to the rpc request and response
Interfaces: spanstore:
Span reader will need to take extra parameter to query operations by service name and span kind. All the storage plugins needs to be updated so that:
Task List
Any open questions to address