openzipkin / zipkin

Zipkin is a distributed tracing system
https://zipkin.io/
Apache License 2.0

Search by dependency link #1206

Open codefromthecrypt opened 8 years ago

codefromthecrypt commented 8 years ago

One thing @yurishkuro mentioned in his presentation was that we don't support search by multiple services out of the box. This is interesting, because we do actually support search without a service.

The use-case is moving from a link point of view (e.g. an edge in a dependency tree) to a trace query.

We can investigate how search by service works on a per-datastore basis, as well as the UI effects of such a thing.

calm2016 commented 7 years ago

In microservice applications, searching across all services is really useful and required. Please consider at least allowing an "all" keyword to indicate searching in all services. Btw, it would be great if simple regular expressions were supported, such as xx\ in the annotation/key-value search function. Thanks

codefromthecrypt commented 7 years ago

currently the api supports leaving out the serviceName parameter. http://zipkin.io/zipkin-api/#/paths/%252Ftraces/get/parameters/serviceName
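As a rough illustration of leaving out serviceName, here is a small sketch that only builds the query URL. It assumes the v2 path /api/v2/traces and a server on localhost:9411; the helper name traces_query_url and its defaults are made up for this example.

```python
from urllib.parse import urlencode


def traces_query_url(base_url, service_name=None, annotation_query=None,
                     limit=10, lookback_ms=3600000):
    """Build a trace search URL. service_name=None means "all services"."""
    params = {"limit": limit, "lookback": lookback_ms}
    # Only send serviceName when the caller wants to narrow the search.
    if service_name is not None:
        params["serviceName"] = service_name
    if annotation_query is not None:
        params["annotationQuery"] = annotation_query
    return f"{base_url}/api/v2/traces?{urlencode(params)}"


# Assumed local server address; no request is actually sent here.
all_services_url = traces_query_url("http://localhost:9411", annotation_query="error")
one_service_url = traces_query_url("http://localhost:9411", service_name="frontend")
```

Whether the first query is cheap depends on the storage backend, as discussed further down.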

regex isn't likely to work across all storage options server-side

codefromthecrypt commented 6 years ago

all service query is out in 2.3

yurishkuro commented 6 years ago

@adriancole does this work in either of the Cassandra implementations? Looks like it might in v2 because of zipkin2.span (annotation_query) and zipkin2.trace_by_service_span (duration) SASI indices, but v1 uses service name as a PK field iirc.

codefromthecrypt commented 6 years ago

It works (everything works) in "cassandra3". In "cassandra" it is expensive as it does a fan-out across services.
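To make the cost of the fan-out concrete, here is a minimal sketch of the idea, not the actual storage code. The function fan_out_trace_ids and the callback query_one_service are hypothetical names, and the index is an in-memory stand-in for per-service rows.

```python
def fan_out_trace_ids(service_names, query_one_service, limit):
    """Run one query per service, then keep the newest `limit` distinct trace IDs."""
    newest = {}  # trace_id -> latest timestamp seen for it
    for name in service_names:
        for trace_id, timestamp in query_one_service(name, limit):
            if timestamp > newest.get(trace_id, -1):
                newest[trace_id] = timestamp
    ranked = sorted(newest.items(), key=lambda item: item[1], reverse=True)
    return [trace_id for trace_id, _ in ranked[:limit]]


# In-memory stand-in for a table keyed by service name (assumption for the demo).
index = {
    "frontend": [("a", 30), ("b", 10)],
    "backend": [("a", 25), ("c", 20)],
}
calls = []


def query_one_service(name, limit):
    calls.append(name)  # record each per-service query to show the cost
    return index[name][:limit]


result = fan_out_trace_ids(sorted(index), query_one_service, limit=2)
```

The number of backend queries grows with the number of services, which is why this gets expensive with thousands of them.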

yurishkuro commented 6 years ago

ah, fan-out, nice trick. Not an option for us, unfortunately (3k services).

thanks.

codefromthecrypt commented 6 years ago

I'll raise a PR to minimize the impact of this for those using cassandra who have not yet upgraded. If someone is just clicking search, a cheaper way is to scan the traces table until you've collected enough trace ids.
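A sketch of that scan idea, independent of any datastore: read rows lazily and stop as soon as enough distinct trace ids are collected. The row shape and the name scan_for_trace_ids are assumptions for illustration, and it ignores time bounds and ordering.

```python
def scan_for_trace_ids(rows, limit):
    """Consume rows lazily and stop once `limit` distinct trace ids are found."""
    ordered = []
    seen = set()
    for row in rows:
        trace_id = row["trace_id"]
        if trace_id not in seen:
            seen.add(trace_id)
            ordered.append(trace_id)
            if len(ordered) >= limit:
                break  # stop early instead of reading the rest of the table
    return ordered


fetched = []  # which rows were actually read, to show the early stop


def fake_rows():
    # Hypothetical rows: several spans can share one trace id.
    for position, trace_id in enumerate(["t1", "t1", "t2", "t3", "t3", "t4"]):
        fetched.append(position)
        yield {"trace_id": trace_id}


found = scan_for_trace_ids(fake_rows(), limit=3)
```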

codefromthecrypt commented 6 years ago

actually, we can't do this in the legacy impl because we don't have any secondary index on the traces table (because we need to support 2.2+ which has no SASI)

Those who use cassandra and want more efficient all-services query will need to upgrade I think... cc @openzipkin/cassandra in case I'm mistaken

codefromthecrypt commented 6 years ago

re-opening as the root question was about searching by dependency link

https://github.com/openzipkin/openzipkin.github.io/wiki/2018-07-02-Dependency-Link-Query-at-Ascend is a design in progress

codefromthecrypt commented 4 years ago

Where @llinder @zeagord and I ended up was that we needed a way to choose which trace IDs would be associated with a period represented by a dependency link. Otherwise, you have to store rows for every pair related to that link. In other words, the bounds force you to choose somehow and we stopped at that.
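One way to picture the "bounds force you to choose" problem is reservoir sampling: keep at most a fixed number of trace IDs per link and time bucket, chosen uniformly at random. This is only a sketch of that generic technique, not the design we were working on; the class LinkExemplars and its parameters are made up for the example.

```python
import random
from collections import defaultdict


class LinkExemplars:
    """Keep up to `capacity` trace IDs per (parent, child, time bucket)."""

    def __init__(self, capacity, bucket_seconds, seed=None):
        self.capacity = capacity
        self.bucket_seconds = bucket_seconds
        self._rng = random.Random(seed)
        self._kept = defaultdict(list)
        self._seen = defaultdict(int)

    def _key(self, parent, child, timestamp_s):
        return (parent, child, timestamp_s // self.bucket_seconds)

    def offer(self, parent, child, timestamp_s, trace_id):
        key = self._key(parent, child, timestamp_s)
        self._seen[key] += 1
        kept = self._kept[key]
        if len(kept) < self.capacity:
            kept.append(trace_id)
            return
        # Reservoir sampling: replace a kept ID with probability capacity / seen.
        slot = self._rng.randrange(self._seen[key])
        if slot < self.capacity:
            kept[slot] = trace_id

    def trace_ids(self, parent, child, timestamp_s):
        return list(self._kept.get(self._key(parent, child, timestamp_s), []))


store = LinkExemplars(capacity=3, bucket_seconds=3600, seed=1)
offered = [f"trace-{i}" for i in range(1000)]
for trace_id in offered:
    store.offer("frontend", "backend", 7200, trace_id)
picked = store.trace_ids("frontend", "backend", 7200)
```

Storage stays bounded at capacity rows per link and bucket, at the price of not knowing whether the kept traces are representative.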

What has happened since is that some backends can store the metric rows associated with a link, for example https://github.com/jeqo/zipkin-storage-kafka or https://github.com/adriancole/zipkin-voltdb. This means that at query time, you could choose exemplars which are more representative and not as biased as trying to do that on each host prior to reporting.

An aside, since link picking is not that much different from normal time series picking:

Long long ago, the first version of zipkin had favorite functionality. This was basically manual picking, which could be arbitrary or used to mark something exemplary. There has been a ton of thinking around this topic since, and in many areas. Below is only a survey of the first things that come to mind.

I think Jaana actually started the fire on people wanting to use exemplars also for arbitrary metric points https://medium.com/observability/want-to-debug-latency-7aa48ecbe8f7

One idea we had in micrometer support was to try to keep a configuration which could decide what is exemplary and evaluate local histograms towards that. Lacking that, we could certainly just pick randomly, rate-limited.
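For the fallback of picking randomly with a rate limit, a minimal sketch could look like the following. It is not micrometer code; RateLimitedPicker, its parameters, and the injected clock are assumptions chosen to keep the example testable.

```python
import random


class RateLimitedPicker:
    """Pick candidates at random, but never more than `max_per_second`."""

    def __init__(self, max_per_second, probability, clock, seed=None):
        self.max_per_second = max_per_second
        self.probability = probability
        self._clock = clock  # callable returning seconds as a float
        self._rng = random.Random(seed)
        self._window = None
        self._picked = 0

    def should_pick(self):
        window = int(self._clock())
        if window != self._window:
            # A new one-second window started, so reset the budget.
            self._window = window
            self._picked = 0
        if self._picked >= self.max_per_second:
            return False
        if self._rng.random() >= self.probability:
            return False
        self._picked += 1
        return True


now = [0.0]  # fake clock so the demo does not depend on wall time
picker = RateLimitedPicker(max_per_second=2, probability=1.0, clock=lambda: now[0])
first_second = sum(picker.should_pick() for _ in range(10))
now[0] = 1.5
next_second = sum(picker.should_pick() for _ in range(10))
```

With probability below 1.0 the picks are spread more evenly inside each second instead of always taking the first candidates.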

Later, we learned about haystack, which punts all the way down the line and uses a decoration service: adaptive alerting. In other words, they don't pretend to know what's exemplary up-front. Instead, they use some sort of model selection to do that and update your exemplar store somehow.

Last year, someone told me excitedly about their zipkin exemplar integration. It looked basically the same as Jaana's pictures: Grafana with zipkin trace exemplars co-plotted. However, it took significant hacking due to the lack of integration in grafana etc. Here, we discussed the same topic with LINE and resolved to park the grafana idea, as their api was supposed to change dramatically to formally support things without massive hacks.

All this said, I noticed some projects are picking up similar work again, e.g. jaeger exemplar support, so I figured I'd recap where we got to on links. Picking trace IDs for a link has significant overlap with doing the same for arbitrary metrics derived from spans, i.e. picking favorites (exemplars).

hope this helps any lurkers or anyone here that wants to try again.