jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

[Feature]: time parameters for querying trace by id #4150

Open alburthoffman opened 1 year ago

alburthoffman commented 1 year ago

Requirement

Add start_time and end_time as optional parameters in https://github.com/jaegertracing/jaeger-idl/blob/main/proto/api_v2/query.proto#L37

Problem

When used with the Grafana console, the trace ID view takes a long time to load.


This is because the query has to scan all traces in the database. We have about 4M sampled traces per day, the data is growing, and we store several days of data in the database.

Proposal

It would be good to add time parameters when querying by trace ID, like the Tempo API: https://grafana.com/docs/tempo/latest/api_docs/#query.

The Grafana console already has a time window.

Open questions

No response

yurishkuro commented 1 year ago

What storage backend are you using?

I'm not particularly opposed to this change. Even though I do not recall similar complaints, I assume if someone uses ES as storage with indices rotated regularly, then having a time range as a hint might narrow down which indices to query. However, I am not sure how that would work when people use index aliases; it would require some kind of support on the ES side to use the time range hint.

Other official Jaeger backends are kv-stores and don't need help looking up by trace ID.

alburthoffman commented 1 year ago

Our backend is ClickHouse. ClickHouse is not good at querying an item by ID, especially when the ID is effectively a random number.

We simply have too many traces, about 4 billion per day. Holding them all in memory would require a lot of memory.

A kv store can simply ignore the start and end time.

This feature seems like the more general case, since the API currently assumes a global scan with no other options.

vjsamuel commented 1 year ago

@yurishkuro, this would greatly benefit us given the volume of spans we ingest into our ClickHouse cluster. We would be more than happy to contribute the enhancement as well.

yurishkuro commented 1 year ago

Go for it.

Speaking of ClickHouse: are you using https://github.com/jaegertracing/jaeger-clickhouse ? I recently opened https://github.com/jaegertracing/jaeger/issues/4196. There is another implementation in OTel Collector Contrib, which, as I understand it, has an additional small table for looking up the time range based on trace ID.

alburthoffman commented 1 year ago

@yurishkuro Thx for the information. We evaluated the index table solution long ago. It does not help much because the trace ID is a random number, and a ClickHouse index does not help with random numbers.

When the traffic volume is not high, the index table solution can be used.

I will work on the time parameters and send a PR. Thx