Want to add time_bucket to every Cassandra index table

Sasasu commented 5 years ago

Requirement - what kind of business use case are you trying to solve?

We use Cassandra as the storage backend and found two problematic indexes: service_operation_index and tag_index. These indexes create very long rows on a single Cassandra node, which significantly increases the load on that node.

https://github.com/jaegertracing/jaeger/blob/fb5505005a21f007792dedbcd2ad49484d1d587e/plugin/storage/cassandra/schema/v002.cql.tmpl#L114: every trace ID associated with a given service name and operation name ends up in the same row.

https://github.com/jaegertracing/jaeger/blob/fb5505005a21f007792dedbcd2ad49484d1d587e/plugin/storage/cassandra/schema/v002.cql.tmpl#L171: consider tags such as error=true, span.kind=server and span.kind=client, which are shared by a very large number of spans, so each of their rows grows very long.

Problem - what in Jaeger blocks you from solving the requirement?

Bad performance.

Proposal - what do you suggest to solve the problem or improve the existing situation?

Add a time bucket to these indexes:

    PRIMARY KEY ((service_name, operation_name, bucket), start_time)
    PRIMARY KEY ((service_name, tag_key, tag_value, bucket), start_time, trace_id, span_id)
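To make the proposal concrete, here is a minimal Python sketch of how a writer might derive the bucket column for both keys. It assumes span start times are integer microseconds since the Unix epoch and a 1-hour bucket width; the function names are hypothetical and this is not Jaeger's code.

```python
# Sketch only: assumes start times are integer microseconds since the Unix
# epoch and a 1-hour bucket width, as proposed in this issue.
US_PER_HOUR = 3600 * 1_000_000


def time_bucket(start_time_us: int) -> int:
    """Hour bucket for a span start time: whole hours since the epoch."""
    return start_time_us // US_PER_HOUR


def service_operation_partition_key(service_name: str, operation_name: str,
                                    start_time_us: int) -> tuple:
    """Partition key (service_name, operation_name, bucket) for service_operation_index."""
    return (service_name, operation_name, time_bucket(start_time_us))


def tag_partition_key(service_name: str, tag_key: str, tag_value: str,
                      start_time_us: int) -> tuple:
    """Partition key (service_name, tag_key, tag_value, bucket) for tag_index."""
    return (service_name, tag_key, tag_value, time_bucket(start_time_us))


# Two "error=true" spans one hour apart now land in different partitions.
print(tag_partition_key("frontend", "error", "true", 0))            # ('frontend', 'error', 'true', 0)
print(tag_partition_key("frontend", "error", "true", US_PER_HOUR))  # ('frontend', 'error', 'true', 1)
```

Each partition now only holds one hour of index entries, so its size is bounded by the peak hourly span rate rather than by the retention window.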

In fact, the Jaeger UI can only search with 1-hour granularity, so the time bucket can be 1 hour. A query such as "30 days ago, limit 10" can then be converted into at most 720 queries of the form "select ... limit 10" (30 days × 24 hours), which I think is acceptable. The index would be PRIMARY KEY ((service_name, operation_name, bucket), trace_id).
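The fan-out can be sketched as follows: walk the buckets newest first and stop as soon as `limit` distinct trace IDs have been collected. `query_bucket` is a hypothetical callback standing in for the per-bucket CQL query; the 1-hour bucket and microsecond timestamps are assumptions carried over from the proposal.

```python
from typing import Callable, Iterable, List

US_PER_HOUR = 3600 * 1_000_000  # assumed: 1-hour buckets, start times in microseconds


def buckets_in_range(start_us: int, end_us: int) -> List[int]:
    """All hour buckets touched by the inclusive range [start_us, end_us], newest first."""
    first = start_us // US_PER_HOUR
    last = end_us // US_PER_HOUR
    return list(range(last, first - 1, -1))


def search_trace_ids(query_bucket: Callable[[int, int], Iterable[str]],
                     start_us: int, end_us: int, limit: int) -> List[str]:
    """Query one bucket at a time until `limit` distinct trace IDs are found.

    query_bucket(bucket, limit) is hypothetical; it stands in for something like
    "SELECT trace_id ... WHERE service_name=? AND operation_name=? AND bucket=? LIMIT ?".
    """
    seen = set()
    result: List[str] = []
    for bucket in buckets_in_range(start_us, end_us):
        for trace_id in query_bucket(bucket, limit):
            if trace_id not in seen:  # the same trace can appear in several buckets
                seen.add(trace_id)
                result.append(trace_id)
                if len(result) == limit:
                    return result
    return result
```

Because the loop stops early, a search whose most recent buckets already hold enough results issues far fewer queries than the worst case. Note that an hour-aligned 30-day window touches 30 × 24 = 720 buckets, while an unaligned one touches 721 because both ends are partial buckets.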

The second index (tag_index) could be PRIMARY KEY ((service_name, tag_key, tag_value, bucket), duration, start_time, trace_id, span_id), which supports searching by tag and sorting by duration.
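With duration as the first clustering column, rows inside each (service_name, tag_key, tag_value, bucket) partition are stored in duration order, so a duration filter becomes one contiguous slice of the partition. The toy in-memory model below (a hypothetical class, not Cassandra or Jaeger code) imitates that clustering order with `bisect` on a sorted list.

```python
import bisect
from typing import List, Tuple

# Clustering columns of the proposed tag index, in order:
# (duration, start_time, trace_id, span_id)
Row = Tuple[int, int, str, str]


class TagIndexPartition:
    """Hypothetical model of one partition: rows kept sorted by clustering columns."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self, duration: int, start_time: int, trace_id: str, span_id: str) -> None:
        bisect.insort(self.rows, (duration, start_time, trace_id, span_id))

    def by_duration(self, min_duration: int, max_duration: int, limit: int) -> List[Row]:
        """Rows with min_duration <= duration <= max_duration, shortest first.

        Because duration is the first clustering column, matching rows are
        contiguous, like a range restriction on duration within one partition.
        """
        # (d,) sorts before every row whose duration is d.
        lo = bisect.bisect_left(self.rows, (min_duration,))
        # (d, inf) sorts after every row whose duration is d.
        hi = bisect.bisect_right(self.rows, (max_duration, float("inf")))
        return self.rows[lo:min(hi, lo + limit)]
```

A full search would run this slice in each hour bucket of the requested window and merge the per-bucket results.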

I want to solve this and send a PR. Has anyone tried to solve this problem?

yurishkuro commented 5 years ago

When we extended our retention window, we also started getting Cassandra 'partition too large' errors. I haven't thought much about using time buckets for these indices. It could be reasonable, but it does explode the number of queries, which could have the opposite effect of what you're trying to achieve (not to mention that you will need to dedupe the results). So far, the main approach we have used is an arbitrary bucket differentiator, which I think defaults to 10 but ideally should be tuned to the number of hosts in the cluster.
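For comparison, a fixed bucket differentiator can be sketched like this: writes are spread over `NUM_BUCKETS` partitions, and every read has to query all of them and merge the results, dropping duplicates. Deriving the bucket from a CRC32 of the trace ID is an assumption made for illustration; it is not necessarily how Jaeger chooses the bucket.

```python
import zlib
from typing import Callable, Dict, Iterable, List

NUM_BUCKETS = 10  # the default mentioned above; ideally tuned to the cluster size


def bucket_for(trace_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Hypothetical bucket choice: a stable hash of the trace ID modulo num_buckets."""
    return zlib.crc32(trace_id.encode("utf-8")) % num_buckets


def read_all_buckets(query_bucket: Callable[[int], Iterable[str]],
                     num_buckets: int = NUM_BUCKETS) -> List[str]:
    """Query every bucket and merge the results, keeping the first occurrence of each ID."""
    seen: Dict[str, None] = {}  # dict preserves insertion order
    for b in range(num_buckets):
        for trace_id in query_bucket(b):
            seen.setdefault(trace_id, None)
    return list(seen)
```

Each read costs exactly `NUM_BUCKETS` queries regardless of the time window, whereas time buckets make the cost grow with the window length.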

Sasasu commented 5 years ago

A constant number cannot guarantee that these partition keys end up on different Cassandra nodes. I will run some experiments and benchmarks.
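This concern can be quantified with a simple balls-into-bins estimate. Assuming each partition key is placed on a node uniformly at random and independently (ignoring replication and virtual nodes), the expected number of distinct nodes that own k partitions on an n-node cluster is n(1 - (1 - 1/n)^k), and the probability that all k land on different nodes is n!/(n-k)! / n^k.

```python
from math import perm


def expected_distinct_nodes(num_nodes: int, num_partitions: int) -> float:
    """Expected number of nodes owning at least one partition: n * (1 - (1 - 1/n)**k)."""
    p_node_unused = (1 - 1 / num_nodes) ** num_partitions
    return num_nodes * (1 - p_node_unused)


def prob_all_on_distinct_nodes(num_nodes: int, num_partitions: int) -> float:
    """Probability that all k partitions land on k different nodes: n!/(n-k)! / n**k."""
    return perm(num_nodes, num_partitions) / num_nodes ** num_partitions


print(round(expected_distinct_nodes(10, 10), 2))     # 6.51
print(round(prob_all_on_distinct_nodes(10, 10), 5))  # 0.00036
```

Under these assumptions, 10 buckets on a 10-node cluster spread an index over only about 6.5 nodes on average, and there is roughly a 0.04% chance that all ten partitions land on different nodes.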
