Sasasu opened this issue 5 years ago.
When we extended our retention window, we also started getting 'partition too large' errors from Cassandra. I haven't thought much about using time buckets for these indices. It could be reasonable, but it greatly increases the number of queries, which could have the opposite effect of what you're trying to achieve (not to mention that you would need to dedupe the results). So far the approach we have mainly used is an arbitrary bucket differentiator, which I think defaults to 10, but ideally it should be tuned to the number of hosts in the cluster.
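The bucket-differentiator approach described above can be sketched roughly as follows. This is a minimal illustration, not Jaeger's writer code: `NUM_BUCKETS`, `bucket_for`, `write_index` and `read_index` are hypothetical names, the SHA-1-based bucket choice is an assumption, and an in-memory dict stands in for the Cassandra table. It shows why every read fans out to all buckets and why the results need deduplication.

```python
import hashlib

NUM_BUCKETS = 10  # hypothetical fixed bucket differentiator

def bucket_for(trace_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Deterministic bucket choice from the trace ID (illustrative hash, not Jaeger's).
    digest = hashlib.sha1(trace_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def write_index(index: dict, service: str, operation: str, trace_id: str) -> None:
    # One index row per span; the bucket becomes part of the partition key.
    key = (service, operation, bucket_for(trace_id))
    index.setdefault(key, []).append(trace_id)

def read_index(index: dict, service: str, operation: str, limit: int,
               num_buckets: int = NUM_BUCKETS) -> list:
    # A reader cannot know the bucket up front, so it queries every bucket
    # (num_buckets queries) and deduplicates trace IDs, because several
    # spans of one trace produce several index rows.
    seen = {}
    for b in range(num_buckets):
        for trace_id in index.get((service, operation, b), []):
            seen[trace_id] = None  # dict keys keep insertion order and drop repeats
    return list(seen)[:limit]
```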
A constant number of buckets cannot guarantee that these partition keys are placed on different Cassandra nodes. I will run some experiments and benchmarks.
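A toy token-ring simulation can illustrate why a fixed bucket count does not guarantee spreading across nodes. Assumptions: MD5 stands in for Cassandra's partitioner (Murmur3 by default), only the primary owner of each token is considered (replication is ignored), and `token`, `build_ring`, `owner` and `nodes_for_buckets` are hypothetical helpers. With 10 buckets, one service/operation pair can touch at most 10 nodes, and hash collisions can put several buckets on the same node.

```python
import bisect
import hashlib

def token(key: str) -> int:
    # Illustrative stand-in for Cassandra's partitioner (Murmur3 by default).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def build_ring(num_nodes: int, vnodes: int = 16) -> list:
    # Each node claims `vnodes` tokens; the ring is sorted by token.
    return sorted(
        (token(f"node-{n}-vnode-{v}"), n)
        for n in range(num_nodes)
        for v in range(vnodes)
    )

def owner(ring: list, key: str) -> int:
    # A key belongs to the first node token >= the key's token, wrapping around.
    tokens = [tok for tok, _ in ring]
    i = bisect.bisect_left(tokens, token(key)) % len(ring)
    return ring[i][1]

def nodes_for_buckets(ring: list, base_key: str, num_buckets: int) -> set:
    # Distinct primary owners of the partitions (base_key, 0..num_buckets-1).
    return {owner(ring, f"{base_key}|{b}") for b in range(num_buckets)}

ring = build_ring(num_nodes=30)
used = nodes_for_buckets(ring, "my-service|my-operation", num_buckets=10)
print(f"10 buckets landed on {len(used)} of 30 nodes")
```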
Requirement - what kind of business use case are you trying to solve?
We use Cassandra as the storage backend and found two problematic indexes: service_operation_index and tag_index.
These indexes create a very, very wide row on a single Cassandra node, which significantly increases the load on that node. In service_operation_index, every trace ID associated with a given service name and operation name ends up in the same row:
https://github.com/jaegertracing/jaeger/blob/fb5505005a21f007792dedbcd2ad49484d1d587e/plugin/storage/cassandra/schema/v002.cql.tmpl#L114
https://github.com/jaegertracing/jaeger/blob/fb5505005a21f007792dedbcd2ad49484d1d587e/plugin/storage/cassandra/schema/v002.cql.tmpl#L171
For tag_index, please consider tags such as error=true, span.kind=server, and span.kind=client, which are attached to a very large number of spans.
Problem - what in Jaeger blocks you from solving the requirement?
Bad performance: the node that holds such a wide row becomes overloaded.
Proposal - what do you suggest to solve the problem or improve the existing situation?
Add a time bucket to the partition key of these indexes:
PRIMARY KEY ((service_name, operation_name, bucket), start_time)
PRIMARY KEY ((service_name, tag_key, tag_value, bucket), start_time, trace_id, span_id)
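A minimal sketch of how the `bucket` value in these keys could be derived. It assumes `start_time` is stored as microseconds since the Unix epoch and a 1-hour bucket width, with the bucket defined as the number of whole hours since the epoch; `hour_bucket`, `service_operation_partition` and `tag_partition` are hypothetical names. All spans starting in the same hour share a partition, so each partition holds at most one hour of traffic.

```python
MICROS_PER_HOUR = 3_600 * 1_000_000

def hour_bucket(start_time_us: int) -> int:
    # Hypothetical bucket definition: whole hours since the Unix epoch.
    return start_time_us // MICROS_PER_HOUR

def service_operation_partition(service: str, operation: str, start_time_us: int) -> tuple:
    # Partition key of the proposed service_operation_index.
    return (service, operation, hour_bucket(start_time_us))

def tag_partition(service: str, tag_key: str, tag_value: str, start_time_us: int) -> tuple:
    # Partition key of the proposed tag_index.
    return (service, tag_key, tag_value, hour_bucket(start_time_us))

t0 = 1_700_000_000 * 1_000_000  # an example start time in microseconds
print(service_operation_partition("frontend", "GET /", t0))  # ('frontend', 'GET /', 472222)
```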
In fact, the Jaeger UI only searches with 1-hour granularity, so the time bucket can be 1 hour. If the query is "last 30 days, limit 10", it can be converted into at most 720 (30 × 24) queries of the form "select ... limit 10", which I think is acceptable. The first index would then be:
PRIMARY KEY ((service_name, operation_name, bucket), trace_id)
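The query fan-out described above might look roughly like this, assuming microsecond timestamps and hour-number buckets. `query_bucket` is a hypothetical callback standing in for one "select ... limit N" against a single partition, and `buckets_in_range` and `search` are illustrative names. Walking the buckets newest first and stopping once `limit` distinct trace IDs are collected makes 720 queries the worst case, reached only when matches are sparse. A 30-day window that does not start on an hour boundary touches 721 hourly buckets.

```python
from typing import Callable, List, Tuple

MICROS_PER_HOUR = 3_600 * 1_000_000

def buckets_in_range(start_us: int, end_us: int) -> List[int]:
    # Hourly buckets overlapping [start_us, end_us), newest first.
    first = start_us // MICROS_PER_HOUR
    last = (end_us - 1) // MICROS_PER_HOUR
    return list(range(last, first - 1, -1))

def search(query_bucket: Callable[[int, int], List[str]],
           start_us: int, end_us: int, limit: int) -> Tuple[List[str], int]:
    # query_bucket(bucket, limit) stands in for one "select ... limit N" on one
    # partition; the first and last buckets would also need a start_time filter.
    results: List[str] = []
    seen = set()
    queries = 0
    for bucket in buckets_in_range(start_us, end_us):
        queries += 1
        for trace_id in query_bucket(bucket, limit):
            if trace_id not in seen:
                seen.add(trace_id)
                results.append(trace_id)
                if len(results) == limit:
                    return results, queries  # stop early: enough traces found
    return results, queries
```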
The second index could be:
PRIMARY KEY ((service_name, tag_key, tag_value, bucket), duration, start_time, trace_id, span_id)
This index would support searching by tag and sorting by duration. I want to solve this and send a PR. Has anyone already tried to solve this problem?
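Because each hourly bucket is a separate partition, sorting by duration across a time range still requires merging the per-bucket results on the client. A minimal sketch, assuming each bucket query returns `(duration, trace_id)` rows already ordered longest first and that the result should contain distinct trace IDs; `longest_traces` is a hypothetical helper, and `heapq.merge` performs a k-way merge without re-sorting all rows.

```python
import heapq
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # (duration, trace_id)

def longest_traces(per_bucket: Iterable[List[Row]], limit: int) -> List[Row]:
    # Each input list is one bucket's rows, assumed already sorted longest first.
    # heapq.merge with reverse=True requires descending inputs and yields a
    # single descending stream without materializing all rows.
    merged = heapq.merge(*per_bucket, key=lambda row: row[0], reverse=True)
    out: List[Row] = []
    seen = set()
    for duration, trace_id in merged:
        if trace_id in seen:
            continue  # several spans of one trace can match the same tag
        seen.add(trace_id)
        out.append((duration, trace_id))
        if len(out) == limit:
            break
    return out
```

Each bucket query only needs to return about `limit` rows for this merge, although frequent duplicate trace IDs may require fetching a few more per bucket.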