openzipkin / zipkin

Zipkin is a distributed tracing system
https://zipkin.io/
Apache License 2.0

Consider switching cassandra3 to span2 model #1695

Closed: codefromthecrypt closed this issue 5 years ago

codefromthecrypt commented 7 years ago

The cassandra3 type isn't finalized, so we have an opportunity to drop the complexity around binary annotations and nested endpoints. This might let us use SASI to index service names, since they are now fields of the span itself rather than nested parts of a binary annotation or annotation.

I don't know how well this would work with indexing, but here are the salient changes.

CREATE TYPE IF NOT EXISTS zipkin3.annotation (
    timestamp bigint,
    value  text
);

CREATE TABLE IF NOT EXISTS zipkin3.traces (
    trace_id            frozen<trace_id>,
    ts_uuid             timeuuid,
    parent_id           bigint,
    id                  bigint,
    timestamp           bigint,
    name                text,
    duration            bigint,
    local_endpoint      frozen<Endpoint>,
    remote_endpoint     frozen<Endpoint>,
    annotations         list<frozen<annotation>>,
    tags                frozen<map<text,text>>,
    shared              boolean,
    annotation_query    text, //-- can't do SASI on set<text>: comma-joined until CASSANDRA-11182
    PRIMARY KEY (trace_id, ts_uuid, id)
);
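The annotation_query comment implies a SASI index on that column, because SASI cannot index a collection such as set<text>. A minimal sketch of such an index follows. It assumes CONTAINS mode so that a LIKE '%term%' search can match any annotation value or tag pair inside the comma-joined string. The mode and the search term 'error' are assumptions, not taken from the thread.

```sql
-- Sketch only: SASI index over the comma-joined annotation_query text.
-- CONTAINS mode (assumed here) allows LIKE '%term%' substring searches.
CREATE CUSTOM INDEX IF NOT EXISTS ON zipkin3.traces (annotation_query)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {'mode': 'CONTAINS'};

-- Find spans whose annotations or tags contain 'error' (placeholder term).
SELECT trace_id, ts_uuid, id
FROM zipkin3.traces
WHERE annotation_query LIKE '%error%';
```

Because CONTAINS matches substrings of the whole joined string, a term can also match part of a longer value (for example 'error' inside 'errors'). Exact-match semantics would need the delimiter in the search pattern or a client-side check.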

cc @openzipkin/cassandra

codefromthecrypt commented 7 years ago

Note: annotation_query would be shorter per span, as it only needs to concatenate annotation values and tag pairs (rather than also including the service name, since local_endpoint already covers that part).

michaelsembwever commented 7 years ago

I don't get this. When we search against that index, we are still searching for local_serviceName:annotation or local_serviceName;tag_key;tag_value.

codefromthecrypt commented 7 years ago

I mean that the serviceName can be filtered independently of the ball of tags. For example, filter on span.local_serviceName = foo before applying annotationQuery. So instead of redundantly encoding the service name in the annotationQuery concatenation, access it as a field.

I assume this could be more efficient, but I might be wrong. Either way, it would be easier to read.

Make sense?

michaelsembwever commented 7 years ago

Yup. I've added the extra column and an extra SASI index on it to achieve this. It could well be that two SASI indexes working together like this are not faster than a single, bigger SASI index. Benchmarking later in the PR should settle that.
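A separate column is needed because CQL cannot target a field nested inside a frozen UDT value such as local_endpoint with an index. The following is a rough sketch of the "extra column plus extra SASI" approach and a query that restricts the service name as a field before matching annotation_query. The column name l_service is hypothetical. The query also assumes that annotation_query already has a CONTAINS-mode SASI index. The values 'foo' and 'error' are placeholders.

```sql
-- Sketch only. l_service is a hypothetical column holding a copy of the
-- local endpoint's service name, so it can be indexed on its own.
ALTER TABLE zipkin3.traces ADD l_service text;

-- The second SASI index, on the service-name column alone.
CREATE CUSTOM INDEX IF NOT EXISTS ON zipkin3.traces (l_service)
    USING 'org.apache.cassandra.index.sasi.SASIIndex';

-- Restrict on the service name as a field, then on the joined
-- annotation/tag text (assumes a CONTAINS-mode SASI index on annotation_query).
-- ALLOW FILTERING is included because Cassandra generally requires it
-- when more than one indexed column is restricted in a single query.
SELECT trace_id, ts_uuid, id
FROM zipkin3.traces
WHERE l_service = 'foo'
  AND annotation_query LIKE '%error%'
ALLOW FILTERING;
```

Whether Cassandra combines the two SASI indexes efficiently, or ends up doing more work than one bigger index over a service-prefixed annotation_query, is the open question the benchmarking is meant to answer.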

The branch has been updated and the main code looks roughly in shape. Next up is getting the tests compiling and passing.

shakuzen commented 5 years ago

Resolved via #1758