openzipkin / zipkin

Zipkin is a distributed tracing system
https://zipkin.io/
Apache License 2.0

Support lookup of 128-bit traces by lower 64-bits in cassandra3 #1364

Closed codefromthecrypt closed 7 years ago

codefromthecrypt commented 7 years ago

Currently, trace ids are stored as a variable-length number (as opposed to a 2 part hi/lo 128bit id). In order to support query by low 64 bits, we need to split this up somehow or add a separate index.

You'll notice SpanStoreTest.getTrace_retrieves128bitTraceIdByLower64Bits is disabled due to this.

cc @openzipkin/cassandra

codefromthecrypt commented 7 years ago

I started playing around with replacing varint (BigInteger) with a new TraceIdUDT (with hi/lo fields).

That worked, except I didn't get how to query by half of the traceId UDT, considering traceId is part of the partition key. Maybe this could be solved with SASI, or with a separate supporting 64-bit traceIdLo lookup field.

michaelsembwever commented 7 years ago

Possible approaches are: 1) SASI-based

CREATE TABLE IF NOT EXISTS zipkin3.traces (
    trace_id            varint,
    trace_id_low        bigint    STATIC,
    ts_uuid             timeuuid,
    id                  bigint,
    ts                  bigint,
    span_name           text,
    parent_id           bigint,
    duration            bigint,
    annotations         list<frozen<annotation>>,
    binary_annotations  list<frozen<binary_annotation>>,
    all_annotations     text, //-- can't do SASI on set<text>: comma-joined until CASSANDRA-11182
    PRIMARY KEY (trace_id, ts_uuid, id)
)
    WITH CLUSTERING ORDER BY (ts_uuid DESC)
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'}
    AND default_time_to_live =  604800;

CREATE CUSTOM INDEX ON zipkin3.traces (trace_id_low) USING 'org.apache.cassandra.index.sasi.SASIIndex'
   WITH OPTIONS = {'mode': 'SPARSE'};

or 2) clustering-key based

CREATE TABLE IF NOT EXISTS zipkin3.traces (
    trace_id_low        bigint,
    trace_id            varint,
    ts_uuid             timeuuid,
    id                  bigint,
    ts                  bigint,
    span_name           text,
    parent_id           bigint,
    duration            bigint,
    annotations         list<frozen<annotation>>,
    binary_annotations  list<frozen<binary_annotation>>,
    all_annotations     text, //-- can't do SASI on set<text>: comma-joined until CASSANDRA-11182
    PRIMARY KEY (trace_id_low, ts_uuid, trace_id, id)
)
    WITH CLUSTERING ORDER BY (ts_uuid DESC)
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'}
    AND default_time_to_live =  604800;
michaelsembwever commented 7 years ago

Approach (2) would mean always querying by the low-bits (trace_id_low), and filtering out rows that have a non-matching (64 or 128-bit) trace_id. This is an effective approach during the transitional mixed stage.
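To make the transitional filtering concrete, here is a minimal sketch of what approach (2) implies on the client side: query rows by trace_id_low, then drop rows whose full trace_id differs. The class and method names are hypothetical, not Zipkin's actual code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LowBitsFilter {
  // A row's stored trace id, as the hi/lo pair discussed in this thread.
  static final class Row {
    final long traceIdHigh, traceId;
    Row(long traceIdHigh, long traceId) { this.traceIdHigh = traceIdHigh; this.traceId = traceId; }
  }

  /** Keeps rows whose low 64 bits match; high bits must also match unless the caller sent a 64-bit id. */
  static List<Row> filter(List<Row> fromCassandra, long wantHigh, long wantLow) {
    return fromCassandra.stream()
        .filter(r -> r.traceId == wantLow)
        .filter(r -> wantHigh == 0 || r.traceIdHigh == wantHigh) // 64-bit lookup matches any high bits
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Row> rows = Arrays.asList(new Row(0L, 1L), new Row(7L, 1L), new Row(7L, 2L));
    // 128-bit lookup: only the exact hi/lo pair survives
    System.out.println(filter(rows, 7L, 1L).size()); // prints 1
    // 64-bit lookup (high bits unknown): every trace sharing the low bits survives
    System.out.println(filter(rows, 0L, 1L).size()); // prints 2
  }
}
```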

Approach (1) relies on the sparse SASI index. Using SPARSE mode should help it be effective during the transitional mixed stage, but depends on a low cardinality of 128-bit ids per low 64-bit id. When the transitional mixed stage is over, the trace_id_low column and the index can be dropped without losing any data. There's going to be some overhead in the number of queries that have to execute during the transitional mixed stage (the trade-off compared to having to filter out the non-matching trace_ids in (2)).

michaelsembwever commented 7 years ago

A third approach, 3), is to store the trace under two ids without changing the schema: the original 128-bit trace_id, and again under its low-64-bit trace_id.

codefromthecrypt commented 7 years ago

we'd assume 3) doubles the storage right?

michaelsembwever commented 7 years ago

we'd assume 3) doubles the storage right?

Kinda. Zero-high-bit 128-bit trace ids are the same as their 64-bit counterparts and therefore would not be stored twice.
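The storage implication can be sketched in a few lines: under approach (3), a second write is needed only when the high 64 bits are non-zero, since a zero-high-bit 128-bit id is byte-identical to its 64-bit form. Names here are illustrative, not from the codebase.

```java
public class DualWrite {
  /** True when the trace must also be stored under its low-64-bit id. */
  static boolean needsSecondWrite(long traceIdHigh) {
    return traceIdHigh != 0; // a zero high word means the two ids name the same row
  }

  public static void main(String[] args) {
    System.out.println(needsSecondWrite(0L));                  // prints false: 64-bit (or zero-high) id
    System.out.println(needsSecondWrite(0x4d2f05dcdb542a63L)); // prints true: real 128-bit id
  }
}
```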

codefromthecrypt commented 7 years ago

Separate issue, but related. This reminds me that there is a bit of a quirk when representing the 128-bit trace id with a single number (as opposed to two longs or a byte array). Basically, it is harder to tell if the high bits are unset (since it is a signed variable length number). In hindsight I'd prefer some fixed length binary, or dual-long holder form.
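The signedness quirk can be demonstrated directly: a legitimate unsigned 64-bit trace id with its top bit set cannot be converted to BigInteger naively without the sign "leaking" into the stored varint. This is a standalone sketch of the problem, not Zipkin's conversion code.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

public class SignedQuirk {
  public static void main(String[] args) {
    long traceId = 0xFFFFFFFFFFFFFFFFL; // a legitimate unsigned 64-bit trace id

    BigInteger naive = BigInteger.valueOf(traceId); // the sign bit was interpreted
    // Treating the same 8 bytes as an unsigned magnitude gives the intended value
    BigInteger unsigned = new BigInteger(1, ByteBuffer.allocate(8).putLong(traceId).array());

    System.out.println(naive);                 // prints -1
    System.out.println(unsigned.toString(16)); // prints ffffffffffffffff
    System.out.println(unsigned.bitLength());  // prints 64: indistinguishable from a zero-high 128-bit id
  }
}
```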

michaelsembwever commented 7 years ago

What's the verdict here? Happy to code if direction is agreed…

codefromthecrypt commented 7 years ago

I like 2 best, considering the transition will be quite long and there's little likelihood of a large filtering need, since clashing low bits, especially within a timeframe, would be rare. E.g. client-side filtering should be fine.

codefromthecrypt commented 7 years ago

I would actually prefer a variant which removes all use of big integer. BigInteger is hard to reason with from a bit-length perspective as it is a signed number. It is harder to use than the normal representations of zipkin (hi|lo long trace id or hex trace id), and there's already a bug I found which was annoying to fix.

Can we please change cassandra to use either (long traceId, long traceIdHigh) or hex string traceId?
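For reference, the two suggested representations are easy to convert between; here's a hypothetical helper showing the long-pair form and its hex string form (128-bit ids as 32 hex characters, 64-bit ids as 16). The formatting convention is assumed for illustration.

```java
public class TraceIdForms {
  static String toHex(long traceIdHigh, long traceId) {
    return traceIdHigh != 0
        ? String.format("%016x%016x", traceIdHigh, traceId)
        : String.format("%016x", traceId);
  }

  public static void main(String[] args) {
    System.out.println(toHex(0L, 0x2f05dcdb542a63L));                  // prints 002f05dcdb542a63
    System.out.println(toHex(0x48485a3953bb6124L, 0x2f05dcdb542a63L)); // prints 48485a3953bb6124002f05dcdb542a63
  }
}
```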

codefromthecrypt commented 7 years ago

sorry, earlier I said I like 1 when I meant 2. Corrected my comment.

michaelsembwever commented 7 years ago

@adriancole Thinking more about this, with the code open in my editor, I'm in favour of (3).

The simplicity of (3) is definitely there, and the extra rows (the extra storage required) only happen when true 128-bit IDs are consumed.

It also leaves us in the position where the schema doesn't need to change from the migration phase to the final phase.

Check out the following… https://github.com/openzipkin/zipkin/tree/mck/cassandra-128-bit

But it does not provide the forward-compatibility of query-api/UI to storage, as dictated by the CassandraSpanStoreTest#getTrace_retrieves128bitTraceIdByLower64Bits and CassandraSpanStoreTest#getTrace_retrieves128bitTraceIdByLower64Bits_mixed tests.

codefromthecrypt commented 7 years ago

hey mick, I like the idea of 3), particularly from the schema-impact perspective.

The extra cost of storage should be OK for a lot of sites, but we need to be careful about things that merge or aggregate (e.g. we don't want to create duplicate dependency links, mess up limit results, etc.). I have some other tests in progress locally and will certainly keep this point in mind.

I think the most precise approach is to add another field to index on, but I agree it makes the schema more gnarly long term. In other words, I think you've probably made the right choice here, but we should look at its impact closely.

In the meantime, I've pulled out most of the change, just the part that switches ids to fixed length (high/low). Regardless of what we choose here, I think we should make the move to fixed-length trace identifiers.

https://github.com/openzipkin/zipkin/pull/1394

codefromthecrypt commented 7 years ago

1.15 does the simple option, which is to store the span twice. Besides the storage impact (which only applies during the transition), the only thing people might notice is that the json only includes 64-bit trace ids (even when instrumentation sent 128-bit ones).

from CassandraStrictTraceIdFalseTest

  /**
   * When {@link StorageComponent.Builder#strictTraceId(boolean)} is false and {@link
   * Span#traceIdHigh} is not zero, the span is stored a second time, with {@link Span#traceIdHigh}
   * zero. This allows spans to be looked up by the low bits of the trace ID at the cost of extra
   * storage. When spans are retrieved by {@link Span#traceIdHigh} as zero, they are returned with
   * {@link Span#traceIdHigh} zero because, unlike the old schema, the original structs are not
   * persisted.
   */
  @Test
  @Override
  public void getTrace_retrieves128bitTraceIdByLower64Bits_mixed() {
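The behaviour that javadoc describes can be sketched with an in-memory stand-in for the store: spans with a non-zero traceIdHigh are written a second time with traceIdHigh zeroed, so a later lookup by the low 64 bits finds them, returned with traceIdHigh zero since the original struct is not persisted. This is illustrative only; the real store is Cassandra, not a list.

```java
import java.util.ArrayList;
import java.util.List;

public class StrictTraceIdFalseSketch {
  static final class Span {
    final long traceIdHigh, traceId;
    Span(long traceIdHigh, long traceId) { this.traceIdHigh = traceIdHigh; this.traceId = traceId; }
  }

  final List<Span> rows = new ArrayList<>();

  void accept(Span span) {
    rows.add(span);
    if (span.traceIdHigh != 0) { // dual-write only for real 128-bit ids
      rows.add(new Span(0, span.traceId));
    }
  }

  /** Lookup by the low 64 bits only, as the disabled SpanStoreTest requires. */
  List<Span> getTraceByLow64(long traceId) {
    List<Span> result = new ArrayList<>();
    for (Span s : rows) {
      if (s.traceIdHigh == 0 && s.traceId == traceId) result.add(s);
    }
    return result;
  }

  public static void main(String[] args) {
    StrictTraceIdFalseSketch store = new StrictTraceIdFalseSketch();
    store.accept(new Span(0x1234L, 42L)); // a 128-bit trace id
    System.out.println(store.getTraceByLow64(42L).size());             // prints 1
    System.out.println(store.getTraceByLow64(42L).get(0).traceIdHigh); // prints 0
  }
}
```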