grafana / metrictank

metrics2.0 based, multi-tenant timeseries store for Graphite and friends.
GNU Affero General Public License v3.0

not enough chunks returned/queried by/from cassandra #60

Closed by Dieterbe 8 years ago

Dieterbe commented 8 years ago

./metric_tank --chunkspan 600 --numchunks 3

running graphite-watcher with this patch

+               then := time.Now().Add(-time.Duration(50) * time.Minute)
+               q.Start = &then

shows that only 2 chunks per row are returned. it's similar when opening a dashboard and querying the last hour of data: chunks per row is 3, and the first section of the range, which falls in a partially covered chunk, is not used. for example, for a query of the last hour at 12:11, there is no data from 11:11 until 11:20.

Dieterbe commented 8 years ago

i can do this because i know AJ is working on the commitlog

Dieterbe commented 8 years ago

the problem is how we query cassandra:

query(start_month, "SELECT data FROM metric WHERE key = ? AND ts >= ? AND ts < ? ORDER BY ts ASC", row_key, start, end)

we need to query for ts >= start - (start % chunkSpan) (sketched below), but the problem is that in the http handler we don't know what the chunkspan is. i see 2 solutions:

@woodsaj what do you think?
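
For reference, the alignment itself is trivial once the chunkspan is known; a minimal Go sketch, with an illustrative helper name (not metrictank code), since the real handler doesn't have chunkSpan available:

// alignToChunkStart rounds a query start down to the T0 of the chunk that
// contains it, i.e. start - (start % chunkSpan), so the range query also
// returns that partially covered chunk. chunkSpan is in seconds.
func alignToChunkStart(start, chunkSpan uint32) uint32 {
	return start - (start % chunkSpan)
}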

Dieterbe commented 8 years ago

or a 3rd option i guess: do an extra query, something like where ts <= start order by ts desc limit 1. or is there a way to also get that first column in a single query?
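
A rough sketch of what that extra query could look like with gocql, assuming the key/ts/data schema used by the main query; the helper name and parameters are placeholders, not actual metrictank code:

// hypothetical helper, not actual metrictank code.
// assumes: import "github.com/gocql/gocql"
func chunkT0(session *gocql.Session, rowKey string, start int64) (int64, error) {
	// find the newest chunk whose T0 is at or before `start` in this row.
	// this is one extra round trip per row, on top of the main range query.
	var t0 int64
	err := session.Query(
		"SELECT ts FROM metric WHERE key = ? AND ts <= ? ORDER BY ts DESC LIMIT 1",
		rowKey, start,
	).Scan(&t0)
	return t0, err
}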

woodsaj commented 8 years ago

It is not possible to change the cassandra schema.

Dieterbe commented 8 years ago

why not? do you have any suggestions that could bring us closer to a solution?

woodsaj commented 8 years ago

Cassandra is a columnar database. Chunks are stored as columns in a row (unlike in a relational DB, where chunks would be stored as rows). So the column name is the chunk's T0 and the column value is the binary blob of chunk data.

row_key         1448333580  1448333590  1448333600
series1_201511  1.1         1.4         3.0
series2_201511  55.0        22.0        55.0

Adding additional columns wouldn't work, as we would no longer be able to do range queries across columns.
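
For reference, the table shape implied by the queries in this thread looks roughly like the following. This is a guess inferred from the SELECT statements above, not the real schema definition; types and comments are assumptions:

// a guess at the table shape implied by the queries in this thread;
// the real schema definition isn't shown here, so everything below is inferred.
const metricTableCQL = `
CREATE TABLE metric (
    key  ascii,  -- row key, e.g. series1_201511 (one row per series per month)
    ts   int,    -- chunk T0; the clustering column the range queries run over
    data blob,   -- the chunk itself
    PRIMARY KEY (key, ts)
)`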

woodsaj commented 8 years ago

I think option 3 is the best option.

woodsaj commented 8 years ago

So after more thought on this, option 3 is by far the best option. The alternative is to keep an index of chunk spans, which we would need to query and then process to determine what the start time should be adjusted to. Adding just this single small query is preferable.

Dieterbe commented 8 years ago

turns out what i wanted to do can't really be done:

cqlsh> select key, ts from raintank.metric where ts < 1449202000 ORDER by ts DESC;
InvalidRequest: code=2200 [Invalid query] message="ORDER BY is only supported when the partition key is restricted by an EQ or an IN."
cqlsh> select key, ts from raintank.metric where ts < 1449202000 ORDER by ts DESC LIMIT 1;
InvalidRequest: code=2200 [Invalid query] message="ORDER BY is only supported when the partition key is restricted by an EQ or an IN."

even if it were possible, we would have to take into account that the previous chunk we need could be in the row for the previous month. so we would first have to look for that chunk in the same month row that start falls in, and if that doesn't yield a result, get it from the previous month's row. that means possibly two cassandra queries in sequence, one waiting for the other: too slow and kludgy to implement in the current code.

i decided to just implement it the simple way for now, by hardcoding a "chunkspans will never be longer than" value, set to 12h for now. we may want to make this value configurable or runtime-adjustable. i'm not happy about the loss of efficiency though :( @woodsaj let me know what you think of #70, or hopefully you see a better way.
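
For the record, the simple workaround amounts to something like this. It is a sketch under the assumption that #70 just widens the query window by the hardcoded maximum chunkspan; the names are illustrative, not the actual change:

// maxChunkSpan is the hardcoded "chunkspans will never be longer than" value, 12h in seconds.
const maxChunkSpan uint32 = 12 * 60 * 60

// widenStart pulls the query start back by the maximum possible chunkspan, so the
// chunk whose T0 lies before `start` is always included in the range query.
// points outside the requested range can still be filtered out after the chunks
// are decoded, so the cost is reading up to a few extra chunks per row, not wrong results.
func widenStart(start uint32) uint32 {
	if start < maxChunkSpan {
		return 0
	}
	return start - maxChunkSpan
}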