grafana / metrictank

metrics2.0 based, multi-tenant timeseries store for Graphite and friends.
GNU Affero General Public License v3.0

not enough chunks returned/queried by/from cassandra #60

Closed by Dieterbe 8 years ago

Dieterbe commented 8 years ago

./metric_tank --chunkspan 600 --numchunks 3

running graphite-watcher with this patch

+               then := time.Now().Add(-time.Duration(50) * time.Minute)
+               q.Start = &then

shows that only 2 chunks per row are returned. it's similar when opening a dashboard and querying the last hour of data: chunks per row is 3, and the first section of the range, which falls in a partially covered chunk, is not used. for example, for a query of the last hour at 12:11, there is no data from 11:11 until 11:20.

Dieterbe commented 8 years ago

i can do this because i know AJ is working on the commitlog

Dieterbe commented 8 years ago

the problem is how we query cassandra:

query(start_month, "SELECT data FROM metric WHERE key = ? AND ts >= ? AND ts < ? ORDER BY ts ASC", row_key, start, end)

we need to query for ts >= start - (start % chunkSpan) (sketched below), but the problem is that in the http handler we don't know what the chunkspan is. i see 2 solutions:

@woodsaj what do you think?
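
For reference, the alignment itself is trivial once the chunkspan is known; a minimal Go sketch, with an illustrative helper name (not metrictank code), since the real handler doesn't have chunkSpan available:

// alignToChunkStart rounds a query start down to the T0 of the chunk that
// contains it, i.e. start - (start % chunkSpan), so the range query also
// returns that partially covered chunk. chunkSpan is in seconds.
func alignToChunkStart(start, chunkSpan uint32) uint32 {
	return start - (start % chunkSpan)
}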

Dieterbe commented 8 years ago

or a 3rd option i guess: do an extra query, something like where ts <= start order by ts desc limit 1. or is there a way to also get that first column in a single query?
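
A rough sketch of what that extra query could look like with gocql, assuming the key/ts/data schema used by the main query; the helper name and parameters are placeholders, not actual metrictank code:

// hypothetical helper, not actual metrictank code.
// assumes: import "github.com/gocql/gocql"
func chunkT0(session *gocql.Session, rowKey string, start int64) (int64, error) {
	// find the newest chunk whose T0 is at or before `start` in this row.
	// this is one extra round trip per row, on top of the main range query.
	var t0 int64
	err := session.Query(
		"SELECT ts FROM metric WHERE key = ? AND ts <= ? ORDER BY ts DESC LIMIT 1",
		rowKey, start,
	).Scan(&t0)
	return t0, err
}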

woodsaj commented 8 years ago

It is not possible to change the cassandra schema.

Dieterbe commented 8 years ago

why not? do you have any suggestions that could bring us closer to a solution?

woodsaj commented 8 years ago

Cassandra is a columnar database. Chunks are stored as columns in a row (unlike in a relational DB, where chunks would be stored as rows). So the column name is the chunk's T0 and the column value is the binary blob of chunk data.

row_key         1448333580  1448333590  1448333600
series1_201511  1.1         1.4         3.0
series2_201511  55.0        22.0        55.0

Adding additional columns wouldn't work, as we would no longer be able to do range queries across columns.
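
For reference, the table shape implied by the queries in this thread looks roughly like the following. This is a guess inferred from the SELECT statements above, not the real schema definition; types and comments are assumptions:

// a guess at the table shape implied by the queries in this thread;
// the real schema definition isn't shown here, so everything below is inferred.
const metricTableCQL = `
CREATE TABLE metric (
    key  ascii,  -- row key, e.g. series1_201511 (one row per series per month)
    ts   int,    -- chunk T0; the clustering column the range queries run over
    data blob,   -- the chunk itself
    PRIMARY KEY (key, ts)
)`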

woodsaj commented 8 years ago

I think option 3 is the best option.

woodsaj commented 8 years ago

So after more thought on this, option 3 is by far the best option. The alternative is to keep an index of chunk spans, which we would need to query and then process to determine what the start time should be adjusted to. Adding just this single small query is preferable.

Dieterbe commented 8 years ago

turns out what i wanted to do can't really be done:

cqlsh> select key, ts from raintank.metric where ts < 1449202000 ORDER by ts DESC;
InvalidRequest: code=2200 [Invalid query] message="ORDER BY is only supported when the partition key is restricted by an EQ or an IN."
cqlsh> select key, ts from raintank.metric where ts < 1449202000 ORDER by ts DESC LIMIT 1;
InvalidRequest: code=2200 [Invalid query] message="ORDER BY is only supported when the partition key is restricted by an EQ or an IN."

even if it were possible, we would have to take into account that the previous chunk we need could be in the row for the previous month. so we would first have to look for that chunk in the same month row that start falls in, and if that doesn't yield a result, get it from the previous month's row. that means possibly two cassandra queries in sequence, one waiting for the other: too slow and kludgy to implement in the current code.

i decided to just implement it the simple way for now, by hardcoding a "chunkspans will never be longer than" value, set to 12h for now. we may want to make this value configurable or runtime-adjustable. i'm not happy about the loss of efficiency though :( @woodsaj let me know what you think of #70, or hopefully you see a better way.
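
For the record, the simple workaround amounts to something like this. It is a sketch under the assumption that #70 just widens the query window by the hardcoded maximum chunkspan; the names are illustrative, not the actual change:

// maxChunkSpan is the hardcoded "chunkspans will never be longer than" value, 12h in seconds.
const maxChunkSpan uint32 = 12 * 60 * 60

// widenStart pulls the query start back by the maximum possible chunkspan, so the
// chunk whose T0 lies before `start` is always included in the range query.
// points outside the requested range can still be filtered out after the chunks
// are decoded, so the cost is reading up to a few extra chunks per row, not wrong results.
func widenStart(start uint32) uint32 {
	if start < maxChunkSpan {
		return 0
	}
	return start - maxChunkSpan
}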