**Closed** — Dieterbe closed this issue 8 years ago
i can do this because i know AJ is working on the commitlog
the problem is how we query cassandra:
```
query(start_month, "SELECT data FROM metric WHERE key = ? AND ts >= ? AND ts < ? ORDER BY ts ASC", row_key, start, end)
```
we need to query for `ts >= start - (start % chunkSpan)`,
but the problem is that in the http handler we don't know what the chunkSpan is.
i see 2 solutions:
@woodsaj what do you think?
or a 3rd option i guess: do an extra query, something like `WHERE ts <= start ORDER BY ts DESC LIMIT 1`. or is there a way to get the first column as well in 1 query?
It is not possible to change the cassandra schema.
why not? do you have any suggestion that can bring us closer to a solution?
Cassandra is a columnar database. Chunks are stored as columns in a row (unlike in a relational DB, where chunks would be stored as rows). So the column name is the chunk's T0, and the column value is the binary blob of chunk data.
| row_key | 1448333580 | 1448333590 | 1448333600 |
|---|---|---|---|
| series1_201511 | 1.1 | 1.4 | 3.0 |
| series2_201511 | 55.0 | 22.0 | 55.0 |
Adding additional columns wouldn't work, as we would no longer be able to do range queries across columns.
I think option 3 is the best option.
So after more thought on this, option 3 is by far the best option. The alternative is to keep an index of chunk spans, which we would need to query and then process to determine what the start time should be adjusted to. So adding just this one small query is preferable.
turns out what i wanted to do can't really be done:
```
cqlsh> select key, ts from raintank.metric where ts < 1449202000 ORDER by ts DESC;
InvalidRequest: code=2200 [Invalid query] message="ORDER BY is only supported when the partition key is restricted by an EQ or an IN."
cqlsh> select key, ts from raintank.metric where ts < 1449202000 ORDER by ts DESC LIMIT 1;
InvalidRequest: code=2200 [Invalid query] message="ORDER BY is only supported when the partition key is restricted by an EQ or an IN."
```
even if it were possible, we would have to take into account that the previous chunk we need could be in the row for a previous month. so we would first have to look for the chunk in the same month row that start falls in, and if that doesn't yield a result, get it from the previous month's row. that means possibly two cassandra queries in sequence, one waiting for the other: too slow and kludgy to implement in the current code.

i decided to just implement it the simple way for now, by hardcoding a "chunkspans will never be longer than" value, set to 12h for now. we may want to make this value configurable or runtime-adjustable. i'm not happy about the loss of efficiency though :(

@woodsaj let me know what you think of #70, or hopefully you see a better way.
```
./metric_tank --chunkspan 600 --numchunks 3
```
running graphite-watcher with this patch shows that only 2 chunks per row are returned. similar when opening a dashboard and querying the last hour of data: chunks per row is 3, and the first section of an incomplete chunk is not used. for example, for a query of the last hour at 12:11, there is no data from 11:11 until 11:20.