raintank / graphite-kairosdb

Graphite-Api finder plugin for Kairosdb
Apache License 2.0

how do graphite queries map to cassandra reads? #4

Closed Dieterbe closed 9 years ago

Dieterbe commented 9 years ago

An alerting job queries graphite-api for sum(litmus.{{.EndpointSlug}}.*.{{.MonitorTypeName | ToLower }}.error_state). Does this map to one query to kairosdb/cassandra (getting all the series in one shot), or to a separate query for every single series?

Also, how does this translate to cassandra.db.read_count?

Dieterbe commented 9 years ago

for what it's worth, currently:

MariaDB [grafana]> select sum(1/frequency) from monitor where enabled = 1;
22.5500

I think that, on average, we should be doing about 22.5 alert jobs per second. We still need to figure out how to reliably get the number of collectors for each monitor; that should give us a pretty good indication of how many series we're querying.
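To make the arithmetic explicit, here is a minimal sketch of the estimate. It assumes `frequency` is the monitor interval in seconds and that each collector contributes one series per job; the monitor list and collector counts below are made up for illustration and are not taken from the real database.

```python
from fractions import Fraction

def jobs_per_second(monitors):
    """Sum of 1/frequency over enabled monitors (frequency in seconds)."""
    return float(sum(Fraction(1, freq) for freq, _ in monitors))

def series_per_second(monitors):
    """Sum of collectors/frequency: series queried per second, assuming
    one series per collector per job (an assumption, not confirmed)."""
    return float(sum(Fraction(collectors, freq) for freq, collectors in monitors))

# Hypothetical (frequency_seconds, collector_count) pairs for enabled monitors.
monitors = [(10, 3)] * 10 + [(60, 5)] * 6 + [(120, 2)] * 12

job_rate = jobs_per_second(monitors)       # 10/10 + 6/60 + 12/120
series_rate = series_per_second(monitors)  # 30/10 + 30/60 + 24/120
```

With the real `monitor` table, the first function corresponds to the `sum(1/frequency)` query above; the second needs the per-monitor collector count that is still missing.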

woodsaj commented 9 years ago

The litmus.{{.EndpointSlug}}.*.{{.MonitorTypeName | ToLower }}.error_state query is sent to Elasticsearch, which returns all of the series that match the regex. We then construct a single query to kairosdb requesting all of these series.

However, kairosdb is tag based, so the query we send must include a list of measurementNames+tags. For each of these items, kairosdb needs to query its own tag index to identify the rows that should be fetched.

I am experimenting with having graphite query cassandra directly, which should reduce query overhead. https://github.com/raintank/graphite-kairosdb/issues/5

woodsaj commented 9 years ago

Now that graphite_kairosdb is fetching data directly from cassandra instead of via kairosdb, this is how queries are mapped.

graphite_api calls kairosdbFinder.find_nodes(query) with the query from the user. This query is then converted to a regex and used to search for matching metrics in elasticsearch. The response from elasticsearch is processed to identify which results are branches and which are leaf nodes.
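As a rough illustration of the query-to-regex step, here is a minimal sketch. It is not the plugin's actual code: the function name `glob_to_regex` is hypothetical, and it only handles `*`, `?` and `{a,b}` (character classes like `[0-9]` are treated as literals).

```python
import re

def glob_to_regex(query):
    """Translate a graphite glob into an anchored regex string.

    '*' and '?' never cross a '.' boundary, so each wildcard stays
    inside a single path segment.
    """
    out = []
    in_braces = False
    for ch in query:
        if ch == "*":
            out.append("[^.]*")
        elif ch == "?":
            out.append("[^.]")
        elif ch == "{":
            out.append("(?:")
            in_braces = True
        elif ch == "}":
            out.append(")")
            in_braces = False
        elif ch == "," and in_braces:
            out.append("|")
        else:
            out.append(re.escape(ch))
    return "^" + "".join(out) + "$"

# Hypothetical rendered form of the alerting query's series pattern.
pattern = re.compile(glob_to_regex("litmus.example_com.*.http.error_state"))
```

Here `pattern` matches `litmus.example_com.dev1.http.error_state` but not a name with an extra segment in place of the `*`, which is what keeps the wildcard scoped to the collector position.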

graphite_api then calls kairosdbFinder.fetch_multi(nodes, start, end) where nodes is the set of leaf nodes returned from the previous call. For each node, we construct 1 or more cassandra queries depending on how many 3-week periods the time range spans. We then send all of the queries to cassandra in parallel using an async query.
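The number of cassandra queries per request can be sketched as follows. This assumes rows are aligned to multiples of three weeks since the epoch and that `start` and `end` are inclusive unix timestamps in seconds; `row_starts` and `plan_queries` are hypothetical names, and the actual CQL and schema are left out.

```python
ROW_WIDTH = 3 * 7 * 24 * 3600  # three weeks, in seconds

def row_starts(start, end):
    """Start timestamps of every 3-week row that overlaps [start, end]."""
    first = start - (start % ROW_WIDTH)
    return list(range(first, end + 1, ROW_WIDTH))

def plan_queries(nodes, start, end):
    """One (node, row_start) pair per cassandra query to issue."""
    rows = row_starts(start, end)
    return [(node, row) for node in nodes for row in rows]

# Two leaf nodes, with a time range that straddles one row boundary.
plan = plan_queries(["a.error_state", "b.error_state"],
                    ROW_WIDTH - 1, ROW_WIDTH)
# 2 nodes x 2 rows = 4 queries
```

So the query count is roughly (number of leaf nodes) x (number of 3-week rows touched). With the DataStax Python driver, each planned query could be dispatched with `session.execute_async(...)` and the resulting futures collected afterwards, which is one way to get the parallel behaviour described here.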

The response(s) from Cassandra are then processed and returned to graphite in the expected format.

Dieterbe commented 9 years ago

neat. thanks AJ.