grafana / metrictank

metrics2.0 based, multi-tenant timeseries store for Graphite and friends.
GNU Affero General Public License v3.0

rollups. #21

Closed Dieterbe closed 8 years ago

Dieterbe commented 8 years ago

the time has come.

Dieterbe commented 8 years ago

the way i see it, before coming up with proper rollup settings, it's good practice to first list our desires and be aware of each of them, because this is where we may have different opinions: (and i may have forgotten a few considerations that @torkelo or @woodsaj will think of)

woodsaj commented 8 years ago

The two key drivers for rollups are:

  1. improve loading times of graphs that span large timeframes.
  2. reduce storage requirement

right now, item 1 is of higher priority than item 2.

In addition to the requirements listed above, i would add

I am not sure if an additional NSQ consumer is the right approach.

In order to perform rollups, we are going to need to buffer metrics in memory. Querying this inMemory data is going to be faster than querying the TSDB, and so we should have graphite leverage it. In that case we can also delay when we write to C*.

So, my view is that we read from NSQ, then just write to the inMemory store. We then have another process or thread that uses this inMemory store to:

Dieterbe commented 8 years ago

> Querying this inMemory data is going to be faster than querying the TSDB, and so we should have graphite leverage it

how sure are we that there is a significant difference in response time for serving read requests for recent data? or maybe this question is not that relevant because if we're going to batch up writes (and it looks like we should) then we need this in-memory component to satisfy hot data anyway.

i agree with your approach, but the way i envision this, the nsq reading, inMemory store, and C* flushing can all happen in the same process, until we reach scalability limits.
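For illustration, here's a minimal sketch of that single-process layout: one NSQ consumer feeding an in-memory store, plus a background flusher that periodically drains it toward Cassandra. The `MemoryStore`, the topic/channel names, and the flush interval are invented for the example and aren't the actual raintank-metric code; it just assumes the go-nsq client library.

```go
// Sketch only: single process that consumes from NSQ, buffers in memory,
// and periodically flushes to Cassandra. Names and intervals are made up.
package main

import (
	"log"
	"sync"
	"time"

	nsq "github.com/nsqio/go-nsq" // assumed client library
)

// MemoryStore is a hypothetical in-memory buffer keyed by metric id.
type MemoryStore struct {
	mu     sync.Mutex
	points map[string][][2]float64 // id -> list of (timestamp, value)
}

func NewMemoryStore() *MemoryStore {
	return &MemoryStore{points: make(map[string][][2]float64)}
}

func (s *MemoryStore) Add(id string, ts int64, val float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.points[id] = append(s.points[id], [2]float64{float64(ts), val})
}

// Drain returns and clears everything buffered so far (a stand-in for
// "flush the chunks that are ready").
func (s *MemoryStore) Drain() map[string][][2]float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := s.points
	s.points = make(map[string][][2]float64)
	return out
}

func main() {
	store := NewMemoryStore()

	// 1) the NSQ reader feeding the in-memory store.
	cfg := nsq.NewConfig()
	consumer, err := nsq.NewConsumer("metrics", "tank", cfg)
	if err != nil {
		log.Fatal(err)
	}
	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
		// real code would decode m.Body into metric points; faked here.
		store.Add("1.some.metric", time.Now().Unix(), 1.0)
		return nil
	}))
	if err := consumer.ConnectToNSQLookupd("nsqlookupd:4161"); err != nil {
		log.Fatal(err)
	}

	// 2) the C* flusher, running in the same process.
	go func() {
		for range time.Tick(30 * time.Second) {
			for id, pts := range store.Drain() {
				// placeholder: the real writer would persist a chunk to Cassandra.
				log.Printf("would flush %d points for %s", len(pts), id)
			}
		}
	}()

	// the HTTP read path would also be served from `store` for hot data.
	select {}
}
```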

Dieterbe commented 8 years ago

currently prototyping this

Dieterbe commented 8 years ago

progress is at https://github.com/raintank/raintank-metric/tree/tank https://github.com/raintank/raintank-docker/tree/tank

Dieterbe commented 8 years ago

(https://github.com/raintank/raintank-metric/tree/tank/nsq_metrics_tank has some details)

next up:

  1. a modified graphite-kairosdb to query nsq_metric_tank for data no less than a configurable number of seconds old (chunkSpan*(numChunks-1)) would be nice. @woodsaj can you give this a shot?
  2. send many more metrics, validate http output looks good with larger chunk sizes. I think having (1) in dev stack would help here. check cpu/ram usage, response times.
  3. implement rollups
  4. implement saving to cassandra. i know very little about cassandra and i wonder if @woodsaj is also interested in giving this a shot. https://github.com/gocql/gocql looks like the best go library for cassandra, but i read somewhere that CQL is not that mature and that "old style" is often still better (though i'm not sure what that means). anyway, just some code that connects to C* and can save new chunks to specific per-metric rows or something would be nice (a rough sketch follows this list); i can then glue it into the daemon.
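Regarding point 4, a rough gocql sketch of "connect to C* and save a chunk to a per-metric row": the keyspace, table name, and column layout below are assumptions for illustration, not a decided schema.

```go
// Sketch only: connect to Cassandra with gocql and write one chunk to a
// per-metric row. Keyspace, table, and columns are assumptions.
package main

import (
	"log"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("cassandra1", "cassandra2")
	cluster.Keyspace = "raintank" // assumed keyspace
	cluster.Consistency = gocql.One
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// one row per (metric id, chunk start time); data would be the encoded chunk.
	chunk := []byte("...encoded chunk bytes...")
	err = session.Query(
		`INSERT INTO metric_chunks (metric_id, t0, data) VALUES (?, ?, ?)`,
		"1.some.metric", time.Now().Unix(), chunk,
	).Exec()
	if err != nil {
		log.Fatal(err)
	}
}
```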
torkelo commented 8 years ago

nice progress! is nsq_metrics_tank a separate process, or is it part of the raintank-metric process that feeds on NSQ?

Just curious about the topology / processes involved in the metric stack. has NSQ completely replaced RabbitMq? What process receives metrics from collectors and batches them onto NSQ, and then is there another process that receives them and saves metadata to elastic and metrics to cassandra?

Dieterbe commented 8 years ago

every nsq_* app in this repo runs as a service. there is no more raintank-metric process. nothing in here feeds into nsq; they all consume from nsq. there is one for maintaining the metric definitions in ES, one for storing probe events in ES, and one for saving metrics to kairos. and now this new one, which should eventually also replace the latter.

we still use rabbitmq for the grafana app bus and perhaps a few other things, not sure. but nsq is used for high-throughput items (metrics and probe-events). it's kind of annoying that we have 2 messaging systems, but they have different characteristics and we exploit that. using rabbitmq for everything would be far from ideal, and ditto for NSQ probably ( @woodsaj is more familiar with the rabbitmq specifics)

grafana receives the data from the collectors (they run in collector-controller mode only) and sends it to NSQ. the collector-controllers also don't batch anymore; they exploit the fact that incoming data is naturally batched. in fact, batches are split up to make sure the messages stay under 10MB each.
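A toy sketch of that splitting step, just to make the mechanism concrete: the 10MB limit comes from the comment above, while the `MetricData` fields and the JSON encoding are assumptions for the example.

```go
// Sketch only: greedily pack metrics into sub-batches whose encoded size stays
// under a budget (10MB per the comment above). Field set and encoding are assumed.
package batch

import "encoding/json"

const maxMsgSize = 10 * 1024 * 1024

type MetricData struct {
	OrgId int     `json:"org_id"`
	Name  string  `json:"name"`
	Time  int64   `json:"time"`
	Value float64 `json:"value"`
}

// splitBatch is deliberately naive (it re-encodes on every append); it only
// illustrates the "keep each message under the limit" idea.
func splitBatch(metrics []MetricData) ([][]byte, error) {
	var out [][]byte
	var current []MetricData
	for _, m := range metrics {
		candidate := append(current, m)
		b, err := json.Marshal(candidate)
		if err != nil {
			return nil, err
		}
		if len(b) > maxMsgSize && len(current) > 0 {
			// current batch is full: emit it and start a new one with m.
			prev, err := json.Marshal(current)
			if err != nil {
				return nil, err
			}
			out = append(out, prev)
			current = []MetricData{m}
			continue
		}
		current = candidate
	}
	if len(current) > 0 {
		b, err := json.Marshal(current)
		if err != nil {
			return nil, err
		}
		out = append(out, b)
	}
	return out, nil
}
```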

PS: i will update the readme based on this

torkelo commented 8 years ago

@Dieterbe thanks for clarifying, makes sense. Would be interesting to know if rabbitmq is required or whether we could use NSQ there as well.

Been thinking about Grafana and distributed setups (sharing cache state, for example for the alerting def index). What would be the best way for grafana nodes to talk to each other? NSQ, nanomsg, rabbit, raft..?

Dieterbe commented 8 years ago

that sounds like something we should have a deeper conversation about, to learn context, requirements etc. happy to hangout about that some time. is there a ticket where we can resume this convo?

nopzor1200 commented 8 years ago

Just wanted to add my 2c here, around a particular angle (disclaimer: I don't know enough about the tradeoffs of nsq vs rabbit for various use cases)...

over the long term, it is definitely a downside from a distribution/packaging/supportability standpoint if our stack requires both nsq (or whatever name we end up giving that component of the stack) and rabbit.

maybe not that big a deal for the saas offering, but if nsq could be a tight no-dependency part of our stack, it seems ideal to evaluate using it for everything, especially for the on-prem / downloadable use case.

Dieterbe commented 8 years ago

let's please have that conversation elsewhere (maybe in strategy repo or something)

woodsaj commented 8 years ago

@Dieterbe a few issues.

1) you are using metric.Name as the key, but this is not a globally unique identifier for a metric series, as it does not contain any data about who owns the metric (Org). This can be easily addressed by changing https://github.com/raintank/raintank-metric/blob/tank/nsq_metrics_tank/handler.go#L36 to `m := h.metrics.Get(metric.Id())` (a toy illustration follows below).

2) As with the above, the HTTP interface needs to either a) accept the metric ID as the query term, or b) keep a local index of metric.Ids and metric.name + org_id. Option A is obviously simpler and also preferable, as Graphite-api gets the metric.Ids from Elastic anyway.
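As mentioned in point 1, here's a toy illustration of why the name alone can't be the key; the `Id()` shown is invented for the demo and is not the real schema id format.

```go
// Toy illustration only: the name collides across orgs, an org-aware id does not.
// This Id() is invented for the demo; the real id format lives elsewhere.
package main

import "fmt"

type MetricData struct {
	OrgId int
	Name  string
}

func (m MetricData) Id() string {
	// stand-in for the real stable id; org + name is enough to show the point.
	return fmt.Sprintf("%d.%s", m.OrgId, m.Name)
}

func main() {
	a := MetricData{OrgId: 1, Name: "load.avg.1min"}
	b := MetricData{OrgId: 2, Name: "load.avg.1min"}
	fmt.Println(a.Name == b.Name) // true: two different customers, same key
	fmt.Println(a.Id() == b.Id()) // false: per-org ids keep the series apart
}
```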

other than that, this is looking awesome.

Dieterbe commented 8 years ago

@woodsaj ok will look into that. do you have anything to say to what i asked about populating cassandra? (last point of https://github.com/raintank/raintank-metric/issues/21#issuecomment-142921544) thanks.

woodsaj commented 8 years ago

yes, happy to help out, and have already researched sufficiently to get started.

woodsaj commented 8 years ago

Basic cassandra support is in https://github.com/raintank/raintank-metric/tree/tank_to_cassandra

The methods are there for sending data, but I'm not sure where the data should be written from. What mechanism will flush out the aggregated metrics?

Dieterbe commented 8 years ago

now we should have flushing of the chunks to cassandra (it says it saves fine) plus loading of the chunks from cassandra in the http interface, to satisfy timestamps that fall outside of the in-memory range. for some reason it doesn't actually work yet, but i think we're close :-p

btw, it would be nice if we got the http json output in the same format as graphite ("array" style: [{"target": "error", "datapoints": [[null, 1443070801]], ...}]), but i haven't figured out yet how to do that
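One possible way to get that "array" style output in Go (not necessarily what ended up in the code): give the point type a custom `MarshalJSON` so each datapoint renders as `[value, timestamp]`, with null for missing values. Type names here are made up for the sketch.

```go
// Sketch only: emit graphite's "array" style json from Go by giving the point
// type a custom MarshalJSON. Type names are made up for the example.
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"os"
)

type Point struct {
	Val float64
	Ts  int64
}

// MarshalJSON renders a point as [value, timestamp], with NaN as null,
// matching graphite's datapoints format.
func (p Point) MarshalJSON() ([]byte, error) {
	if math.IsNaN(p.Val) {
		return []byte(fmt.Sprintf("[null,%d]", p.Ts)), nil
	}
	return []byte(fmt.Sprintf("[%g,%d]", p.Val, p.Ts)), nil
}

type Series struct {
	Target     string  `json:"target"`
	Datapoints []Point `json:"datapoints"`
}

func main() {
	out := []Series{{
		Target:     "error",
		Datapoints: []Point{{Val: math.NaN(), Ts: 1443070801}},
	}}
	// prints: [{"target":"error","datapoints":[[null,1443070801]]}]
	json.NewEncoder(os.Stdout).Encode(out)
}
```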

Dieterbe commented 8 years ago

summary

loading and saving seem to work; verified via the json api in the dev stack, working with a single endpoint

todo

Dieterbe commented 8 years ago

we can already easily spin up raintank-docker as the current stack or the tank-based stack (remember to rebuild docker images!). but now i've been working on a script to collect perf metrics, do benchmarks with vegeta, and verify correctness of the data returned by graphite, as well as documenting the procedure so that the process is smooth and minimizes room for errors: https://github.com/raintank/raintank-docker/wiki/performance-testing-a-timeseries-backend. once some more things are fixed/implemented, it should be trivial to benchmark both stacks in a structured manner and compare them.

Dieterbe commented 8 years ago

to figure out before starting benches:

Dieterbe commented 8 years ago

summary

woodsaj commented 8 years ago

> i want to test NMT/go-tsz in a realistic setting with fluctuating response times

You can try using the kernel's built-in traffic control (tc) mechanism. I don't think you will be able to apply changes directly to individual containers, but it would work nicely inside a full VM, using virtualbox, kvm, or vmware.

http://www.linuxfoundation.org/collaborate/workgroups/networking/netem

woodsaj commented 8 years ago

I was able to get tc working with the docker containers. I added a commit to raintank-docker, so that network emulation rules are applied to the interfaces on all of the collector containers to increase their latency. https://github.com/raintank/raintank-docker/commit/145736ae75998db86e1a8343572c504b3d25d35b

The increased latency won't be applied if the endpoint is localhost, but it will work for other addresses. So i would recommend that you change env-load to use the address of the docker0 interface instead of localhost.

Dieterbe commented 8 years ago

good stuff AJ. i just did https://github.com/raintank/raintank-docker/commit/1b6462560a9f6321b7ccfd6a1fc87f9b3780b228, but otherwise it works great.

Dieterbe commented 8 years ago

the server should now have functioning rollups; i just have a hard time verifying that visually, because the metrics don't show up in the graphite-api output, so i can't draw them in grafana.

Dieterbe commented 8 years ago

been making progress on serving consolidated responses in the http handler: https://github.com/raintank/raintank-metric/pull/63, if anyone is curious to see the work in progress.

Dieterbe commented 8 years ago

with #63 gearing up, it's time to start finding a good set of rollup intervals/settings. i know @woodsaj at one point had a google doc with a table of suggested rollup settings. can you share it? (you shared "by the numbers v2" but that looks like something else)

some thoughts:

woodsaj commented 8 years ago

> also allow multiple input resolutions, good fit of rollups for each. (i think high resolution should also stay fairly high resolution throughout different bands, whereas if data comes in at low resolution, we can also be more aggressive in our rollups)

If rollup periods are not the same across all series, then there needs to be an index of the rollup periods that each series has available. At this stage we need to optimize for code simplicity.

Dieterbe commented 8 years ago

> Factor in that...you quickly realize

that makes sense. so would you say each step in the rollup interval should be about 10x the previous one? for example, if maxDataPoints is 800 and we set minDataPoints to 80 (it seems proper to allow anywhere from 1 point per pixel to 1 point every 10 pixels), and if we had rollup intervals that are all roughly multiples of 10, then we would always be able to find a matching rollup interval that can be served without doing any further reduction at runtime. right now i'm playing with these settings in devstack: https://github.com/raintank/raintank-docker/commit/55e10422b912c81e7014f0c7dd54c5af284a9a41. this needs some further thinking of course, plus instrumentation in the code of how well our retention intervals correspond to requests, how much overhead there is, etc. and @woodsaj, can you share the table of rollup intervals/retentions you had?
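To make the "find a matching rollup interval" idea concrete, here's a small sketch of the selection logic under discussion: given the requested time range and maxDataPoints, pick the finest archive whose point count still fits under maxDataPoints. The interval values are example numbers, not final settings.

```go
// Sketch only: pick the finest archive whose point count for the requested
// range fits under maxDataPoints. Intervals are example values, not settings.
package main

import "fmt"

// available archive resolutions in seconds, finest first.
var archives = []int64{10, 120, 1200, 7200}

func pickArchive(from, to int64, maxDataPoints int64) int64 {
	span := to - from
	for _, interval := range archives {
		if span/interval <= maxDataPoints {
			// coarse enough: can be served without further runtime reduction.
			return interval
		}
	}
	// even the coarsest archive has too many points; it would need
	// additional consolidation at read time.
	return archives[len(archives)-1]
}

func main() {
	// a one-week graph at maxDataPoints=800 lands on the 1200s archive (~504 points).
	fmt.Println(pickArchive(0, 7*24*3600, 800))
}
```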

the main issue i have with my settings at this point is that the raw stream can have a very broad range of resolutions: for litmus currently 10s to 120s, but for non-litmus (or litmus in the future) i can see anywhere from 1s to 1h. this means the compression rate of the first rollup interval varies wildly, and the first rollup level can be anywhere from basically useless to drastically insufficient, still requiring major runtime aggregation.

but like you say, we can get the basic form working first and then worry about per-series/per-customer adjustments later, once we have a better understanding of what we want.

woodsaj commented 8 years ago

@Dieterbe if the doc i shared with you doesn't have what you are looking for, then the information no longer exists.

Dieterbe commented 8 years ago

playing with rollups like a pro

  1. update your dev stack
  2. the nsq tools still need to be manually compiled, sorry. use the consolidation-at-read-time branch for raintank-metric
  3. disable alerting in the grafana config; it creates too many queries that make looking at the log annoying
  4. add `--log-level 0` to the metricTank command in the screen file
  5. replace the tail with `tail -f /var/log/raintank/nsq_metrics_tank.log | grep -v 'pushing value to agg'` in the screen file
  6. launch the stack
  7. kill the graphite watcher, for the same reason as alerting
  8. add an endpoint as grafana, using all defaults
  9. run `./delay_collector.sh`, `./delay_collector.sh dev1 500 10` and similar commands to change the latency profile at specific points in time, and give it some time to run at each latency profile
  10. open the new rollups tester dashboard so you can focus on one particular series (new since https://github.com/raintank/raintank-docker/commit/ab35c6e8e0c0eb4a96005aaeebd49aad0cec624e)
  11. looking at the dash and the metricTank log, you can now experiment with the display interval, see what kind of data it loads, and verify whether things look alright or not

(screenshot: rollups-testing)

Dieterbe commented 8 years ago

here's a video where i show it off: https://vimeo.com/147804095. i also merged this into master.

Dieterbe commented 8 years ago

this has been implemented for a while.