Closed Dieterbe closed 8 years ago
the way i see it, before coming up with proper rollup settings, it's good practice to first list the requirements and be aware of each of them, because this is where we may have different opinions: (and i think maybe i forgot a few considerations that @torkelo or @woodsaj may think of)
The two key drivers for rollups are:
right now, item 1 is of higher priority than item 2.
In addition to the requirements listed above, i would add
I am not sure if an additional NSQ consumer is the right approach.
In order to perform rollups, we are going to need to buffer metrics in memory. Querying this inMemory data is going to be faster than querying the TSDB, so we should have graphite leverage it. In that case we can also delay when we write to C*.
So, my view is that we read from NSQ, then just write to the inMemory store. We then have another process or thread that uses this inMemory store to:
> Querying this inMemory data is going to be faster than querying the TSDB, and so we should have graphite leverage it
how sure are we that there is a significant difference in response time for serving read requests for recent data? or maybe this question is not that relevant because if we're going to batch up writes (and it looks like we should) then we need this in-memory component to satisfy hot data anyway.
i agree with your approach, but the way i envision it, the NSQ reading, inMemory store, and C* flushing can all happen in the same process, until we reach scalability limits.
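The single-process pipeline described above (consume from NSQ, buffer in memory, periodically flush older data to Cassandra while serving hot reads from memory) could be sketched roughly like this. All type and method names here are hypothetical illustrations, not the actual raintank-metric API:

```go
package main

import (
	"fmt"
	"sync"
)

// point is a single timestamped value.
type point struct {
	ts  int64
	val float64
}

// memStore buffers recent points per series key so that reads for hot
// data never have to touch Cassandra.
type memStore struct {
	mu     sync.Mutex
	series map[string][]point
}

func newMemStore() *memStore {
	return &memStore{series: make(map[string][]point)}
}

// Add is what the NSQ consumer would call for every incoming metric.
func (m *memStore) Add(key string, ts int64, val float64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.series[key] = append(m.series[key], point{ts, val})
}

// Flush hands all buffered points older than cutoff to persist (e.g. a
// batched Cassandra writer) and drops them from memory, keeping only
// the recent points for fast reads.
func (m *memStore) Flush(cutoff int64, persist func(key string, pts []point)) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for key, pts := range m.series {
		var old, recent []point
		for _, p := range pts {
			if p.ts < cutoff {
				old = append(old, p)
			} else {
				recent = append(recent, p)
			}
		}
		if len(old) > 0 {
			persist(key, old)
		}
		m.series[key] = recent
	}
}

func main() {
	s := newMemStore()
	s.Add("litmus.ok_state", 100, 1)
	s.Add("litmus.ok_state", 200, 0)
	flushed := 0
	s.Flush(150, func(key string, pts []point) { flushed += len(pts) })
	fmt.Println(flushed) // 1: only the point older than the cutoff was persisted
}
```

In a real deployment the flush would run on a ticker in its own goroutine, which is what makes the "same process until we hit scalability limits" approach workable.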
currently prototyping this
(https://github.com/raintank/raintank-metric/tree/tank/nsq_metrics_tank has some details)
next up:
chunkSpan*(numChunks-1)
) would be nice. @woodsaj can you give this a shot?

nice progress! is nsq_metrics_tank a separate process, or part of a raintank-metric process that feeds on NSQ?
Just curious about the topology / processes involved in the metric stack. Has NSQ completely replaced RabbitMQ? What process receives metrics from collectors and batches them onto NSQ, and is there then another process that receives them and saves metadata to elastic and metrics to cassandra?
every nsq_* app in this repo runs as a service. there is no more raintank-metric process. nothing in here feeds to nsq, they all consume from nsq. there is one for maintaining the metric definitions in ES, one for storing probe events in ES, one for saving metrics to kairos. and now this new one, which should eventually also replace the latter.
we still use rabbitmq for the grafana app bus and perhaps a few other things, not sure. but nsq is used for high-throughput items (metrics and probe-events). it's kind of annoying that we have 2 messaging systems, but they have different characteristics and we exploit that. using rabbitmq for everything would be far from ideal, and ditto for NSQ probably ( @woodsaj is more familiar with the rabbitmq specifics)
grafana receives the data from the collectors (they run in collector-controller mode only) and sends it to NSQ. the collector-controllers also don't batch anymore; they exploit the fact that incoming data is naturally batched. in fact, batches are split up to make sure the messages stay under 10MB each.
PS: i will update the readme based on this
@Dieterbe thanks for clarifying, makes sense. Would be interesting to know if rabbitmq is required or whether we could use NSQ there as well.
Been thinking about Grafana and distributed setups (sharing cache state, for example for the alerting def index). What would be the best way for grafana nodes to talk to each other? NSQ, nanomsg, rabbit, raft…
that sounds like something we should have a deeper conversation about, to learn context, requirements etc. happy to hangout about that some time. is there a ticket where we can resume this convo?
Just wanted to add my 2c here, around a particular angle (disclaimer: I don't know enough about the tradeoffs for nsq vs rabbit for various use cases)...
over the long term, it is definitely a downside from a distribution/packaging/supportability standpoint if our stack will require nsq (or whatever name we end up giving that component of the stack) and rabbit.
maybe not that big a deal for the saas offering, but if nsq could be a tight no-dependency part of our stack, it seems ideal to evaluate using it for *, especially for the on-prem / downloadable use case.
let's please have that conversation elsewhere (maybe in strategy repo or something)
@Dieterbe a few issues.
1) you are using metric.Name as the key, but this is not a globally unique identifier for a metric series as it does not contain any data about who owns the metric (Org). This can be easily addressed by changing https://github.com/raintank/raintank-metric/blob/tank/nsq_metrics_tank/handler.go#L36 to
m := h.metrics.Get(metric.Id())
2) As with the above, the HTTP interface needs to either a) accept the metric ID as the query term, or b) keep a local index of metric.Id's and metric.name + org_id. Option A is obviously simpler and also preferable, as Graphite-api gets the metric.Ids from Elastic anyway.
other than that, this is looking awesome.
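The uniqueness problem described in point 1 comes down to scoping the key by org. A minimal sketch of the idea; the exact Id() construction below (org id plus a hash of the name) is illustrative, not necessarily what raintank-metric actually computes:

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// metricDef holds the fields relevant to keying: the owning org and
// the series name.
type metricDef struct {
	OrgId int
	Name  string
}

// Id returns a key that is unique across orgs: two customers can both
// send "collectd.cpu.idle", but their ids will differ because the org
// id is part of the key.
func (m metricDef) Id() string {
	sum := md5.Sum([]byte(m.Name))
	return fmt.Sprintf("%d.%x", m.OrgId, sum)
}

func main() {
	a := metricDef{OrgId: 1, Name: "collectd.cpu.idle"}
	b := metricDef{OrgId: 2, Name: "collectd.cpu.idle"}
	fmt.Println(a.Id() != b.Id()) // true: same name, different orgs, different keys
}
```

Keying the in-memory store on this id instead of on Name is what makes option A (query by metric ID) work without any extra index.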
@woodsaj ok will look into that. do you have anything to say to what i asked about populating cassandra? (last point of https://github.com/raintank/raintank-metric/issues/21#issuecomment-142921544) thanks.
yes, happy to help out, and have already researched sufficiently to get started.
Basic cassandra support in https://github.com/raintank/raintank-metric/tree/tank_to_cassandra
The methods are there for sending data, but not sure where the data should be written from. What mechanism will flush out the aggregated metrics?
now we should have flushing of the chunks to cassandra (it says it saves fine) + loading of the chunks from cassandra in the http interface to satisfy timestamps that fall outside of the in-memory range. for some reason it doesn't actually work yet but i think we're close :-p
btw, it would be nice to get the http json output into the same format graphite uses ("array" style: [{"target": "error", "datapoints": [[null, 1443070801]], ...}]), but i haven't figured out how to do that yet
loading and saving seems to work, verified via json api in dev stack and working with a single endpoint
we can already easily spin up raintank-docker as the current stack or the tank-based stack (remember to rebuild docker images!). now i've been working on a script to collect perf metrics, do benchmarks with vegeta, and verify correctness of the data returned by graphite, as well as documenting the procedure so that the process is smooth and minimizes room for error: https://github.com/raintank/raintank-docker/wiki/performance-testing-a-timeseries-backend
once some more things are fixed/implemented, it should be trivial to benchmark both stacks in a structured manner and compare them.
to figure out before starting benches:
tail -f /var/log/raintank/grafana-dev.log | grep 'job results' | linecounter -freq=10000
select * from stats.timers.nsq_metrics_tank.nmt.requests_span.mem.count_ps;
empty response
(though sudo ngrep -d any -W byline host 172.17.0.45 and port 6063
looks pretty good)

i want to test NMT/go-tsz in a realistic setting with fluctuating response times
You can try using the kernel's built-in traffic control (tc) mechanism. I don't think you will be able to apply changes directly to individual containers, but it would work nicely inside a full VM, using virtualbox, kvm, or vmware.
http://www.linuxfoundation.org/collaborate/workgroups/networking/netem
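For reference, netem rules of the kind referred to above are applied with tc along these lines; the interface name and delay values here are illustrative, not the exact rules used in raintank-docker:

```
# add 100ms of latency (+/- 10ms jitter) to an interface
tc qdisc add dev eth0 root netem delay 100ms 10ms

# adjust the delay later, or remove the rule again
tc qdisc change dev eth0 root netem delay 500ms 50ms
tc qdisc del dev eth0 root
```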
I was able to get tc working with the docker containers. I added a commit to raintank-docker, so that network emulation rules are applied to the interfaces on all of the collector containers to increase their latency. https://github.com/raintank/raintank-docker/commit/145736ae75998db86e1a8343572c504b3d25d35b
The increased latency won't be applied if the endpoint is localhost, but it will work for other addresses. So i would recommend changing env-load to use the address of the docker0 interface instead of localhost.
good stuff AJ. i just did https://github.com/raintank/raintank-docker/commit/1b6462560a9f6321b7ccfd6a1fc87f9b3780b228 but otherwise works great.
the server should now have functioning rollups; i just have a hard time verifying it visually, because the metrics don't show up in the graphite-api output, so i can't draw them in grafana.
been making progress on serving consolidated responses in the http handler https://github.com/raintank/raintank-metric/pull/63 if anyone is curious to see the work in progress.
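Consolidation at read time, as worked on in #63, boils down to reducing the stored points so they fit under the requester's point budget by aggregating fixed-size groups. A minimal averaging version, with function and variable names of my own choosing:

```go
package main

import "fmt"

// consolidate averages consecutive groups of groupSize values, so a
// series can be reduced at read time to fit a maxDataPoints budget.
// A trailing partial group is averaged over however many values it has.
func consolidate(vals []float64, groupSize int) []float64 {
	if groupSize <= 1 {
		return vals
	}
	out := make([]float64, 0, (len(vals)+groupSize-1)/groupSize)
	for i := 0; i < len(vals); i += groupSize {
		end := i + groupSize
		if end > len(vals) {
			end = len(vals)
		}
		sum := 0.0
		for _, v := range vals[i:end] {
			sum += v
		}
		out = append(out, sum/float64(end-i))
	}
	return out
}

func main() {
	fmt.Println(consolidate([]float64{1, 2, 3, 4, 5, 6}, 2)) // [1.5 3.5 5.5]
}
```

The same loop structure works for other consolidation functions (min, max, sum); only the per-group reduction changes.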
with #63 gearing up, it's time to start finding a great set of rollup intervals/settings. i know @woodsaj at one point had a google doc with a table of suggested rollup settings. can you share? (you shared "by the numbers v2", but that looks like something else)
some thoughts:
smartSummarize()
and the like as well. @woodsaj seems more of a fan of max 6h buckets, which I'm fine with until we have better insights. using 6h buckets we should be able to easily compute per-day and per-7-days stats at runtime as well. (42 points, 192 times to be precise for the above scenario)

also allow multiple input resolutions, with a good fit of rollups for each. (i think high resolution should stay fairly high resolution throughout the different bands, whereas if data comes in at low resolution, we can be more aggressive in our rollups)
If rollup periods are not the same across all series, then there needs to be an index of the rollup periods that each series has available. At this stage we need to optimize for code simplicity.
Factor in that...you quickly realize
that makes sense. so would you say each step in the rollup interval should be about 10x the previous? For example, if maxDataPoints is 800 and we set minDataPoints to 80 (it seems proper to allow anywhere from 1 point per pixel to 1 point every 10 pixels), then if all our rollup intervals were spaced about 10x apart, we would always be able to find a matching rollup interval that can be served without any further reduction at runtime. right now i'm playing with these settings in devstack: https://github.com/raintank/raintank-docker/commit/55e10422b912c81e7014f0c7dd54c5af284a9a41 needs some further thinking of course, and instrumentation in the code to see how well our retention intervals correspond to requests, how much overhead there is, etc. and @woodsaj, can you share the table of rollup intervals/retentions you had?
the main issue i have at this point with my settings is that the raw stream can have a very broad range of resolutions. for litmus it's currently 10s to 120s, but for non-litmus, or litmus in the future, i can see anywhere from 1s to 1h. this means the compression rate of the first rollup interval varies wildly, and the first rollup level can be anywhere from basically useless to drastically insufficient, still requiring major runtime aggregation.
but like you say we can get the basic form working first and then worry about per-series/per-customer adjustments later once we have a better understanding of what we want.
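The archive-selection logic discussed above amounts to: given the requested time range and the maxDataPoints budget, pick the finest rollup interval that still keeps the point count under budget. A sketch with illustrative interval settings (spaced 10x apart, as in the discussion):

```go
package main

import "fmt"

// pickInterval returns the finest of the available rollup intervals
// (in seconds, sorted ascending) that yields at most maxDataPoints
// points over the requested range, falling back to the coarsest one.
// A matching interval needs no further reduction at read time.
func pickInterval(rangeSecs, maxDataPoints int, intervals []int) int {
	for _, iv := range intervals {
		if rangeSecs/iv <= maxDataPoints {
			return iv
		}
	}
	return intervals[len(intervals)-1]
}

func main() {
	// hypothetical rollup intervals, each 10x the previous
	intervals := []int{10, 100, 1000, 10000}
	// a 24h range with an 800 point budget: 10s gives 8640 points and
	// 100s gives 864, both over budget, so the 1000s rollup is chosen
	fmt.Println(pickInterval(86400, 800, intervals)) // 1000
}
```

This is also where instrumentation would hook in: counting how often each interval gets picked, and how far the chosen interval's point count falls below the budget, shows how well the retention bands match real requests.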
@Dieterbe if the doc i shared with you doesn't have what you are looking for, then the information no longer exists.
1) update your dev stack
2) the nsq tools still need to be manually compiled, sorry. use the consolidation-at-read-time branch of raintank-metric
3) disable alerting in the grafana config; it creates too many queries, which makes looking at the log annoying
4) add --log-level 0 to the metricTank command in the screen file
5) replace the tail with tail -f /var/log/raintank/nsq_metrics_tank.log | grep -v 'pushing value to agg' in the screen file
6) launch the stack
7) kill the graphite watcher, for the same reason as alerting
8) add an endpoint as grafana, using all defaults
9) run ./delay_collector.sh, ./delay_collector.sh dev1 500 10 and similar commands to change the latency profile at specific points in time, and give it some time to run at each latency profile
10) open the new rollups tester dashboard so you can focus on one particular series (new since https://github.com/raintank/raintank-docker/commit/ab35c6e8e0c0eb4a96005aaeebd49aad0cec624e)
11) looking at the dash and the metricTank log, you can now experiment with the display interval, see what kind of data it loads, and verify whether things look alright or not.
here's a video where i show it off: https://vimeo.com/147804095 i also merged this into master.
this has been implemented for a while.
the time has come.