prometheus / client_ruby

Prometheus instrumentation library for Ruby applications
Apache License 2.0

Support pre-fork servers #9

Closed atombender closed 5 years ago

atombender commented 9 years ago

If you use this gem with a multi-process Rack server such as Unicorn, surely each worker will be returning just a percentage of the correct results (e.g., number of requests served, total time), thus making the exposed metrics fairly meaningless?

The easiest solution is to create a block of shared memory in the master process that all workers share, instead of using instance variables.

brian-brazil commented 9 years ago

I believe this problem came up with some users of the (experimental) Python client, and some proposed work on the HAProxy exporter.

What you really want to do is separately scrape each worker, and then aggregate it up in the Prometheus server. This gives the most useful/correct results, and also gives you per-process statistics for things like memory and cpu. I'm not familiar with Unicorn or Rack, do they offer anything that'd allow for this?

Shared memory is an option, it has disadvantages though in that it'd be a source of contention and you can't do per-process analysis.

juliusv commented 9 years ago

From the Unicorn Design doc:

"Unicorn is a traditional UNIX prefork web server. [...] Load balancing between worker processes is done by the OS kernel. All workers share a common set of listener sockets and does non-blocking accept() on them. The kernel will decide which worker process to give a socket to and workers will sleep if there is nothing to accept()."

Given that, I'm not sure how doable it is to scrape individual workers. Even if it was possible, the benefit of having individual insight and avoiding contention has to be weighed against potentially having an order of magnitude more time series.

I'd expect that on average, the workers on a single machine will behave similarly, except admittedly in pathological situations like the one pointed out in the Unicorn docs: "When your application goes awry, a BOFH can just "kill -9" the runaway worker process without worrying about tearing all clients down, just one".

It also states that "Unicorn deployments are typically only 2-4 processes per-core", which would mean multiplying the number of exposed time series on a typical large machine with e.g. 32 cores by 64x to 128x.

@grobie Did you have any plans wrt. supporting preforked servers already?

atombender commented 9 years ago

@brian-brazil: It's possible to collect per-process information with shared memory: Just organize the memory by PID. You'll have to reap dead PIDs, though. If you're clever you can avoid locking altogether, by having each process write to its own dedicated segment of the shared memory.
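To make the dedicated-segment idea concrete, here is a rough sketch (illustrative only, not code from this gem; the tmpfs path, slot layout, and helper names are made up). Each worker owns a fixed-width slot in a shared file and only ever writes its own slot, so updates need no locks:

SLOT_SIZE = 8                      # one 64-bit counter per worker
NUM_SLOTS = 4                      # e.g. the number of Unicorn workers

# In the master, before forking: pre-size the shared file (on tmpfs it stays in memory).
io = File.open('/dev/shm/prom_counters.bin', File::RDWR | File::CREAT)
io.truncate(SLOT_SIZE * NUM_SLOTS)

# In a worker: bump only our own slot, so no lock is required for writes.
def increment(io, worker_id)
  offset  = worker_id * SLOT_SIZE
  current = io.pread(SLOT_SIZE, offset).unpack1('Q')
  io.pwrite([current + 1].pack('Q'), offset)
end

# At scrape time: read every slot and sum them, or expose them per worker.
def read_all(io)
  NUM_SLOTS.times.map { |i| io.pread(SLOT_SIZE, i * SLOT_SIZE).unpack1('Q') }
end

increment(io, 2)
read_all(io)   # => [0, 0, 1, 0]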

Scraping individual workers, as @juliusv points out, would mostly be useful to the extent that it would capture misbehaving workers that consume too much memory or CPU.

Having run Unicorn in production for years, I would agree that this doesn't happen often enough to justify the overhead of monitoring individual workers. Unicorn's master process will "kill -9" processes that don't respond within a per-process request timeout. In my experience, this tends to cover the pathological cases.

In terms of scaling, one of our clusters has 300 Unicorn workers on each box (anything from 4-25 per app). I wouldn't want to have Prometheus poll that many workers every few seconds.

juliusv commented 9 years ago

:+1: Sounds like we need a shared-memory solution or similar.

brian-brazil commented 9 years ago

It's possible to collect per-process information with shared memory: Just organize the memory by PID. You'll have to reap dead PIDs, though. If you're clever you can avoid locking altogether, by having each process write to its own dedicated segment of the shared memory.

You need to be a little careful here, as you can't simply throw away data from old PIDs. You'd need to fold their counters in to ensure the totals stay monotonically increasing. Organizing on >128-byte boundaries (the size of a cache line) should avoid locking issues during collection.
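To make that reaping concern concrete, here is a tiny sketch (illustrative only, no real client_ruby API): before a dead PID's slot is discarded, its value is folded into a retained total so the exported sum never goes backwards.

class SlotTable
  def initialize
    @live     = Hash.new(0)   # pid => that worker's current counter value
    @archived = 0             # values carried over from reaped (dead) pids
  end

  def set(pid, value)
    @live[pid] = value
  end

  def reap(dead_pid)
    @archived += @live.delete(dead_pid).to_i   # fold in before discarding
  end

  def total
    @archived + @live.values.sum               # stays monotonically increasing
  end
end

table = SlotTable.new
table.set(101, 40)
table.set(102, 2)
table.reap(101)     # worker 101 died; its 40 increments are not thrown away
table.set(103, 5)   # the replacement worker starts counting from zero
table.total         # => 47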

How would we deal with gauges? Both the set and inc/dec styles are problematic. The set style doesn't make sense when there are multiple processes (you really need it per-process), and with the inc/dec pattern to count in-progress requests you probably want to reset to 0 for a PID when that PID dies.

I wouldn't want to have Prometheus poll that many workers every few seconds.

60s is a more than sufficient monitoring interval for most purposes. Benchmarking will determine the actual impact.

This seems like it'll get very complicated to implement and use, as most uses of instrumentation would need to take the process model into account to get useful and correct results out the other end. How do other instrumentation systems solve this problem?

atombender commented 9 years ago

You need to be a little careful here, as you can't simply throw away data from old PIDs.

How would this differ from the hypothetical non-shared case where each worker exposes a metrics port? If the worker dies and starts again, all its counters will restart at zero. In a hypothetical shared-memory implementation where each worker has a dedicated segment that is reported separately, it would be the same thing, no?

Anyway, the problem is moot because I just realized that Unicorn uniquely identifies workers by a sequential number, which is stable. So if worker 3 dies, the next worker spawned is also given the number 3. That means no reaping is needed, nor is any kind of PID mapping necessary, which I was worried about.

I did a little digging into the Unicorn sources and realized that Unicorn uses a gem called raindrops to accomplish exactly what I have been describing so far. Raindrops uses shared memory to collect and aggregate worker metrics (Unicorn also seems to use Raindrops directly to detect idle workers), but it looks like it only cares about socket metrics, not anything related to HTTP. Worst case, if it cannot be used as-is, I could look into extracting the relevant code from Raindrops into this gem.
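For reference, raindrops' counters look roughly like this (a sketch based on its documentation; requires the raindrops gem). The key property is that the struct lives in shared memory, so increments made in forked children are visible to the parent:

require 'raindrops'

stats = Raindrops.new(2)        # two lock-free counters in a shared memory page

4.times do
  fork do
    stats.incr(0)               # each child bumps the same first counter
    exit!
  end
end
Process.waitall

stats[0]                        # => 4, seen by the parent and all children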

brian-brazil commented 9 years ago

How would this differ from the hypothetical non-shared case where each worker exposes a metrics port?

When the data comes from each worker, there's no persistence of data beyond a worker's lifetime and the rate() function works as expected. With shared memory where the client is smart and doing aggregation, you need to not lose increments from dead workers, as counters need to be kept monotonic.

Prometheus has the paradigm that instrumentation is dumb, and all the logic is done in the prometheus server. You need to be careful when adding logic to clients that it maintains the same semantics as a dumb client.

So if worker 3 dies, the next worker spawned is also given the number 3.

That makes things easier alright. And as I presume everything is single-threaded, we don't need locking for metric updates.

Thinking on this over the past bit, I think we can make this work but only for Counter and Counter-like metrics (Summary without percentile and Histogram).

juliusv commented 9 years ago

@brian-brazil Gauges are indeed a problem. I see four possibilities in principle for them:

  1. Allow the user to configure a merging strategy (averaging, summing, ...) for each gauge. The actual merging work wouldn't need to happen continuously, but only upon scrape (a rough sketch appears after this list).
  2. Make an exception for gauges and distinguish them by a "worker" label.
  3. Use "last write wins". But that's not useful at all for any common gauge use cases (queue lengths, etc.).
  4. Don't support gauges in this kind of environment, expecting that the bulk of relevant metrics for this use case would be counters and summaries anyways.

1 is probably what we'd want if we don't split metrics by worker otherwise.
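A rough sketch of option 1, the scrape-time merge (the strategy names and per-worker hash are assumptions for illustration, not an API of this gem):

MERGE_STRATEGIES = {
  sum:     ->(values) { values.sum },
  average: ->(values) { values.sum / values.size.to_f },
  max:     ->(values) { values.max },
}.freeze

def merge_gauge(per_worker_values, strategy)
  MERGE_STRATEGIES.fetch(strategy).call(per_worker_values)
end

in_flight = { 0 => 3, 1 => 1, 2 => 0, 3 => 2 }   # worker id => gauge value
merge_gauge(in_flight.values, :sum)       # => 6
merge_gauge(in_flight.values, :average)   # => 1.5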

@atombender If indeed one Unicorn server is always expected to have the same number of stably identified worker processes (at least between complete Unicorn restarts, which would also wipe the shared memory), your approach sounds sensible to me. If I understand the raindrops approach correctly, it uses a single counter for all workers ("counters are shared across all forked children and lock-free"), so there wouldn't even be any counter resets anywhere in that approach if only a single worker is killed?

juliusv commented 9 years ago

(if you're only reading by mail - I edited my last comment because Markdown made a 5 out of my 1...)

atombender commented 9 years ago

Note that I am not suggesting that the client should aggregate anything. I am suggesting that each worker has a slot in the shared memory segment that it is responsible for updating. When a worker gets a /metrics request, it simply returns the contents of all slots, separately labeled by worker ID.

Shouldn't that completely work around the gauge problem? I was initially thinking that the client would aggregate workers, because, as I said earlier, I think per-worker metrics aren't very useful. But if it poses a problem for gauges, I think it's better to "over-report" and do aggregation on the server.
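In other words, the exposition could simply render each slot with a worker label, something like this sketch (metric and label names are purely illustrative):

def render_text(metric_name, slots)
  slots.each_with_index.map do |value, worker_id|
    %(#{metric_name}{worker="#{worker_id}"} #{value})
  end.join("\n")
end

puts render_text('http_requests_total', [253, 251, 245, 251])
# http_requests_total{worker="0"} 253
# http_requests_total{worker="1"} 251
# http_requests_total{worker="2"} 245
# http_requests_total{worker="3"} 251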

(Pardon my obtuseness, but I'm not too familiar with how Prometheus works yet.)

brian-brazil commented 9 years ago

Don't support gauges in this kind of environment, expecting that the bulk of relevant metrics for this use case would be counters and summaries anyways.

This is my expectation. It sounds like an entirely online-serving use case, with no thread pools or the like where you'd usually want gauges. I can see some use cases for gauges (e.g. if you were batching up requests to send on elsewhere), but the counter uses are the primary ones.

juliusv commented 9 years ago

@atombender Ah, ok. That'd definitely work and require fewer scrapes, though you'll still end up with a lot of dimensionality in your time series that might not prove to be of much value vs. their cost in transfer, storage, and queries.

@brian-brazil Yep.

brian-brazil commented 9 years ago

When a worker gets a /metrics request, it simply returns the contents of all slots, separately labeled by worker ID.

That'd work; I thought we were trying to avoid it due to all the additional time series.

Shouldn't that completely work around the gauge problem? I was initially thinking that the client would aggregate workers, because, as I said earlier, I think per-worker metrics aren't very useful. But if it poses a problem for gauges, I think it's better to "over-report" and do aggregation on the server.

Reporting everything is what I generally favour, assuming it works out resource wise.

That resolves the counter problem, but it doesn't solve the gauge problem, as gauge use cases tend to work under the assumption that the gauge is reset when the process dies/starts (think of a gauge that represents in-progress requests). If Unicorn can clear them when a process dies, that should handle it.

brian-brazil commented 9 years ago

The other limitation is that callback-like collectors won't work with a shared-memory model. This would affect things like runtime and process statistics that you want to gather at scrape time.

grobie commented 9 years ago

@grobie Did you have any plans wrt. supporting preforked servers already?

When we had the use cases at SoundCloud, we discussed three options: a) scraping each worker independently, b) letting the master process handle metrics (the current dumb lock implementation in the Ruby client probably needs some improvement), or c) letting each worker push metrics to a Pushgateway.

I believe we run all our Ruby services on JRuby by now, using the Java client. Going forward with b) sounds good to me.

jeffutter commented 8 years ago

I am very interested in this issue. I am wondering whether it also occurs with threaded servers (like Puma)? I think they may work in a way where memory isn't shared, depending on where/when things get initialized.

If there is a demo app in the repo, I'll play around with it today.

Is anyone aware of any other rack middlewares that share global state that we could look to for solutions?

jeffutter commented 8 years ago

If raindrops is the right path to look at for this... They have a sample middleware. Their code isn't on github but here is a clone of that middleware: https://github.com/tmm1/raindrops/blob/master/lib/raindrops/middleware.rb

grobie commented 8 years ago

Thanks, I'll have a look.

jeffutter commented 8 years ago

I have been playing around with raindrops, but can't seem to get the processes writing to the same counters. I'll keep at it. In the meantime I found this: https://github.com/TheClimateCorporation/unicorn-metrics which might show a similar use case.

jeffutter commented 8 years ago

For those interested in this, I have made a little progress. It seems that when using raindrops, the Raindrops counters need to be declared before the workers fork. Otherwise each worker will end up instantiating separate instances of the counters. If you declare them beforehand, it actually works as expected.

This makes the prometheus middleware a little inconvenient, as you would need to know all possible endpoints and other counters beforehand. This is the approach that the unicorn-metrics project above takes.
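Roughly what that constraint looks like in practice (the endpoint list, class, and wiring below are made up for illustration): the Raindrops struct has to be built at load time in config.ru, i.e. in the Unicorn master, so that every forked worker inherits the same shared memory; and because the slots are fixed at that point, every endpoint you want to count must be known up front.

require 'raindrops'

ENDPOINTS = %w[/ /users /orders].freeze
REQUEST_COUNTS = Raindrops.new(ENDPOINTS.size)   # allocated before workers fork

class CountingMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    idx = ENDPOINTS.index(env['PATH_INFO'])
    REQUEST_COUNTS.incr(idx) if idx              # shared across all workers
    @app.call(env)
  end
end

# In config.ru:
#   use CountingMiddleware
#   run MyApp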

I have begun looking at a different approach, where each worker exposes its own metrics at /_metrics and a middleware at /metrics queries the others and combines the results. I currently can't find a way to figure out what the other workers' TCP/Unix sockets are from inside a worker. There is a Unicorn::HttpServer::LISTENERS constant, but it seems to only contain a reference to the current worker.

Hopefully this information can be helpful to someone, I would really love to see prometheus being useful on ruby servers.

brian-brazil commented 8 years ago

The approach I'm taking over in Python is to have each process keep a store of its metrics in a per-process file on disk, and then read them all in at scrape time.
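A very rough Ruby sketch of that file-per-process idea (not the actual Python client code; the directory layout and JSON format are my own choices): each process writes only to a file named after its PID, and the scrape handler reads every file and sums the values.

require 'json'
require 'tmpdir'

METRICS_DIR = Dir.mktmpdir('prom')   # one directory per application

# Called from within a worker: write only our own file, so no locking is needed.
def write_own_metrics(counters)
  File.write(File.join(METRICS_DIR, "#{Process.pid}.json"), counters.to_json)
end

# Called at scrape time: read every per-process file and aggregate.
def collect_all
  Dir[File.join(METRICS_DIR, '*.json')].each_with_object(Hash.new(0)) do |path, acc|
    JSON.parse(File.read(path)).each { |name, value| acc[name] += value }
  end
end

write_own_metrics('http_requests_total' => 42)
collect_all   # => {"http_requests_total"=>42}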

I can see a challenge in distributing the scrape to workers if a worker is single-threaded. You want scrapes to be independent of other processing.

jeffutter commented 8 years ago

I implemented backing the metrics with PStores. This seems to have the desired effect, albeit causing a slowdown since it writes stats to disk on every request. My branch is here: https://github.com/jeffutter/prometheus-client_ruby/tree/multi-worker. I tested it out with ApacheBench doing 2000 requests with 10 concurrent requests and 4 Unicorn workers. After running that, the /metrics endpoint reported the correct number of hits; I'm not sure about the other types of stats.

On this branch, with the above setup, I'm getting ~350 requests/sec, while on the master branch I'm getting over 1,700. Now, while this sounds like a big difference, the duration of the request that returns OK/200 is about 13ms. I'm guessing that under real-world load, with real request times, this difference will be much more negligible.

That being said, there may be better backing stores to use than PStore or better ways to store it. Also, it should probably be an option to back with a hash when not running in a prefork environment.
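For context, PStore is in the Ruby standard library and persists a Hash-like structure to disk inside transactions, which is also why every increment costs a disk write. A minimal sketch of a PStore-backed counter (the file and key names are illustrative, not the branch's actual store code):

require 'pstore'

store = PStore.new('metrics.pstore')

def increment(store, key, by = 1)
  store.transaction do                    # each transaction is flushed to disk
    store[key] = (store[key] || 0) + by
  end
end

increment(store, :http_requests_total)
store.transaction(true) { store[:http_requests_total] }   # => 1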

Let me know what you think.

brian-brazil commented 8 years ago

That's pretty much the same approach as I'm using in Python; however, I'm only letting this behaviour kick in when explicitly asked for, as it has a number of limitations, including performance (see https://github.com/prometheus/client_python/pull/66).

Scrape-time performance doesn't matter too much; what's really important is that metric increments etc. are fast. Optimally, that'd mean something where writes hit the kernel, but not the disk.

jeffutter commented 8 years ago

On my branch (https://github.com/jeffutter/prometheus-client_ruby/tree/multi-worker) I have updated it so there are backing "stores". I have included one with the original behavior (Hash) and one with the new behavior (PStore). If you pass the prefork: true argument to the middlewares, it will use the PStore backend; otherwise it uses the original. I imagine this could be extended in the future to support other on-disk or even DB (Redis?) backends.

Also (I'm not sure why), the benchmarks have improved a little: the Hash one is giving me 2,100 requests/sec while the PStore one is up to 1,400. Perhaps it's just a system resource difference from before; I need a more scientific benchmark. However, at 1,400 requests/sec I doubt it will cause a bottleneck in any real web load.

Currently there are a bunch of broken tests. If this solution seems like something that might be considered for merging, I will fix up the tests, update the docs, and clean everything up.

jeffutter commented 8 years ago

For continuity's sake: discussion of this topic has continued over on the mailing list here: https://groups.google.com/d/topic/prometheus-developers/oI3qPhaRY7I/discussion

mortenlj commented 8 years ago

Hi!

It seems discussion about this issue died out on the mailing list sometime before Christmas last year. Is there any movement towards a solution to this problem?

grobie commented 8 years ago

I'm not aware of anyone working on a solution for this problem at the moment. Ideas and contributions are very welcome.

zevarito commented 7 years ago

Has anyone found a way to make it work with Puma?

zevarito commented 7 years ago

Well, it seems to work with @jeffutter's branch.

Here is how I did it, please take a look and tell me what needs to be improved.

in config/puma.rb

workers N

on_worker_boot do |index|
  $puma_worker = index
end

in config.ru

$registry = Prometheus::Client.registry(Prometheus::Client::Stores::PStore)
$registry.counter(:api_enqueued_jobs_total, 'A counter of API enqueued jobs')
....

use Prometheus::Client::Rack::Exporter, prefork: true, registry: $registry

in SomeController.rb

$registry.get(:api_enqueued_jobs_total).increment(..., worker: $puma_worker)

@jeffutter why is it returning JSON by default on the '/metrics' endpoint?

zevarito commented 7 years ago

Here is a quick benchmark I've done on my laptop:

ApacheBench/2.3 -n 1000 -c 10

Puma workers 4 threads 16,32

The request is a POST to an API endpoint that hits the database and returns JSON.

No Prometheus at all.

Requests per second:    27.46 [#/sec] (mean)
Time per request:       364.217 [ms] (mean)
Time per request:       36.422 [ms] (mean, across all concurrent requests)
Transfer rate:          44.86 [Kbytes/sec] received
                        5.55 kb/s sent
                        50.41 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.6      0      11
Processing:   113  362 217.7    311    1582
Waiting:      113  360 217.5    309    1580
Total:        113  362 217.8    311    1582

Percentage of the requests served within a certain time (ms)
  50%    311
  66%    411
  75%    469
  80%    522
  90%    638
  95%    724
  98%    910
  99%   1231
 100%   1582 (longest request)

Taking one measurement

Requests per second:    27.11 [#/sec] (mean)
Time per request:       368.899 [ms] (mean)
Time per request:       36.890 [ms] (mean, across all concurrent requests)
Transfer rate:          44.29 [Kbytes/sec] received
                        5.48 kb/s sent
                        49.77 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       7
Processing:   110  366 233.1    282    1318
Waiting:      109  364 232.6    281    1318
Total:        110  366 233.2    283    1318

Percentage of the requests served within a certain time (ms)
  50%    283
  66%    404
  75%    507
  80%    545
  90%    691
  95%    848
  98%   1003
  99%   1111
 100%   1318 (longest request)

Request count by worker

1 => 253, 3 => 251, 2 => 245, 0 => 251

Taking one measurement and hitting the /metrics path once every 5s

Requests per second:    27.76 [#/sec] (mean)
Time per request:       360.260 [ms] (mean)
Time per request:       36.026 [ms] (mean, across all concurrent requests)
Transfer rate:          45.35 [Kbytes/sec] received
                        5.61 kb/s sent
                        50.96 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       5
Processing:   113  357 218.7    281    1292
Waiting:      113  356 218.3    281    1292
Total:        113  358 218.7    281    1292

Percentage of the requests served within a certain time (ms)
  50%    281
  66%    403
  75%    486
  80%    540
  90%    688
  95%    792
  98%    913
  99%   1001
 100%   1292 (longest request)

Request count by worker

2 => 252, 1 => 251, 3 => 248, 0 => 249

grobie commented 7 years ago

The JSON format has been fully removed in Prometheus server v1.0.0 and in client_ruby v0.5.0. We should only use the text format.

I'm happy to review pull requests, but as SoundCloud doesn't use Ruby much any more, I won't have time to work on something on my own before December.

jeffutter commented 7 years ago

@zevarito Yeah, sorry, my branch uses JSON; it is over a year old, and I think back then JSON was still supported.

As much as I dig Prometheus, work decided they didn't want to run their own monitoring servers, so we've gone with NewRelic. I would love to see Prometheus' Ruby client support multi-threaded and forking servers, but I'm afraid I don't have any time to contribute at the moment.

Feel free to take the branches I had and build off of them or go in an entirely different direction.

brian-brazil commented 7 years ago

The python client now has multi-process support as of https://github.com/prometheus/client_python/pull/66

A similar method should work for Ruby.

zevarito commented 7 years ago

@grobie I've fetched the latest code and merged in @jeffutter's branch, and it seems to work fine. Besides the PStore addition, the code also provides a separation in the store layer that makes it easier to add new storage mechanisms in the future.

An important thing to note is that PStore will not work with nested transactions, which could be a problem in some cases, but another storage mechanism can be added to address that.

I will give it a try in a staging environment and see how it works. I'll open a PR if all goes smoothly.

@brian-brazil do you think "the Python way" should be the way to do it for the Ruby lib, or can it just be another storage mechanism?

brian-brazil commented 7 years ago

The Python way should be the way to do it in Ruby, as I expect you'll run into all the same issues I did with other approaches in Python with Ruby too.

36ms is not at all workable for instrumentation. The goal should be sub-microsecond.

zevarito commented 7 years ago

@brian-brazil I am lost about "36ms is not at all workable for instrumentation"; where does that number come from?

brian-brazil commented 7 years ago

The above benchmarks indicate 36ms for an operation.

zevarito commented 7 years ago

@brian-brazil The benchmarks are based on an operation that involves a POST, DB access, and JSON serialization. Unless I am getting it wrong, the benchmark shows that not using Prometheus is only .468ms faster than using Prometheus.

Bn 1) No Prometheus - Time per request: 36.422ms
Bn 2) Using Prometheus - Time per request: 36.890ms

You could point out that Bn 3, which uses Prometheus and hypothetically has more load from being hit with /metrics, actually shows even better performance than the rest of the benchmarks; but the overhead is, in my opinion, well within acceptable limits, since there is less than 1 ms between the three samples. Do you think it is not? Indeed, more tests should be done.

brian-brazil commented 7 years ago

.468ms is still way too slow. Instrumentation should cost less than a microsecond.

zevarito commented 7 years ago

@brian-brazil is there any benchmarking I can see of the Python lib or any other lib? Regarding "Instrumentation should cost less than a microsecond": I guess ab is not the right tool to measure that; should it be measured from the code itself or with some profiling?

brian-brazil commented 7 years ago

The Python multi-proc solution is at about 1.2us. This is due to Python's mutexes being surprisingly slow (should be 1-2 orders of magnitude faster).

I usually use a quick microbenchmark of .inc().
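Something along these lines, using client_ruby's counter API of that era (registry.counter plus #increment); the metric name and iteration count are arbitrary, and the numbers will obviously vary by machine:

require 'benchmark'
require 'prometheus/client'

registry = Prometheus::Client.registry
counter  = registry.counter(:bench_total, 'microbenchmark counter')

n = 100_000
seconds = Benchmark.realtime { n.times { counter.increment } }
puts format('%.0f ns per increment', seconds / n * 1_000_000_000)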

grobie commented 7 years ago

As already indicated in #11, the implementation might benefit from some performance work.

.468ms is still way too slow. Instrumentation should cost less than a microsecond.

This is pretty much an impossible goal for ruby. A hash access and counter increase alone already takes close to one microsecond. This is without any Mutex involved.

So far we've also traded label validation for execution speed. This should probably be made opt-in, so that one can run with a strict registry in tests but use a faster registry in production.

brian-brazil commented 7 years ago

A hash access and counter increase alone already takes close to one microsecond.

This should be measured in nanoseconds. I know Ruby isn't the fastest, but I have difficulty believing it's that slow.

zevarito commented 7 years ago

@grobie https://github.com/prometheus/client_ruby/issues/11 is actually about scrape time; I think what @brian-brazil is concerned about is the measurement time. It's worth noting that I am not using Rack::Collector here, just measuring other non-HTTP activity. Also, the measurement I am taking is just a counter increment.

Here is the benchmark at the method-call level.

I've used ab as in the previous tests, wrapped the measured call in a Benchmark.bm block, and then logged the result.
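Presumably something like the following produced the Tms dumps below (the counter name and the way the result is logged are my guesses, not the actual code):

require 'benchmark'
require 'prometheus/client'

counter = Prometheus::Client.registry.counter(:api_enqueued_jobs_total,
                                              'A counter of API enqueued jobs')

result = Benchmark.measure { counter.increment(worker: 0) }
puts [result].inspect
# => [#<Benchmark::Tms ... @real=0.0003..., @total=0.0>]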

These are the first 5 and last 5 results for 1000 requests.

[#<Benchmark::Tms:0x007f8c03124378 @label="", @real=0.00038335899444064125, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c021ee430 @label="", @real=0.0004944999964209273, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c03177820 @label="", @real=0.0005114129962748848, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c0220d1f0 @label="", @real=0.0005079250040580519, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c035e5678 @label="", @real=0.00026892300229519606, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
...
[#<Benchmark::Tms:0x007f8c0663a7d8 @label="", @real=0.0002955899981316179, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c0665a128 @label="", @real=0.00022226799774216488, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c0686f558 @label="", @real=0.000291318996460177, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c065586d0 @label="", @real=0.00021019799896748737, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]
[#<Benchmark::Tms:0x007f8c04e8e4e0 @label="", @real=0.00018655300664249808, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>]

grobie commented 7 years ago

@zevarito The scrape example mentioned in #11 was just one example, the issue was about investigating and removing any kind of performance bottleneck.

@brian-brazil I guess there is a reason that writing C-extensions is so popular in Ruby.

require 'benchmark'
h = {}
Benchmark.realtime { 1000.times { c = h[:foo_total] || 0; c += 1; h[:foo_total] = c } } / 1000
# => 5.882129771634937e-07

m = Mutex.new
Benchmark.realtime { 1000.times { m.synchronize { c = h[:foo_total] || 0; c += 1; h[:foo_total] = c } } } / 1000
# => 1.4377119950950147e-06

zevarito commented 7 years ago

@grobie right, and I remember the conversation we had here https://github.com/prometheus/client_ruby/issues/33#issuecomment-250757289 about Rack::Collector and scrape time; it's worth mentioning that the cardinality is not a Ruby issue itself.

I wonder if you find the last benchmark I posted acceptable.

SamSaffron commented 7 years ago

Note: Unicorn provides cross-process metrics using raindrops; see:

https://bogomips.org/raindrops/

This uses shared memory and is already built; it can be extended to add more metrics and provide more stats as needed.

I am looking at writing an exporter based on raindrops for Discourse.

christoph-buente commented 7 years ago

Hi, we are experiencing the mentioned behaviour with Ruby + Rails + Unicorn in an environment with more than one worker. As you can see in the screenshot, the workers' load does not seem to be distributed evenly. The upper and lower boundaries are basically the scraped values from either worker 1 or worker 2. The drop in the graph is the time of deployment.

(screenshot taken 2017-01-04 09:50:02)

I'm actually wondering if someone could check out other libraries that measure Rack/Rails stats as middleware, for example https://github.com/librato/librato-rack or https://github.com/newrelic/rpm.

We are using both, and I did not experience these kinds of problems there.

Thanks

jeffutter commented 7 years ago

@christoph-buente I poked around a bit in the newrelic gem. I believe neither of these aggregates metrics between processes; both of them push metrics to a remote server, which handles the aggregation. They probably buffer the events so that they don't send one event on every single request, but they still get around the problem we face here by aggregating the events on the server.

christoph-buente commented 7 years ago

Thanks @jeffutter for looking into it. Even though it's highly discouraged for long-running processes, I feel the Pushgateway seems to be a legitimate way to get around this too.