zio / zio-keeper

A ZIO library for building distributed systems
https://zio.dev/zio-keeper
Apache License 2.0

Adding metrics #53

Open pshemass opened 5 years ago

pshemass commented 5 years ago

One of the most important parts of a distributed system is observability, which is why we should build our library with this mindset.

We should probably try to use https://github.com/zio/zio-metrics. If it is not ready, we should join them and help them release it.

@jdegoes @mijicd Let me know if you see other options.

mijicd commented 5 years ago

cc @toxicafunk

toxicafunk commented 5 years ago

I just migrated zio-metrics to the new org and built it successfully on CircleCI:

https://circleci.com/gh/zio/zio-metrics

It should be ready, minus any bugs that may be encountered during usage. I am currently writing an implementation that creates a ZEnv for Prometheus and for Dropwizard, but the current version should be usable.

I will ask on the channel how to publish it, and I'm more than willing to work on integrating it here.

pshemass commented 5 years ago

@toxicafunk do you need any help? Do you plan to add tracing?

mijicd commented 5 years ago

@toxicafunk let us know if you need any help.

@pshemass I'm not sure about tracing. There are plenty of options for OpenTracing out there, and it's relatively simple to "bake" another ZIO-compatible client. However, OpenTelemetry looks more interesting to me. I think it's worth jumping on that train. cc @jdegoes

However, at the moment I don't think we should broaden the scope too much. Let's come up with the set of metrics we want to expose, and see whether zio-metrics fits our needs right now.

pshemass commented 5 years ago

No, we should not blow up the scope now. I was just curious.

toxicafunk commented 5 years ago

  1. Thanks to @mijicd, zio-metrics is now published on Sonatype.

  2. I made a small test. I have an implementation for Dropwizard as a backend and another for Prometheus. The one for Dropwizard is more stable and appears to work as expected; I seem to have some bugs in the Prometheus backend.

  3. The idea for zio-metrics is to have one common API and then implement it for different backends. Since the original API was mostly based on Dropwizard, it makes sense that this backend is mostly ready. I could use some help fixing the Prometheus one, noting that this may imply changes to the original API. @pshemass I wouldn't mind some help here, especially if you have experience with Prometheus.

  4. To avoid scope creep we can ignore tracing for zio-keeper, but I will add it as an issue in zio-metrics so it's something we could look at in parallel without affecting zio-keeper. @mijicd @pshemass wdyt?

  5. I will share the tests with you shortly. It's an app that reads a file where each line is a JSON message, extracts the ID, and uses it as the key to send a Kafka message. It's just something I use at work and was the most convenient thing I had for testing this.
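
To make point 3 concrete, here is a minimal sketch of "one common API, several backends" in plain Scala. It is not the zio-metrics API: `Label`, `Metrics`, `InMemoryMetrics`, `DottedNames`, `LabelledNames` and `record` are hypothetical names made up for illustration, and both backends only keep counts in memory. The two backends differ only in how they render a metric name, which is the kind of difference a real Dropwizard or Prometheus backend would hide behind the shared interface.

```scala
import scala.collection.mutable

object CommonApiSketch {
  // Hypothetical stand-in for a metric label: a name plus label values.
  final case class Label(name: String, labels: Seq[String])

  // The single API that application code programs against.
  trait Metrics {
    // Returns a function that adds `n` to the counter identified by `label`.
    def counter(label: Label): Long => Unit
    // Current counter values, keyed by the backend-specific metric name.
    def snapshot: Map[String, Long]
  }

  // Shared in-memory storage; each backend only decides how to render names.
  abstract class InMemoryMetrics extends Metrics {
    private val counts = mutable.Map.empty[String, Long]
    protected def render(label: Label): String

    def counter(label: Label): Long => Unit = {
      val key = render(label)
      n => counts.synchronized(counts.update(key, counts.getOrElse(key, 0L) + n))
    }

    def snapshot: Map[String, Long] = counts.synchronized(counts.toMap)
  }

  // Backend that joins the name and label values with dots.
  final class DottedNames extends InMemoryMetrics {
    protected def render(label: Label): String =
      (label.name +: label.labels).mkString(".")
  }

  // Backend that renders label values as key="value" pairs in braces.
  final class LabelledNames extends InMemoryMetrics {
    protected def render(label: Label): String =
      label.labels
        .map(l => l + "=\"" + l + "\"")
        .mkString(label.name + "{", ",", "}")
  }

  // Application code depends only on `Metrics`, never on a concrete backend.
  def record(metrics: Metrics): Map[String, Long] = {
    val sent = metrics.counter(Label("sent_messages", Seq("env")))
    (1 to 3).foreach(_ => sent(1L))
    metrics.snapshot
  }
}
```

Calling `CommonApiSketch.record` with either backend runs the same application code; only the rendered key in the snapshot changes. Adding a backend means adding another implementation of `Metrics`, which is also where a backend that cannot fit the interface would force an API change.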

mijicd commented 5 years ago

@toxicafunk I agree with point 4. Also, we should take a look at zio-metrics together. Maybe we could have different modules (e.g. Prometheus and Dropwizard backends, OpenTracing, etc.) and parallelize the work on them.

edit: Not sure whether it makes sense, just a wild guess :)

toxicafunk commented 5 years ago

So you can see my test for Prometheus here:

https://github.com/toxicafunk/zio-tests/blob/prometheus/src/main/scala/com/richweb/Main.scala

In the main function you'll see I define the backend impl:

val metrics = new PrometheusMetrics()

Since my own reporters are... "funny"... I just use the HTTP server included with the Prometheus client:

val server = new HTTPServer(new InetSocketAddress(1234), metrics.registry);

Given the counter cnt <- metrics.counter(Label("kafka_sent_messages", Array("zenv"))) and the timer tmr <- metrics.timer(Label("simple_timer", Array("test", "timer"))), this produces the following output:

# HELP kafka_sent_messages kafka_sent_messages counter
# TYPE kafka_sent_messages counter
kafka_sent_messages{zenv="zenv",} 100000.0
# HELP simple_timer simple_timer timer
# TYPE simple_timer summary
simple_timer_count{test="test",timer="timer",} 100000.0
simple_timer_sum{test="test",timer="timer",} 2456428.277020058

The count is correct since my file has 100k messages, and averaging (sum/count) shows that processing took about 24.56 ms per message.

For Dropwizard (https://github.com/toxicafunk/zio-tests/blob/dropwizard/src/main/scala/com/richweb/Main.scala) I just define its backend:
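
The sum/count average can be reproduced straight from the text output. This is a small sketch in plain Scala with no Prometheus dependency; `SummaryMean`, `sampleValue` and `mean` are hypothetical helper names, and the parsing only handles simple `name{labels} value` lines like the ones shown here. The result is in whatever unit the timer recorded, which is read as milliseconds in this thread.

```scala
object SummaryMean {
  // Sample lines copied from the Prometheus output in this comment.
  val exposition: String =
    """# TYPE simple_timer summary
      |simple_timer_count{test="test",timer="timer",} 100000.0
      |simple_timer_sum{test="test",timer="timer",} 2456428.277020058""".stripMargin

  // Value of the first sample whose metric name (the text before '{' or ' ')
  // equals `name`. Comment lines starting with '#' are skipped.
  def sampleValue(text: String, name: String): Option[Double] =
    text.split("\n").iterator
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .collectFirst {
        case line if line.takeWhile(c => c != '{' && c != ' ') == name =>
          line.substring(line.lastIndexOf(' ') + 1).toDouble
      }

  // Mean observation for a summary metric: <metric>_sum / <metric>_count.
  // Returns None if either sample is missing or the count is zero.
  def mean(text: String, metric: String): Option[Double] =
    for {
      sum   <- sampleValue(text, s"${metric}_sum")
      count <- sampleValue(text, s"${metric}_count")
      if count > 0
    } yield sum / count
}
```

`SummaryMean.mean(SummaryMean.exposition, "simple_timer")` gives roughly 24.56, matching the figure above.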

val metrics = new DropwizardMetrics()

The easiest reporter to set up is the ConsoleReporter:

val reporter = ConsoleReporter.forRegistry(metrics.registry)
    .convertRatesTo(TimeUnit.SECONDS)
    .convertDurationsTo(TimeUnit.MILLISECONDS)
    .build()

reporter.start(20, TimeUnit.SECONDS);

Note that defining the counter and timer (and using them) doesn't change:

cnt <- metrics.counter(Label("kafka_sent_messages", Array("zenv")))
tmr <- metrics.timer(Label("simple_timer", Array("test", "timer")))

which produces the following output:

[info] Completed in 27355 ms
[info] 8/20/19 1:32:24 AM =============================================================
[info] -- Counters --------------------------------------------------------------------
[info] kafka_sent_messages.zenv
[info]              count = 100000
[info] -- Timers ----------------------------------------------------------------------
[info] simple_timer.test.timer
[info]              count = 100000
[info]          mean rate = 2502.84 calls/second
[info]      1-minute rate = 1857.60 calls/second
[info]      5-minute rate = 1293.06 calls/second
[info]     15-minute rate = 1167.15 calls/second
[info]                min = 1570.23 milliseconds
[info]                max = 27192.09 milliseconds
[info]               mean = 18209.35 milliseconds
[info]             stddev = 6573.32 milliseconds
[info]             median = 19658.25 milliseconds
[info]               75% <= 23925.28 milliseconds
[info]               95% <= 26495.53 milliseconds
[info]               98% <= 26861.20 milliseconds
[info]               99% <= 27068.98 milliseconds
[info]             99.9% <= 27192.09 milliseconds

So the library is usable but does need some care and love :smile:

Any comments?

toxicafunk commented 5 years ago

@pshemass https://github.com/zio/zio-metrics/issues/8

pshemass commented 5 years ago

I have an opportunity to test this at work because we have Kamon set up with Prometheus.

The API is usable but it needs some love :) Is there a zio-metrics Gitter channel where we could discuss this?

toxicafunk commented 5 years ago

There is now: https://gitter.im/ZIO/zio-metrics

toxicafunk commented 5 years ago

Just wanted to add that there is also a histogramTimer in the Prometheus backend that is easier to use than the regular timer:

tmr <- metrics.histogramTimer(Label("simple_timer", Array.empty[String]))
...
  .mapMParUnordered(120)(
    l => messenger.send(prd, idL.getOption(toJson(l)).getOrElse("UND"), l)
  )
  .tap(md => cnt(1) *> tmr() *> putStrLn(md.toString()))