yabeda-rb / yabeda

Extendable framework for collecting and exporting metrics from your Ruby application
MIT License
775 stars 25 forks source link

Support for dynamic labels/tags (set in runtime) #12

Closed raivil closed 4 years ago

raivil commented 4 years ago

Hi,

Them gem doesn't seem to support custom labels/tags that are not fixed value default_tag. Are there plans to add this kind of functionality?

Thank you.

Envek commented 4 years ago

Hi! Can you please tell more about your use case? What do you want to do? How it should look?

raivil commented 4 years ago

Hey @Envek,

My idea with adding custom tags would be to support metrics segmented by Customers or any other attribute. Example:

sidekiq_job_latency_seconds_sum{queue="default",worker="Jobs::SomeJobClass", customer_id="ABC"} 33.04740643501282
sidekiq_job_latency_seconds_count{queue="default",worker="Jobs::SomeJobClass", customer_id="ABC"} 65.0

For more context, on Newrelic or ELK, it's possible to add custom labels or context data to transactions, that would work similar to adding tags/labels to Prometheus metrics. Examples: ELK and Newrelic

I'm not sure if this is the best way to accomplish that, and I'd appreciate your input.

Envek commented 4 years ago

@raivil What do you think if it will look like this?

Yabeda.with_tags(additional: labels) do
  # your code here
end

It have to be in block form as in general it is hard to say when current transaction is over and custom tags should be cleared.

But you can use it in Rails controller's around_action or wrap your jobs with custom sidekiq server middleware.

raivil commented 4 years ago

@Envek that could work.

For yabeda-rb/yabeda-sidekiq, maybe something, as shown above, is also a possible solution (but not so flexible to get dynamic values): This code is from the Fastly gem examples (https://github.com/fastly/sidekiq-prometheus)

class SomeWorker
  include Sidekiq::Worker

  def prometheus_labels
    { some: 'label' }
  end
end

A method like that could "fit" with the code from the existing middleware (https://github.com/yabeda-rb/yabeda-sidekiq/blob/master/lib/yabeda/sidekiq/client_middleware.rb)

Yabeda::Sidekiq.labelize(worker, job, queue, prometheus_labels) # added an optional param here

Something similar could be done on https://github.com/yabeda-rb/yabeda-rails/blob/master/lib/yabeda/rails.rb.

Not sure if those are the best solutions or places to do the changes, I just wanted to highlight the idea.

Envek commented 4 years ago

I've released yabeda 0.6.0 and yabeda-sidekiq 0.6.0.

In yabeda-sidekiq you can now use yabeda_tags method to add tags to Sidekiq-related metrics just like in sidekiq-prometheus (other metrics that were observed during job execution won't be affected). See Custom tags section in yabeda-sidekiq README.

I've added with_tags method into Yabeda itself that allows to add/redefine default tags to all metrics observed during block execution.

# This block is optional but some adapters (like Prometheus) requires that all tags should be declared in advance
Yabeda.configure do
  default_tag :importance, nil
end

Yabeda.with_tags(importance: 'high') do
  Yabeda.your_app.bells_rang_count.increment({bell_size: bell.size}, by: 1)
end

Also be aware that it is usually bad idea to place identifiers into Prometheus as it may cause huge memory consumption by both Prometheus and your application.

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

See Prometheus docs for reference: https://prometheus.io/docs/practices/naming/#labels

raivil commented 4 years ago

@Envek This is what I'm getting when using it:

Configs:

Yabeda.configure do
  default_tag :custom_label, nil
end

On a couple of jobs:

  def yabeda_tags(*_params) # This method will be called first, before +perform+
    { custom_label: "sample" }
  end

Output:

# TYPE sidekiq_jobs_enqueued_total counter
# HELP sidekiq_jobs_enqueued_total A counter of the total number of jobs sidekiq enqueued.
# TYPE sidekiq_jobs_executed_total counter
# HELP sidekiq_jobs_executed_total A counter of the total number of jobs sidekiq executed.
sidekiq_jobs_executed_total{custom_label="sample",queue="default",worker="Jobs::SampleJob"} 3.0
# TYPE sidekiq_jobs_success_total counter
# HELP sidekiq_jobs_success_total A counter of the total number of jobs successfully processed by sidekiq.
sidekiq_jobs_success_total{custom_label="sample",queue="default",worker="Jobs::SampleJob"} 3.0
# TYPE sidekiq_jobs_failed_total counter
# HELP sidekiq_jobs_failed_total A counter of the total number of jobs failed in sidekiq.
# TYPE sidekiq_jobs_waiting_count gauge
# HELP sidekiq_jobs_waiting_count The number of jobs waiting to process in sidekiq.
sidekiq_jobs_waiting_count{custom_label="",queue="low"} 0.0
sidekiq_jobs_waiting_count{custom_label="",queue="default"} 0.0
sidekiq_jobs_waiting_count{custom_label="",queue="critical"} 0.0
sidekiq_jobs_waiting_count{custom_label="",queue="medium"} 0.0
# TYPE sidekiq_active_workers_count gauge
# HELP sidekiq_active_workers_count The number of currently running machines with sidekiq workers.
sidekiq_active_workers_count{custom_label=""} 0.0
# TYPE sidekiq_jobs_scheduled_count gauge
# HELP sidekiq_jobs_scheduled_count The number of jobs scheduled for later execution.
sidekiq_jobs_scheduled_count{custom_label=""} 0.0
# TYPE sidekiq_jobs_retry_count gauge
# HELP sidekiq_jobs_retry_count The number of failed jobs waiting to be retried
sidekiq_jobs_retry_count{custom_label=""} 0.0
# TYPE sidekiq_jobs_dead_count gauge
# HELP sidekiq_jobs_dead_count The number of jobs exceeded their retry count.
sidekiq_jobs_dead_count{custom_label=""} 0.0
# TYPE sidekiq_active_processes gauge
# HELP sidekiq_active_processes The number of active Sidekiq worker processes.
sidekiq_active_processes{custom_label=""} 1.0
# TYPE sidekiq_queue_latency gauge
# HELP sidekiq_queue_latency The queue latency, the difference in seconds since the oldest job in the queue was enqueued
sidekiq_queue_latency{custom_label="",queue="critical"} 0.0
sidekiq_queue_latency{custom_label="",queue="default"} 0.0
sidekiq_queue_latency{custom_label="",queue="low"} 0.0
sidekiq_queue_latency{custom_label="",queue="medium"} 0.0
# TYPE sidekiq_job_latency_seconds histogram
# HELP sidekiq_job_latency_seconds The job latency, the difference in seconds between enqueued and running time
sidekiq_job_latency_seconds_bucket{custom_label="sample",queue="default",worker="Jobs::SampleJob",le="0.005"} 0.0
sidekiq_job_latency_seconds_bucket{custom_label="sample",queue="default",worker="Jobs::SampleJob",le="0.01"} 0.0
sidekiq_job_latency_seconds_sum{custom_label="sample",queue="default",worker="Jobs::SampleJob"} 24.11741614341736
sidekiq_job_latency_seconds_count{custom_label="sample",queue="default",worker="Jobs::SampleJob"} 3.0
# TYPE sidekiq_job_runtime_seconds histogram
# HELP sidekiq_job_runtime_seconds A histogram of the job execution time.
sidekiq_job_runtime_seconds_bucket{custom_label="sample",queue="default",worker="Jobs::SampleJob",le="0.005"} 1.0
sidekiq_job_runtime_seconds_bucket{custom_label="sample",queue="default",worker="Jobs::SampleJob",le="0.01"} 2.0
sidekiq_job_runtime_seconds_sum{custom_label="sample",queue="default",worker="Jobs::SampleJob"} 0.05
sidekiq_job_runtime_seconds_count{custom_label="sample",queue="default",worker="Jobs::SampleJob"} 3.0

This is expected, right? The custom label will be added to all metrics with the default value.

Checking puma/rails metrics, I'm getting the error below. I've also tried with default_tag :custom_label, "empty" to check if the problem was the null value.


 NoMethodError (undefined method `set' for nil:NilClass):

 yabeda-prometheus (0.6.1) lib/yabeda/prometheus/adapter.rb:53:in `perform_gauge_set!'
 yabeda (0.6.1) lib/yabeda/gauge.rb:10:in `block in set'
 yabeda (0.6.1) lib/yabeda/gauge.rb:9:in `each'
 yabeda (0.6.1) lib/yabeda/gauge.rb:9:in `set'
 yabeda-puma-plugin (0.4.0) lib/puma/plugin/yabeda.rb:34:in `block (3 levels) in start'
 yabeda-puma-plugin (0.4.0) lib/puma/plugin/yabeda.rb:33:in `each'
 yabeda-puma-plugin (0.4.0) lib/puma/plugin/yabeda.rb:33:in `block (2 levels) in start'
 yabeda-prometheus (0.6.1) lib/yabeda/prometheus/exporter.rb:48:in `each'
 yabeda-prometheus (0.6.1) lib/yabeda/prometheus/exporter.rb:48:in `call'
 yabeda-prometheus (0.6.1) lib/yabeda/prometheus/exporter.rb:18:in `call'
 actionpack (6.0.3.2) lib/action_dispatch/routing/mapper.rb:19:in `block in <class:Constraints>'
 actionpack (6.0.3.2) lib/action_dispatch/routing/mapper.rb:48:in `serve'
 actionpack (6.0.3.2) lib/action_dispatch/journey/router.rb:49:in `block in serve'
 actionpack (6.0.3.2) lib/action_dispatch/journey/router.rb:32:in `each'
 actionpack (6.0.3.2) lib/action_dispatch/journey/router.rb:32:in `serve'
 actionpack (6.0.3.2) lib/action_dispatch/routing/route_set.rb:834:in `call'
 bullet (6.1.0) lib/bullet/rack.rb:15:in `call'
Envek commented 4 years ago

Yeah. Output is expected.


NoMethodError (undefined method `set' for nil:NilClass):

 yabeda-prometheus (0.6.1) lib/yabeda/prometheus/adapter.rb:53:in `perform_gauge_set!'

That means that there is no metric registered in Prometheus (yet?) so it can't find it here:

https://github.com/yabeda-rb/yabeda-prometheus/blob/abd9a258b21fd96e1bbe867b80fec5e81849d814/lib/yabeda/prometheus/adapter.rb#L53

It is most probably related to this change I made yesterday: https://github.com/yabeda-rb/yabeda/commit/d745cf7c6a8aaaf9a4d4a90d6429e448a302f155

Envek commented 4 years ago

Can't reproduce it now. Any reproduction steps or further details?

raivil commented 4 years ago

@Envek I'm using an existing application. I'll see if I can reproduce this on a brand new rails app and get back at you with results.

raivil commented 4 years ago

@Envek I've just tested with a brand new rails app (Rails 6.0.3.2) and it fails with the same error as before.

Created the app with rails new yabedatags and used all defaults.

Gems included:

gem "sidekiq", "~> 6.1.1"
gem "yabeda-prometheus", "~> 0.6.1"
gem "yabeda-puma-plugin", "~> 0.4.0"
gem "yabeda-rails", "~> 0.5.0"
gem "yabeda-sidekiq", "~> 0.6.0"

Sidekiq config:

Sidekiq.configure_server do |config|
  config.log_formatter = Sidekiq::Logger::Formatters::JSON.new
  Yabeda::Prometheus::Exporter.start_metrics_server!
end

Yabeda config:

Yabeda.configure do
  default_tag :custom_label, nil
end

Routes:

Rails.application.routes.draw do
  # For details on the DSL available within this file, see https://guides.rubyonrails.org/routing.html
  mount Yabeda::Prometheus::Exporter => "/metrics"
end

Puma config:

activate_control_app
plugin :yabeda

Failed request:

$ rails s
=> Booting Puma
=> Rails 6.0.3.2 application starting in development 
=> Run `rails server --help` for more startup options
Puma starting in single mode...
* Version 4.3.5 (ruby 2.6.5-p114), codename: Mysterious Traveller
* Min threads: 5, max threads: 5
* Environment: development
* Listening on tcp://127.0.0.1:3000
* Listening on tcp://[::1]:3000
* Starting control server on unix:///var/folders/w1/d2q8yvss40d4nmfhkgzhwg5w0000gn/T/puma-status-1595976279969-63554
Use Ctrl-C to stop
Started GET "/metrics" for ::1 at 2020-07-28 19:44:42 -0300
   (1.4ms)  SELECT sqlite_version(*)

NoMethodError (undefined method `set' for nil:NilClass):

yabeda-prometheus (0.6.1) lib/yabeda/prometheus/adapter.rb:53:in `perform_gauge_set!'
yabeda (0.6.1) lib/yabeda/gauge.rb:10:in `block in set'
yabeda (0.6.1) lib/yabeda/gauge.rb:9:in `each'
yabeda (0.6.1) lib/yabeda/gauge.rb:9:in `set'
yabeda-puma-plugin (0.4.0) lib/puma/plugin/yabeda.rb:34:in `block (3 levels) in start'
yabeda-puma-plugin (0.4.0) lib/puma/plugin/yabeda.rb:33:in `each'
yabeda-puma-plugin (0.4.0) lib/puma/plugin/yabeda.rb:33:in `block (2 levels) in start'
yabeda-prometheus (0.6.1) lib/yabeda/prometheus/exporter.rb:48:in `each'
yabeda-prometheus (0.6.1) lib/yabeda/prometheus/exporter.rb:48:in `call'
yabeda-prometheus (0.6.1) lib/yabeda/prometheus/exporter.rb:18:in `call'
actionpack (6.0.3.2) lib/action_dispatch/routing/mapper.rb:19:in `block in <class:Constraints>'
actionpack (6.0.3.2) lib/action_dispatch/routing/mapper.rb:48:in `serve'
actionpack (6.0.3.2) lib/action_dispatch/journey/router.rb:49:in `block in serve'
actionpack (6.0.3.2) lib/action_dispatch/journey/router.rb:32:in `each'
actionpack (6.0.3.2) lib/action_dispatch/journey/router.rb:32:in `serve'
actionpack (6.0.3.2) lib/action_dispatch/routing/route_set.rb:834:in `call'
rack (2.2.3) lib/rack/tempfile_reaper.rb:15:in `call'
rack (2.2.3) lib/rack/etag.rb:27:in `call'
rack (2.2.3) lib/rack/conditional_get.rb:27:in `call'
rack (2.2.3) lib/rack/head.rb:12:in `call'
actionpack (6.0.3.2) lib/action_dispatch/http/content_security_policy.rb:18:in `call'
rack (2.2.3) lib/rack/session/abstract/id.rb:266:in `context'
rack (2.2.3) lib/rack/session/abstract/id.rb:260:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/cookies.rb:648:in `call'
activerecord (6.0.3.2) lib/active_record/migration.rb:567:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/callbacks.rb:27:in `block in call'
activesupport (6.0.3.2) lib/active_support/callbacks.rb:101:in `run_callbacks'
actionpack (6.0.3.2) lib/action_dispatch/middleware/callbacks.rb:26:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/executor.rb:14:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/actionable_exceptions.rb:17:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/debug_exceptions.rb:32:in `call'
web-console (4.0.4) lib/web_console/middleware.rb:132:in `call_app'
web-console (4.0.4) lib/web_console/middleware.rb:28:in `block in call'
web-console (4.0.4) lib/web_console/middleware.rb:17:in `catch'
web-console (4.0.4) lib/web_console/middleware.rb:17:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'
railties (6.0.3.2) lib/rails/rack/logger.rb:37:in `call_app'
railties (6.0.3.2) lib/rails/rack/logger.rb:26:in `block in call'
activesupport (6.0.3.2) lib/active_support/tagged_logging.rb:80:in `block in tagged'
activesupport (6.0.3.2) lib/active_support/tagged_logging.rb:28:in `tagged'
activesupport (6.0.3.2) lib/active_support/tagged_logging.rb:80:in `tagged'
railties (6.0.3.2) lib/rails/rack/logger.rb:26:in `call'
sprockets-rails (3.2.1) lib/sprockets/rails/quiet_assets.rb:13:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/remote_ip.rb:81:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/request_id.rb:27:in `call'
rack (2.2.3) lib/rack/method_override.rb:24:in `call'
rack (2.2.3) lib/rack/runtime.rb:22:in `call'
activesupport (6.0.3.2) lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/executor.rb:14:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/static.rb:126:in `call'
rack (2.2.3) lib/rack/sendfile.rb:110:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/host_authorization.rb:82:in `call'
webpacker (4.2.2) lib/webpacker/dev_server_proxy.rb:23:in `perform_request'
rack-proxy (0.6.5) lib/rack/proxy.rb:57:in `call'
railties (6.0.3.2) lib/rails/engine.rb:527:in `call'
puma (4.3.5) lib/puma/configuration.rb:228:in `call'
puma (4.3.5) lib/puma/server.rb:713:in `handle_request'
puma (4.3.5) lib/puma/server.rb:472:in `process_client'
puma (4.3.5) lib/puma/server.rb:328:in `block in run'
puma (4.3.5) lib/puma/thread_pool.rb:134:in `block in spawn_thread'

Sidekiq:

[2020-07-28 19:45:31] INFO  WEBrick 1.4.2
[2020-07-28 19:45:31] INFO  ruby 2.6.5 (2019-10-01) [x86_64-darwin19]
[2020-07-28 19:45:31] INFO  WEBrick::HTTPServer#start: pid=64138 port=9394
{"ts":"2020-07-28T22:45:34.185Z","pid":64138,"tid":"oxgb6ulkm","lvl":"INFO","msg":"Booted Rails 6.0.3.2 application in development environment"}
{"ts":"2020-07-28T22:45:34.185Z","pid":64138,"tid":"oxgb6ulkm","lvl":"INFO","msg":"Running in ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin19]"}
{"ts":"2020-07-28T22:45:34.185Z","pid":64138,"tid":"oxgb6ulkm","lvl":"INFO","msg":"See LICENSE and the LGPL-3.0 for licensing details."}
{"ts":"2020-07-28T22:45:34.185Z","pid":64138,"tid":"oxgb6ulkm","lvl":"INFO","msg":"Upgrade to Sidekiq Pro for more features and support: https://sidekiq.org"}
{"ts":"2020-07-28T22:45:34.185Z","pid":64138,"tid":"oxgb6ulkm","lvl":"INFO","msg":"Booting Sidekiq 6.1.1 with redis options {}"}
{"ts":"2020-07-28T22:45:34.187Z","pid":64138,"tid":"oxgb6ulkm","lvl":"INFO","msg":"Starting processing, hit Ctrl-C to stop"}
127.0.0.1 - - [28/Jul/2020:19:45:43 -0300] "GET /metrics HTTP/1.1" 200 - 0.0033

Sidekiq metrics:

# TYPE sidekiq_jobs_enqueued_total counter
# HELP sidekiq_jobs_enqueued_total A counter of the total number of jobs sidekiq enqueued.
# TYPE sidekiq_jobs_executed_total counter
# HELP sidekiq_jobs_executed_total A counter of the total number of jobs sidekiq executed.
# TYPE sidekiq_jobs_success_total counter
# HELP sidekiq_jobs_success_total A counter of the total number of jobs successfully processed by sidekiq.
# TYPE sidekiq_jobs_failed_total counter
# HELP sidekiq_jobs_failed_total A counter of the total number of jobs failed in sidekiq.
# TYPE sidekiq_jobs_waiting_count gauge
# HELP sidekiq_jobs_waiting_count The number of jobs waiting to process in sidekiq.
sidekiq_jobs_waiting_count{custom_label="",queue="low"} 0.0
sidekiq_jobs_waiting_count{custom_label="",queue="default"} 0.0
sidekiq_jobs_waiting_count{custom_label="",queue="medium"} 0.0
sidekiq_jobs_waiting_count{custom_label="",queue="critical"} 0.0
# TYPE sidekiq_active_workers_count gauge
# HELP sidekiq_active_workers_count The number of currently running machines with sidekiq workers.
sidekiq_active_workers_count{custom_label=""} 0.0
# TYPE sidekiq_jobs_scheduled_count gauge
# HELP sidekiq_jobs_scheduled_count The number of jobs scheduled for later execution.
sidekiq_jobs_scheduled_count{custom_label=""} 0.0
# TYPE sidekiq_jobs_retry_count gauge
# HELP sidekiq_jobs_retry_count The number of failed jobs waiting to be retried
sidekiq_jobs_retry_count{custom_label=""} 0.0
# TYPE sidekiq_jobs_dead_count gauge
# HELP sidekiq_jobs_dead_count The number of jobs exceeded their retry count.
sidekiq_jobs_dead_count{custom_label=""} 0.0
# TYPE sidekiq_active_processes gauge
# HELP sidekiq_active_processes The number of active Sidekiq worker processes.
sidekiq_active_processes{custom_label=""} 1.0
# TYPE sidekiq_queue_latency gauge
# HELP sidekiq_queue_latency The queue latency, the difference in seconds since the oldest job in the queue was enqueued
sidekiq_queue_latency{custom_label="",queue="critical"} 0.0
sidekiq_queue_latency{custom_label="",queue="default"} 0.0
sidekiq_queue_latency{custom_label="",queue="low"} 0.0
sidekiq_queue_latency{custom_label="",queue="medium"} 0.0
# TYPE sidekiq_job_latency_seconds histogram
# HELP sidekiq_job_latency_seconds The job latency, the difference in seconds between enqueued and running time
# TYPE sidekiq_job_runtime_seconds histogram
# HELP sidekiq_job_runtime_seconds A histogram of the job execution time.
raivil commented 4 years ago

@Envek the issue is the presence of yabeda-puma-plugin :)

Envek commented 4 years ago

Thank you for reproduction steps! I will try to get to it this week but can't promise.

Envek commented 4 years ago

Fixed in 0.6.2. Apparently I broke it in https://github.com/yabeda-rb/yabeda/commit/d745cf7c6a8aaaf9a4d4a90d6429e448a302f155

raivil commented 4 years ago

@Envek it's working 👍

# TYPE puma_backlog gauge
# HELP puma_backlog Number of established but unaccepted connections in the backlog
puma_backlog{custom_label="",index="0"} 0.0
# TYPE puma_running gauge
# HELP puma_running Number of running worker threads
puma_running{custom_label="",index="0"} 5.0
# TYPE puma_pool_capacity gauge
# HELP puma_pool_capacity Number of allocatable worker threads
puma_pool_capacity{custom_label="",index="0"} 4.0
# TYPE puma_max_threads gauge
# HELP puma_max_threads Maximum number of worker threads
puma_max_threads{custom_label="",index="0"} 5.0
Envek commented 4 years ago

Great! If you have any other questions/concerns/ideas, feel free to open new issues