Closed — raivil closed this issue 4 years ago.
Hi! Can you please tell us more about your use case? What do you want to do? How should it look?
Hey @Envek,
My idea with adding custom tags would be to support metrics segmented by Customers or any other attribute. Example:
sidekiq_job_latency_seconds_sum{queue="default",worker="Jobs::SomeJobClass", customer_id="ABC"} 33.04740643501282
sidekiq_job_latency_seconds_count{queue="default",worker="Jobs::SomeJobClass", customer_id="ABC"} 65.0
For more context: in New Relic or ELK, it's possible to add custom labels or context data to transactions, which would work similarly to adding tags/labels to Prometheus metrics. Examples: ELK and New Relic.
I'm not sure if this is the best way to accomplish that, and I'd appreciate your input.
@raivil What do you think — would something like this work?
Yabeda.with_tags(additional: labels) do
  # your code here
end
It has to be in block form because, in general, it is hard to tell when the current transaction is over and the custom tags should be cleared.
But you can use it in a Rails controller's around_action,
or wrap your jobs with a custom Sidekiq server middleware.
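For instance, an around_action could look something like this (a minimal sketch; the controller, the current_customer helper, and the customer_plan tag are hypothetical, and the tag would need to be declared via default_tag first):

```ruby
class ApplicationController < ActionController::Base
  around_action :set_yabeda_tags

  private

  # Hypothetical example: every metric observed while the request is
  # being processed gets tagged with the customer's plan — a small,
  # bounded set of values, unlike raw customer IDs.
  def set_yabeda_tags
    Yabeda.with_tags(customer_plan: current_customer&.plan || "unknown") do
      yield
    end
  end
end
```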
@Envek, that could work.
For yabeda-rb/yabeda-sidekiq,
maybe something like the code below is also a possible solution (though not flexible enough for dynamic values).
This code is from the Fastly gem's examples (https://github.com/fastly/sidekiq-prometheus):
class SomeWorker
  include Sidekiq::Worker

  def prometheus_labels
    { some: 'label' }
  end
end
A method like that could "fit" with the code from the existing middleware (https://github.com/yabeda-rb/yabeda-sidekiq/blob/master/lib/yabeda/sidekiq/client_middleware.rb)
Yabeda::Sidekiq.labelize(worker, job, queue, prometheus_labels) # added an optional param here
Something similar could be done on https://github.com/yabeda-rb/yabeda-rails/blob/master/lib/yabeda/rails.rb.
Not sure if those are the best solutions or places to do the changes, I just wanted to highlight the idea.
I've released yabeda 0.6.0 and yabeda-sidekiq 0.6.0.
In yabeda-sidekiq you can now use the yabeda_tags
method to add tags to Sidekiq-related metrics, just like in sidekiq-prometheus (other metrics observed during job execution won't be affected). See the Custom tags section in the yabeda-sidekiq README.
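A worker can define yabeda_tags like this (a sketch; the worker class and tag name are made up for illustration, and the tag must be declared with default_tag first):

```ruby
class OrderProcessingJob
  include Sidekiq::Worker

  # Called by yabeda-sidekiq's server middleware before +perform+;
  # the returned hash is merged into the tags of Sidekiq-related
  # metrics observed for this job.
  def yabeda_tags(*args)
    { queue_kind: "orders" }
  end

  def perform(order_id)
    # actual job logic here
  end
end
```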
I've also added a with_tags
method to Yabeda itself that allows you to add or redefine default tags for all metrics observed during block execution.
# This block is optional, but some adapters (like Prometheus) require all tags to be declared in advance
Yabeda.configure do
  default_tag :importance, nil
end
Yabeda.with_tags(importance: 'high') do
  Yabeda.your_app.bells_rang_count.increment({ bell_size: bell.size }, by: 1)
end
Also be aware that it is usually a bad idea to put identifiers into Prometheus labels, as it may cause huge memory consumption in both Prometheus and your application.
CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
See Prometheus docs for reference: https://prometheus.io/docs/practices/naming/#labels
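If you need per-customer visibility anyway, a common workaround is to map the unbounded identifier onto a small, fixed set of label values before tagging. A sketch (the tier names and the priority-customer list are illustrative, not part of Yabeda):

```ruby
# Maps an unbounded customer identifier to one of a few fixed label
# values, keeping Prometheus label cardinality bounded.
PRIORITY_CUSTOMERS = %w[ABC XYZ].freeze

def customer_tier(customer_id)
  return "unknown" if customer_id.nil?

  PRIORITY_CUSTOMERS.include?(customer_id) ? "priority" : "standard"
end
```

You would then pass customer_tier(customer.id) — not the raw ID — as the tag value, so the number of time series stays constant no matter how many customers you have.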
@Envek This is what I'm getting when using it:
Configs:
Yabeda.configure do
  default_tag :custom_label, nil
end
On a couple of jobs:
def yabeda_tags(*_params) # This method will be called first, before +perform+
  { custom_label: "sample" }
end
Output:
# TYPE sidekiq_jobs_enqueued_total counter
# HELP sidekiq_jobs_enqueued_total A counter of the total number of jobs sidekiq enqueued.
# TYPE sidekiq_jobs_executed_total counter
# HELP sidekiq_jobs_executed_total A counter of the total number of jobs sidekiq executed.
sidekiq_jobs_executed_total{custom_label="sample",queue="default",worker="Jobs::SampleJob"} 3.0
# TYPE sidekiq_jobs_success_total counter
# HELP sidekiq_jobs_success_total A counter of the total number of jobs successfully processed by sidekiq.
sidekiq_jobs_success_total{custom_label="sample",queue="default",worker="Jobs::SampleJob"} 3.0
# TYPE sidekiq_jobs_failed_total counter
# HELP sidekiq_jobs_failed_total A counter of the total number of jobs failed in sidekiq.
# TYPE sidekiq_jobs_waiting_count gauge
# HELP sidekiq_jobs_waiting_count The number of jobs waiting to process in sidekiq.
sidekiq_jobs_waiting_count{custom_label="",queue="low"} 0.0
sidekiq_jobs_waiting_count{custom_label="",queue="default"} 0.0
sidekiq_jobs_waiting_count{custom_label="",queue="critical"} 0.0
sidekiq_jobs_waiting_count{custom_label="",queue="medium"} 0.0
# TYPE sidekiq_active_workers_count gauge
# HELP sidekiq_active_workers_count The number of currently running machines with sidekiq workers.
sidekiq_active_workers_count{custom_label=""} 0.0
# TYPE sidekiq_jobs_scheduled_count gauge
# HELP sidekiq_jobs_scheduled_count The number of jobs scheduled for later execution.
sidekiq_jobs_scheduled_count{custom_label=""} 0.0
# TYPE sidekiq_jobs_retry_count gauge
# HELP sidekiq_jobs_retry_count The number of failed jobs waiting to be retried
sidekiq_jobs_retry_count{custom_label=""} 0.0
# TYPE sidekiq_jobs_dead_count gauge
# HELP sidekiq_jobs_dead_count The number of jobs exceeded their retry count.
sidekiq_jobs_dead_count{custom_label=""} 0.0
# TYPE sidekiq_active_processes gauge
# HELP sidekiq_active_processes The number of active Sidekiq worker processes.
sidekiq_active_processes{custom_label=""} 1.0
# TYPE sidekiq_queue_latency gauge
# HELP sidekiq_queue_latency The queue latency, the difference in seconds since the oldest job in the queue was enqueued
sidekiq_queue_latency{custom_label="",queue="critical"} 0.0
sidekiq_queue_latency{custom_label="",queue="default"} 0.0
sidekiq_queue_latency{custom_label="",queue="low"} 0.0
sidekiq_queue_latency{custom_label="",queue="medium"} 0.0
# TYPE sidekiq_job_latency_seconds histogram
# HELP sidekiq_job_latency_seconds The job latency, the difference in seconds between enqueued and running time
sidekiq_job_latency_seconds_bucket{custom_label="sample",queue="default",worker="Jobs::SampleJob",le="0.005"} 0.0
sidekiq_job_latency_seconds_bucket{custom_label="sample",queue="default",worker="Jobs::SampleJob",le="0.01"} 0.0
sidekiq_job_latency_seconds_sum{custom_label="sample",queue="default",worker="Jobs::SampleJob"} 24.11741614341736
sidekiq_job_latency_seconds_count{custom_label="sample",queue="default",worker="Jobs::SampleJob"} 3.0
# TYPE sidekiq_job_runtime_seconds histogram
# HELP sidekiq_job_runtime_seconds A histogram of the job execution time.
sidekiq_job_runtime_seconds_bucket{custom_label="sample",queue="default",worker="Jobs::SampleJob",le="0.005"} 1.0
sidekiq_job_runtime_seconds_bucket{custom_label="sample",queue="default",worker="Jobs::SampleJob",le="0.01"} 2.0
sidekiq_job_runtime_seconds_sum{custom_label="sample",queue="default",worker="Jobs::SampleJob"} 0.05
sidekiq_job_runtime_seconds_count{custom_label="sample",queue="default",worker="Jobs::SampleJob"} 3.0
This is expected, right? The custom label will be added to all metrics with the default value.
Checking Puma/Rails metrics, I'm getting the error below.
I've also tried default_tag :custom_label, "empty"
to check whether the problem was the null value.
NoMethodError (undefined method `set' for nil:NilClass):
yabeda-prometheus (0.6.1) lib/yabeda/prometheus/adapter.rb:53:in `perform_gauge_set!'
yabeda (0.6.1) lib/yabeda/gauge.rb:10:in `block in set'
yabeda (0.6.1) lib/yabeda/gauge.rb:9:in `each'
yabeda (0.6.1) lib/yabeda/gauge.rb:9:in `set'
yabeda-puma-plugin (0.4.0) lib/puma/plugin/yabeda.rb:34:in `block (3 levels) in start'
yabeda-puma-plugin (0.4.0) lib/puma/plugin/yabeda.rb:33:in `each'
yabeda-puma-plugin (0.4.0) lib/puma/plugin/yabeda.rb:33:in `block (2 levels) in start'
yabeda-prometheus (0.6.1) lib/yabeda/prometheus/exporter.rb:48:in `each'
yabeda-prometheus (0.6.1) lib/yabeda/prometheus/exporter.rb:48:in `call'
yabeda-prometheus (0.6.1) lib/yabeda/prometheus/exporter.rb:18:in `call'
actionpack (6.0.3.2) lib/action_dispatch/routing/mapper.rb:19:in `block in <class:Constraints>'
actionpack (6.0.3.2) lib/action_dispatch/routing/mapper.rb:48:in `serve'
actionpack (6.0.3.2) lib/action_dispatch/journey/router.rb:49:in `block in serve'
actionpack (6.0.3.2) lib/action_dispatch/journey/router.rb:32:in `each'
actionpack (6.0.3.2) lib/action_dispatch/journey/router.rb:32:in `serve'
actionpack (6.0.3.2) lib/action_dispatch/routing/route_set.rb:834:in `call'
bullet (6.1.0) lib/bullet/rack.rb:15:in `call'
Yeah. Output is expected.
NoMethodError (undefined method `set' for nil:NilClass):
yabeda-prometheus (0.6.1) lib/yabeda/prometheus/adapter.rb:53:in `perform_gauge_set!'
That means that there is no metric registered in Prometheus (yet?) so it can't find it here:
It is most probably related to this change I made yesterday: https://github.com/yabeda-rb/yabeda/commit/d745cf7c6a8aaaf9a4d4a90d6429e448a302f155
Can't reproduce it now. Any reproduction steps or further details?
@Envek I'm using an existing application. I'll see if I can reproduce this on a brand new rails app and get back at you with results.
@Envek I've just tested with a brand new rails app (Rails 6.0.3.2) and it fails with the same error as before.
Created the app with rails new yabedatags and used all defaults.
Gems included:
gem "sidekiq", "~> 6.1.1"
gem "yabeda-prometheus", "~> 0.6.1"
gem "yabeda-puma-plugin", "~> 0.4.0"
gem "yabeda-rails", "~> 0.5.0"
gem "yabeda-sidekiq", "~> 0.6.0"
Sidekiq config:
Sidekiq.configure_server do |config|
  config.log_formatter = Sidekiq::Logger::Formatters::JSON.new
  Yabeda::Prometheus::Exporter.start_metrics_server!
end
Yabeda config:
Yabeda.configure do
  default_tag :custom_label, nil
end
Routes:
Rails.application.routes.draw do
  # For details on the DSL available within this file, see https://guides.rubyonrails.org/routing.html
  mount Yabeda::Prometheus::Exporter => "/metrics"
end
Puma config:
activate_control_app
plugin :yabeda
Failed request:
$ rails s
=> Booting Puma
=> Rails 6.0.3.2 application starting in development
=> Run `rails server --help` for more startup options
Puma starting in single mode...
* Version 4.3.5 (ruby 2.6.5-p114), codename: Mysterious Traveller
* Min threads: 5, max threads: 5
* Environment: development
* Listening on tcp://127.0.0.1:3000
* Listening on tcp://[::1]:3000
* Starting control server on unix:///var/folders/w1/d2q8yvss40d4nmfhkgzhwg5w0000gn/T/puma-status-1595976279969-63554
Use Ctrl-C to stop
Started GET "/metrics" for ::1 at 2020-07-28 19:44:42 -0300
(1.4ms) SELECT sqlite_version(*)
NoMethodError (undefined method `set' for nil:NilClass):
yabeda-prometheus (0.6.1) lib/yabeda/prometheus/adapter.rb:53:in `perform_gauge_set!'
yabeda (0.6.1) lib/yabeda/gauge.rb:10:in `block in set'
yabeda (0.6.1) lib/yabeda/gauge.rb:9:in `each'
yabeda (0.6.1) lib/yabeda/gauge.rb:9:in `set'
yabeda-puma-plugin (0.4.0) lib/puma/plugin/yabeda.rb:34:in `block (3 levels) in start'
yabeda-puma-plugin (0.4.0) lib/puma/plugin/yabeda.rb:33:in `each'
yabeda-puma-plugin (0.4.0) lib/puma/plugin/yabeda.rb:33:in `block (2 levels) in start'
yabeda-prometheus (0.6.1) lib/yabeda/prometheus/exporter.rb:48:in `each'
yabeda-prometheus (0.6.1) lib/yabeda/prometheus/exporter.rb:48:in `call'
yabeda-prometheus (0.6.1) lib/yabeda/prometheus/exporter.rb:18:in `call'
actionpack (6.0.3.2) lib/action_dispatch/routing/mapper.rb:19:in `block in <class:Constraints>'
actionpack (6.0.3.2) lib/action_dispatch/routing/mapper.rb:48:in `serve'
actionpack (6.0.3.2) lib/action_dispatch/journey/router.rb:49:in `block in serve'
actionpack (6.0.3.2) lib/action_dispatch/journey/router.rb:32:in `each'
actionpack (6.0.3.2) lib/action_dispatch/journey/router.rb:32:in `serve'
actionpack (6.0.3.2) lib/action_dispatch/routing/route_set.rb:834:in `call'
rack (2.2.3) lib/rack/tempfile_reaper.rb:15:in `call'
rack (2.2.3) lib/rack/etag.rb:27:in `call'
rack (2.2.3) lib/rack/conditional_get.rb:27:in `call'
rack (2.2.3) lib/rack/head.rb:12:in `call'
actionpack (6.0.3.2) lib/action_dispatch/http/content_security_policy.rb:18:in `call'
rack (2.2.3) lib/rack/session/abstract/id.rb:266:in `context'
rack (2.2.3) lib/rack/session/abstract/id.rb:260:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/cookies.rb:648:in `call'
activerecord (6.0.3.2) lib/active_record/migration.rb:567:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/callbacks.rb:27:in `block in call'
activesupport (6.0.3.2) lib/active_support/callbacks.rb:101:in `run_callbacks'
actionpack (6.0.3.2) lib/action_dispatch/middleware/callbacks.rb:26:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/executor.rb:14:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/actionable_exceptions.rb:17:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/debug_exceptions.rb:32:in `call'
web-console (4.0.4) lib/web_console/middleware.rb:132:in `call_app'
web-console (4.0.4) lib/web_console/middleware.rb:28:in `block in call'
web-console (4.0.4) lib/web_console/middleware.rb:17:in `catch'
web-console (4.0.4) lib/web_console/middleware.rb:17:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'
railties (6.0.3.2) lib/rails/rack/logger.rb:37:in `call_app'
railties (6.0.3.2) lib/rails/rack/logger.rb:26:in `block in call'
activesupport (6.0.3.2) lib/active_support/tagged_logging.rb:80:in `block in tagged'
activesupport (6.0.3.2) lib/active_support/tagged_logging.rb:28:in `tagged'
activesupport (6.0.3.2) lib/active_support/tagged_logging.rb:80:in `tagged'
railties (6.0.3.2) lib/rails/rack/logger.rb:26:in `call'
sprockets-rails (3.2.1) lib/sprockets/rails/quiet_assets.rb:13:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/remote_ip.rb:81:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/request_id.rb:27:in `call'
rack (2.2.3) lib/rack/method_override.rb:24:in `call'
rack (2.2.3) lib/rack/runtime.rb:22:in `call'
activesupport (6.0.3.2) lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/executor.rb:14:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/static.rb:126:in `call'
rack (2.2.3) lib/rack/sendfile.rb:110:in `call'
actionpack (6.0.3.2) lib/action_dispatch/middleware/host_authorization.rb:82:in `call'
webpacker (4.2.2) lib/webpacker/dev_server_proxy.rb:23:in `perform_request'
rack-proxy (0.6.5) lib/rack/proxy.rb:57:in `call'
railties (6.0.3.2) lib/rails/engine.rb:527:in `call'
puma (4.3.5) lib/puma/configuration.rb:228:in `call'
puma (4.3.5) lib/puma/server.rb:713:in `handle_request'
puma (4.3.5) lib/puma/server.rb:472:in `process_client'
puma (4.3.5) lib/puma/server.rb:328:in `block in run'
puma (4.3.5) lib/puma/thread_pool.rb:134:in `block in spawn_thread'
Sidekiq:
[2020-07-28 19:45:31] INFO WEBrick 1.4.2
[2020-07-28 19:45:31] INFO ruby 2.6.5 (2019-10-01) [x86_64-darwin19]
[2020-07-28 19:45:31] INFO WEBrick::HTTPServer#start: pid=64138 port=9394
{"ts":"2020-07-28T22:45:34.185Z","pid":64138,"tid":"oxgb6ulkm","lvl":"INFO","msg":"Booted Rails 6.0.3.2 application in development environment"}
{"ts":"2020-07-28T22:45:34.185Z","pid":64138,"tid":"oxgb6ulkm","lvl":"INFO","msg":"Running in ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin19]"}
{"ts":"2020-07-28T22:45:34.185Z","pid":64138,"tid":"oxgb6ulkm","lvl":"INFO","msg":"See LICENSE and the LGPL-3.0 for licensing details."}
{"ts":"2020-07-28T22:45:34.185Z","pid":64138,"tid":"oxgb6ulkm","lvl":"INFO","msg":"Upgrade to Sidekiq Pro for more features and support: https://sidekiq.org"}
{"ts":"2020-07-28T22:45:34.185Z","pid":64138,"tid":"oxgb6ulkm","lvl":"INFO","msg":"Booting Sidekiq 6.1.1 with redis options {}"}
{"ts":"2020-07-28T22:45:34.187Z","pid":64138,"tid":"oxgb6ulkm","lvl":"INFO","msg":"Starting processing, hit Ctrl-C to stop"}
127.0.0.1 - - [28/Jul/2020:19:45:43 -0300] "GET /metrics HTTP/1.1" 200 - 0.0033
Sidekiq metrics:
# TYPE sidekiq_jobs_enqueued_total counter
# HELP sidekiq_jobs_enqueued_total A counter of the total number of jobs sidekiq enqueued.
# TYPE sidekiq_jobs_executed_total counter
# HELP sidekiq_jobs_executed_total A counter of the total number of jobs sidekiq executed.
# TYPE sidekiq_jobs_success_total counter
# HELP sidekiq_jobs_success_total A counter of the total number of jobs successfully processed by sidekiq.
# TYPE sidekiq_jobs_failed_total counter
# HELP sidekiq_jobs_failed_total A counter of the total number of jobs failed in sidekiq.
# TYPE sidekiq_jobs_waiting_count gauge
# HELP sidekiq_jobs_waiting_count The number of jobs waiting to process in sidekiq.
sidekiq_jobs_waiting_count{custom_label="",queue="low"} 0.0
sidekiq_jobs_waiting_count{custom_label="",queue="default"} 0.0
sidekiq_jobs_waiting_count{custom_label="",queue="medium"} 0.0
sidekiq_jobs_waiting_count{custom_label="",queue="critical"} 0.0
# TYPE sidekiq_active_workers_count gauge
# HELP sidekiq_active_workers_count The number of currently running machines with sidekiq workers.
sidekiq_active_workers_count{custom_label=""} 0.0
# TYPE sidekiq_jobs_scheduled_count gauge
# HELP sidekiq_jobs_scheduled_count The number of jobs scheduled for later execution.
sidekiq_jobs_scheduled_count{custom_label=""} 0.0
# TYPE sidekiq_jobs_retry_count gauge
# HELP sidekiq_jobs_retry_count The number of failed jobs waiting to be retried
sidekiq_jobs_retry_count{custom_label=""} 0.0
# TYPE sidekiq_jobs_dead_count gauge
# HELP sidekiq_jobs_dead_count The number of jobs exceeded their retry count.
sidekiq_jobs_dead_count{custom_label=""} 0.0
# TYPE sidekiq_active_processes gauge
# HELP sidekiq_active_processes The number of active Sidekiq worker processes.
sidekiq_active_processes{custom_label=""} 1.0
# TYPE sidekiq_queue_latency gauge
# HELP sidekiq_queue_latency The queue latency, the difference in seconds since the oldest job in the queue was enqueued
sidekiq_queue_latency{custom_label="",queue="critical"} 0.0
sidekiq_queue_latency{custom_label="",queue="default"} 0.0
sidekiq_queue_latency{custom_label="",queue="low"} 0.0
sidekiq_queue_latency{custom_label="",queue="medium"} 0.0
# TYPE sidekiq_job_latency_seconds histogram
# HELP sidekiq_job_latency_seconds The job latency, the difference in seconds between enqueued and running time
# TYPE sidekiq_job_runtime_seconds histogram
# HELP sidekiq_job_runtime_seconds A histogram of the job execution time.
@Envek, the issue is the presence of yabeda-puma-plugin. :)
Thank you for the reproduction steps! I will try to get to it this week, but I can't promise.
Fixed in 0.6.2. Apparently I broke it in https://github.com/yabeda-rb/yabeda/commit/d745cf7c6a8aaaf9a4d4a90d6429e448a302f155
@Envek it's working 👍
# TYPE puma_backlog gauge
# HELP puma_backlog Number of established but unaccepted connections in the backlog
puma_backlog{custom_label="",index="0"} 0.0
# TYPE puma_running gauge
# HELP puma_running Number of running worker threads
puma_running{custom_label="",index="0"} 5.0
# TYPE puma_pool_capacity gauge
# HELP puma_pool_capacity Number of allocatable worker threads
puma_pool_capacity{custom_label="",index="0"} 4.0
# TYPE puma_max_threads gauge
# HELP puma_max_threads Maximum number of worker threads
puma_max_threads{custom_label="",index="0"} 5.0
Great! If you have any other questions/concerns/ideas, feel free to open new issues.
Hi,
The gem doesn't seem to support custom labels/tags whose values aren't fixed via default_tag.
Are there plans to add this kind of functionality? Thank you.