yabeda-rb / yabeda-datadog

Yabeda Datadog adapter. Collect and send custom metrics from Ruby apps to Datadog.

Better adapter implementation #3

Closed by Envek 5 years ago

Envek commented 5 years ago

Hi! Thank you for implementing a Datadog adapter for Yabeda!

Why did you decide to implement the dogstatsd-based solution?

I wonder whether the overhead of setting up the dogstatsd daemon is worth it. Users will need to deploy it into their infrastructure (kinda complicated, especially for container-based deployments).

I can see that you're already using a mix of dogapi calls and statsd reporting. Moreover, you're manually creating histograms from sub-metrics (sum, avg, percentiles), which shouldn't be needed when using the dogstatsd-ruby gem, since it already seems to have a histogram method for recording a new value into a histogram.
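
For reference, recording a value with dogstatsd-ruby's histogram method looks roughly like this (metric name and value are illustrative):

require "datadog/statsd"

statsd = Datadog::Statsd.new("localhost", 8125)

# One call records one raw value; the agent computes
# sum/avg/percentiles from the stream of values itself.
statsd.histogram("yabeda.example.query_duration", 0.42)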

With best regards, Andrey.

dmshvetsov commented 5 years ago

Hi 👋 Andrey!

Why did you decide to implement the dogstatsd-based solution?

I made this decision because it is more efficient than using the Datadog metrics API, whether via dogapi or directly.

I wonder whether the overhead of setting up the dogstatsd daemon is worth it. Users will need to deploy it into their infrastructure (kinda complicated, especially for container-based deployments).

I agree that it complicates setup and/or deployment. My thought was that if you are using Datadog, you probably already have the agent up and running.

I can see that you're already using a mix of dogapi calls and statsd reporting.

dogapi is used for updating metric metadata in the register methods. Updating metadata is not required to register metrics, but I use dogapi for it anyway. I have to admit that updating metric metadata with yabeda-datadog has some limitations and may not be worth implementing this way. Maybe it should be removed from the gem.
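
Roughly, the metadata update in question looks like this (a sketch assuming dogapi-rb's update_metadata facade method; the metric name and field values are illustrative):

require "dogapi"

# The metadata API needs both an API key and an application key.
dog = Dogapi::Client.new(ENV["DATADOG_API_KEY"], ENV["DATADOG_APP_KEY"])

# Update the metadata (unit, per-unit, description) of an already-reported metric.
dog.update_metadata(
  "yabeda.example.whatever_duration",
  type: "gauge",
  description: "Example metric description",
  unit: "second",
  per_unit: "request"
)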

Moreover, you're manually creating histograms from sub-metrics (sum, avg, percentiles), which shouldn't be needed when using the dogstatsd-ruby gem, since it already seems to have a histogram method for recording a new value into a histogram.

I am not creating histograms, just updating their metadata, as I mentioned above. Again, I'm still undecided whether it should be left in the gem.

I will be glad to hear your thoughts on this. Thanks 🙌

dmshvetsov commented 5 years ago

@Envek by the way, for container-based deployments Datadog has a dockerized agent, and here is an example of a Rails + Sidekiq yabeda-datadog setup with the Datadog Agent in a container.

It is based on your yabeda-prometheus example 😉

Envek commented 5 years ago

Hmmm, I think that specifying metadata isn't required. It is mostly for cases when you've declared it incorrectly before and now need to fix it. But that needs to be checked first.

dmshvetsov commented 5 years ago

I think that specifying metadata isn't required

Absolutely. The bang register methods exist just to pass unit, per, and comment metadata from the Yabeda config to Datadog.

Envek commented 5 years ago

to pass unit, per, and comment metadata

This sounds useful and viable. Let's keep it.

Another thought: while sending UDP datagrams is much cheaper than sending HTTP requests to the API, it still happens synchronously, and every call to a dogstatsd-ruby method sends a separate datagram. Maybe it would also be useful to buffer them and send from another thread using dogstatsd-ruby's built-in batching.

Also, Thread.new for sending messages asynchronously looks not so reliable, because at some point a lot of threads can be spawned and a lot of CPU time will be spent on context switching. IMO it is better to have one or two long-living threads which receive messages to send via a stdlib Queue or a concurrent-ruby-edge Channel (it's a pity that it is still not stable, so I can't recommend it, but it looks very promising, as can be seen from the example).
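
For illustration, dogstatsd-ruby's built-in batching looks roughly like this (a sketch assuming the v4-era batch API; metric names are illustrative):

require "datadog/statsd"

statsd = Datadog::Statsd.new("localhost", 8125)

# Inside the block, stats are buffered and flushed together on exit,
# packing multiple metrics into as few UDP datagrams as possible.
statsd.batch do |s|
  s.increment("yabeda.example.requests_total")
  s.histogram("yabeda.example.request_duration", 0.25)
  s.gauge("yabeda.example.queue_size", 42)
end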

dmshvetsov commented 5 years ago

Maybe it would also be useful to buffer them and send from another thread using dogstatsd-ruby's built-in batching.

IMO it is better to have one or two long-living threads which receive messages to send via a stdlib Queue or a concurrent-ruby-edge Channel (it's a pity that it is still not stable, so I can't recommend it, but it looks very promising, as can be seen from the example)

I agree on both points. I will rewrite the adapter using your proposals.

palkan commented 5 years ago

IMO it is better to have one or two long-living threads which receive messages to send via a stdlib Queue or a concurrent-ruby-edge Channel (it's a pity that it is still not stable, so I can't recommend it, but it looks very promising, as can be seen from the example)

I think Queue should work fine. Take a look at the influxdb-ruby implementation, for example: https://github.com/influxdata/influxdb-ruby/blob/master/lib/influxdb/writer/async.rb.

My thought was that if you are using Datadog, you probably already have the agent up and running.

Agree. That's kinda the default Datadog setup, so we can use it.

dmshvetsov commented 5 years ago

@Envek @palkan please take a look at Adapter and async Worker. I am interested in your opinion.

Envek commented 5 years ago

I've taken a brief look at it, and I have some reading for you: https://habr.com/company/infopulse/blog/427843/

My point is that you do not need sleep at all. You need to drain the queue, send a batch, and wait until new data arrives. A mix of blocking and non-blocking calls to Queue#pop will do this job.

See the example:

require 'thread'

queue = Queue.new

workers = Array.new(2) do |idx|
  Thread.new do
    batched = []
    while !(queue.closed? && queue.empty?)
      item = queue.pop(false) # Blocking pop: sleeps here while the queue is empty, returns nil once it is closed
      batched << item if item # nil is possible when a closed queue is drained
      while batched.size < 150
        item = queue.pop(true) rescue break # non-blocking pop: raises ThreadError when the queue is empty
        batched << item if item
      end
      puts "Thread #{idx}: #{batched.inspect}" # puts is also I/O, much like a network send
      batched.clear
    end
  end
end

puts "Enqueue'em all"
1000.times do |i|
  queue.push(i)
  if i.positive? && (i % 100).zero?
    Thread.pass # Ask for context switch to mimic some long running code
  end
end
queue.close # should be called in `stop`
puts "Enqueued'em all"

# Wait for the workers to finish their job (drain the queue). This could take forever, so `exit` might be a better choice.
# You can take a look at how Sidekiq handles long-running workers on exit, see https://github.com/mperham/sidekiq/wiki/Signals#term
workers.map(&:join)

Also, be careful: the ordering of sent data is not guaranteed when you use multiple threads (is that okay in our case?):

Thread 1: [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200]
Thread 0: [301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400]
Thread 1: [201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300]

Then, about the usage of SizedQueue: it prevents unlimited memory usage (which is good), but the problem is that it will block the main application (which enqueues metrics to be sent) if the queue becomes full. Deciding whether that is acceptable behaviour is up to you.
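
For illustration, the blocking and non-blocking flavours of SizedQueue#push (the metric variable is illustrative):

queue = SizedQueue.new(1000)
metric = [:increment, "yabeda.example.counter"]

# Blocking push: the application thread waits here while the queue is full.
queue.push(metric)

# Non-blocking alternative: drop the metric instead of blocking the application.
begin
  queue.push(metric, true) # raises ThreadError when the queue is full
rescue ThreadError
  # queue is full: drop the metric, or log and move on
end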

Envek commented 5 years ago

There is also a possibility to batch calls to the Datadog API: https://github.com/DataDog/dogapi-rb/blob/022c3693e6b0366afea93669b73e2b576e8ed2a5/lib/dogapi/facade.rb#L109-L117

So Yabeda::Datadog::Worker::REGISTER could also use batching.
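
Usage would be roughly this (a sketch of the linked batch_metrics facade; metric names are illustrative):

require "dogapi"

dog = Dogapi::Client.new(ENV["DATADOG_API_KEY"])

# Inside the block, emitted points are buffered and flushed
# as a single API request when the block exits.
dog.batch_metrics do
  dog.emit_point("yabeda.example.first", 1.0)
  dog.emit_point("yabeda.example.second", 2.0)
end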

Envek commented 5 years ago

By the way, why are you using procs assigned to constants? Why not create “callable classes” (with a call method)? This would be cleaner and would allow extracting some chunks into other methods of those classes in the future.
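
For illustration, the two styles side by side (a hypothetical Send action, not the gem's actual code):

# Proc assigned to a constant:
SEND = proc do |metric, value|
  # ... build the payload and push it to the queue ...
end

# Callable class: the call site stays the same (SEND.call(metric, value)),
# but private helper methods can be extracted later.
class Send
  def call(metric, value)
    # ... build the payload and push it to the queue ...
  end
end

SEND = Send.new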

dmshvetsov commented 5 years ago

@Envek

By the way, why are you using procs assigned to constants? Why not create “callable classes” (with a call method)?

I'm doing this because procs take less memory: 80 bytes for an empty proc versus 152 bytes for an empty class. But this was totally experimental, because saving 72 bytes on each of the two action procs has no impact when we are allocating roughly 779,326 bytes for the yabeda-datadog gem with its dependencies.
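
Those numbers can be reproduced with the objspace stdlib (exact values vary by Ruby version and platform):

require "objspace"

# Bytes occupied by an empty proc vs. an empty anonymous class.
ObjectSpace.memsize_of(proc {})   # => 80 on the MRI used for the measurement
ObjectSpace.memsize_of(Class.new) # => 152 on the same MRI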

There is also possibility to batch calls to DataDog API: https://github.com/DataDog/dogapi-rb/blob/022c3693e6b0366afea93669b73e2b576e8ed2a5/lib/dogapi/facade.rb#L109-L117

I saw this batch method. I believe it is not for sending metadata, because #batch_metrics works with Dogapi::V1::MetricService, and for updating metric metadata I am using Dogapi::V1::MetadataService.

Thanks for the article and code review, I will get back with changes in a day or two.

dmshvetsov commented 5 years ago

0.3.0.rc1 is available with the latest changes related to this issue:

https://rubygems.org/gems/yabeda-datadog/versions/0.3.0.rc1

dmshvetsov commented 5 years ago

Better to use 0.3.0.rc2. It has a configuration bug fix and improved logging.

dmshvetsov commented 5 years ago

https://github.com/yabeda-rb/yabeda-datadog/commit/2cd5d2b8fb5083695a6dbcc3135eb9caa4623042 solves the issue

Envek commented 10 months ago

Sorry for reviving an old issue, but can anyone recall why multiple threads were used for metrics publishing in the first place? Wouldn't a single thread be enough for async metrics sending with batching?

dmshvetsov commented 10 months ago

I don't remember exactly why; my guess is that it was done for flexibility, or by porting an existing solution that used multiple threads.

I think there should be a way to run it in a single thread.

dmshvetsov commented 10 months ago

The default num_threads can be changed here: https://github.com/yabeda-rb/yabeda-datadog/blob/master/lib/yabeda/datadog/config.rb#L5-L21

or overridden by the library user in the YAML configuration file yabeda_datadog.yml:

num_threads: 1

Envek commented 10 months ago

I'm just thinking about simplifying the current implementation and removing multi-thread support from metric pushing altogether.

dmshvetsov commented 10 months ago

I'm just thinking about simplifying the current implementation and removing multi-thread support from metric pushing altogether.

sure, why not