Closed Envek closed 5 years ago
Hi 👋 Andrey!
Why did you decide to implement the dogstatsd-based solution?
I made this decision because it is more efficient than using the Datadog metrics API, either via dogapi or directly.
I wonder whether the overhead of setting up the dogstatsd daemon is worth it? Users will need to deploy it into their infrastructure (kinda complicated, especially for container-based deployments).
I agree that it complicates setup and/or deployment. My thought was that if you are using Datadog, you may already have the agent up and running.
I can see that you're already using a mix of dogapi calls and statsd reporting.
dogapi is used for updating the metadata of metrics in the register methods. It is not required for registering metrics, but I use dogapi to update metadata. I have to admit that updating metric metadata with yabeda-datadog has some limitations and may not be worth implementing this way. Maybe it should be removed from the gem.
Moreover, you are manually creating histograms from sub-metrics (sum, avg, percentiles), which shouldn't be needed when using the dogstatsd-ruby gem, because it already seems to have a histogram method for simply recording a new value into the histogram.
I am not creating them, just updating metadata, as I mentioned above. Again, I'm still undecided whether it should be left in the gem.
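For reference, recording a raw value into a Datadog histogram with dogstatsd-ruby looks roughly like the sketch below (the agent address, metric name, value, and tags are placeholders, not taken from this gem):

require 'datadog/statsd'

# Hypothetical agent address; the dogstatsd listener usually sits on UDP port 8125
statsd = Datadog::Statsd.new('localhost', 8125)

# One call per observation; the agent derives avg/max/percentile sub-metrics itself
statsd.histogram('hypothetical.request.duration', 0.235, tags: ['env:test'])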
I will be glad to hear your thoughts on this. Thanks 🙌
@Envek, by the way, for container-based deployments Datadog has a dockerized agent, and here is an example of a Rails + Sidekiq yabeda-datadog setup with the Datadog Agent running in a container.
It is based on your yabeda-prometheus example 😉
Hmmm, I think that it isn't required to specify metadata. It is mostly for cases when you've declared it wrong before and now need to fix it. But it needs to be checked first.
I think that it isn't required to specify metadata
Absolutely. The bang register methods are there just to pass unit, per, and comment metadata from the Yabeda config to Datadog.
to pass unit, per, and comment metadata
This sounds useful and viable. Let's keep it.
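For illustration, updating metric metadata through dogapi's facade looks roughly like the sketch below (the metric name and values are placeholders, and the exact options yabeda-datadog forwards may differ):

require 'dogapi'

dog = Dogapi::Client.new(ENV['DATADOG_API_KEY'], ENV['DATADOG_APP_KEY'])

# Metadata such as unit/per_unit/description is attached to an already-reported metric;
# this call does not create the metric itself
dog.update_metadata(
  'hypothetical.jobs.executed',
  type: 'counter',
  unit: 'job',
  per_unit: 'second',
  description: 'Jobs executed by the workers'
)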
Another thought: while sending UDP datagrams is much cheaper than sending HTTP requests to the API, it still happens synchronously, and every call to a dogstatsd-ruby method sends a separate datagram. Maybe it would also be useful to save them and send them from another thread using dogstatsd-ruby's built-in batching.
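In dogstatsd-ruby 3.x/4.x that built-in batching looks roughly like the sketch below (the metric names are made up): the calls inside the block are buffered and flushed together in as few datagrams as possible when the block exits.

require 'datadog/statsd'

statsd = Datadog::Statsd.new('localhost', 8125)

# Buffer several measurements and flush them together at the end of the block
statsd.batch do |s|
  s.increment('hypothetical.jobs.executed', tags: ['env:test'])
  s.gauge('hypothetical.queue.size', 42, tags: ['env:test'])
  s.histogram('hypothetical.job.duration', 1.5, tags: ['env:test'])
end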
Also, Thread.new for sending messages asynchronously doesn't look very reliable, because at some point a lot of threads can be spawned and a lot of CPU time will be spent on context switching. IMO it is better to have one or two long-living threads which receive messages to send via a stdlib Queue or a concurrent-ruby-edge Channel (it's a pity that it still isn't stable, so I can't recommend it, but it looks very promising, as can be seen from its examples).
Maybe it would also be useful to save them and send them from another thread using dogstatsd-ruby's built-in batching.
IMO it is better to have one or two long-living threads which receive messages to send via a stdlib Queue or a concurrent-ruby-edge Channel (it's a pity that it still isn't stable, so I can't recommend it, but it looks very promising, as can be seen from its examples).
I agree on both points. I will rewrite the adapter using your proposals.
IMO it is better to have one or two long-living threads which receive messages to send via a stdlib Queue or a concurrent-ruby-edge Channel (it's a pity that it still isn't stable, so I can't recommend it, but it looks very promising, as can be seen from its examples).
I think Queue should work fine. Take a look at the influxdb-ruby implementation, for example: https://github.com/influxdata/influxdb-ruby/blob/master/lib/influxdb/writer/async.rb.
My thought was that if you are using Datadog, you may already have the agent up and running.
Agree. That's kind of the default Datadog setup, so we can use it.
@Envek @palkan, please take a look at the Adapter and the async Worker. I am interested in your opinion.
I've taken a brief look at it, and I have some reading for you: https://habr.com/company/infopulse/blog/427843/
My point is that you do not need sleep at all. You need to drain the queue, send a batch, and wait until new data arrives. A mix of blocking and non-blocking calls to Queue#pop will do this job.
See the example:
require 'thread'

queue = Queue.new

workers = Array.new(2) do |idx|
  Thread.new do
    batched = []
    while !(queue.closed? && queue.empty?)
      item = queue.pop(false) # Blocking pop: sleeps while the queue is empty, returns nil once it is closed and drained
      batched << item if item # nil is possible when the queue has been closed
      while batched.size < 150
        item = queue.pop(true) rescue break # Non-blocking pop: stop batching as soon as the queue is empty
        batched << item if item
      end
      puts "Thread #{idx}: #{batched.inspect}" # puts is also IO, much like the network
      batched.clear
    end
  end
end

puts "Enqueue'em all"
1000.times do |i|
  queue.push(i)
  if i.positive? && (i % 100).zero?
    Thread.pass # Ask for a context switch to mimic some long-running code
  end
end
queue.close # should be called in `stop`
puts "Enqueued'em all"

# Wait until the workers finish their job (drain the queue) - this could take forever, so `exit` might be a better choice.
# You can take a look at how Sidekiq handles long-running workers on exit, see https://github.com/mperham/sidekiq/wiki/Signals#term
workers.map(&:join)
Also, be careful: the ordering of the sent data is not guaranteed when you use multiple threads (is that okay in our case?):
Thread 1: [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200]
Thread 0: [301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400]
Thread 1: [201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300]
Then, about the usage of SizedQueue: it prevents unlimited memory usage (which is good), but the problem is that it will block the main application (which enqueues the metrics to be sent) if the queue becomes full. Deciding whether that is the expected behaviour or not is up to you.
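A tiny sketch of that trade-off (queue size and metric payload are arbitrary): the blocking push waits for free space, while the non-blocking variant lets you drop the sample instead of stalling the application.

require 'thread'

queue  = SizedQueue.new(1_000)
metric = { name: 'hypothetical.metric', value: 1 }

queue.push(metric)         # blocks the caller while the queue is full

begin
  queue.push(metric, true) # non-blocking push raises ThreadError when the queue is full
rescue ThreadError
  # the sample is dropped; a "dropped metrics" counter or a log line could go here
end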
There is also a possibility to batch calls to the Datadog API: https://github.com/DataDog/dogapi-rb/blob/022c3693e6b0366afea93669b73e2b576e8ed2a5/lib/dogapi/facade.rb#L109-L117
So Yabeda::Datadog::Worker::REGISTER could also use batching.
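For illustration, that facade is used roughly like the sketch below (the metric names are placeholders; as noted further down, it batches metric points rather than metadata updates):

require 'dogapi'

dog = Dogapi::Client.new(ENV['DATADOG_API_KEY'], ENV['DATADOG_APP_KEY'])

# Points emitted inside the block are buffered and sent in a single API request
dog.batch_metrics do
  dog.emit_point('hypothetical.metric.one', 42, tags: ['env:test'])
  dog.emit_point('hypothetical.metric.two', 7, tags: ['env:test'])
end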
By the way, why are you using procs assigned to constants? Why not create “callable classes” (with a call method)? That would be cleaner and would allow extracting some chunks into separate methods of those classes in the future.
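To illustrate the comparison (the real worker actions aren't reproduced here; this is just the general shape of the two styles):

# Proc assigned to a constant (roughly the current approach)
SEND = proc do |metric, value|
  # build and send the datagram here
end

# "Callable class": same call site via #call, but private methods can be extracted later
class SendAction
  def call(metric, value)
    # build and send the datagram here
  end
end

SEND.call('hypothetical.metric', 1)
SendAction.new.call('hypothetical.metric', 1)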
@Envek
By the way, why are you using procs assigned to constants? Why not create “callable classes” (with a call method)?
I'm doing this because procs take less memory: 80 bytes are needed for an empty proc versus 152 bytes for an empty class. But this was totally experimental, because saving 72 bytes on each of the two action procs has no impact when we are allocating approximately 779,326 bytes for the yabeda_datadog gem with its dependencies.
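Presumably those figures come from something like ObjectSpace.memsize_of; a quick way to reproduce that kind of comparison (the exact numbers vary across Ruby versions and platforms):

require 'objspace'

empty_proc  = proc {}
empty_class = Class.new

puts ObjectSpace.memsize_of(empty_proc)  # a few dozen bytes
puts ObjectSpace.memsize_of(empty_class) # somewhat more, on the order of a couple hundred bytes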
There is also a possibility to batch calls to the Datadog API: https://github.com/DataDog/dogapi-rb/blob/022c3693e6b0366afea93669b73e2b576e8ed2a5/lib/dogapi/facade.rb#L109-L117
I saw this batch method. I believe it is not for sending metadata, because #batch_metrics works with Dogapi::V1::MetricService, while for updating metric metadata I am using Dogapi::V1::MetadataService.
Thanks for the article and code review, I will get back with changes in a day or two.
0.3.0.rc1 is available with the latest changes related to this issue.
Better to use 0.3.0.rc2. It has a configuration bug fix and improved logging.
Sorry for reviving an old issue, but can anyone recall why several threads were used for metrics publishing in the first place? Wouldn't a single thread be enough for async metrics sending with batching?
I don't remember exactly why; my guess is that it was done for flexibility or by porting an existing solution that used multiple threads.
I think there should be a way to run it in a single thread.
The default num_threads can be changed here https://github.com/yabeda-rb/yabeda-datadog/blob/master/lib/yabeda/datadog/config.rb#L5-L21
or overridden by the library user in the YAML configuration yabeda_datadog.yml:
num_threads: 1
I'm just thinking about simplifying the current implementation and removing multi-thread support from metric pushing altogether.
sure, why not
I'm just thinking about simplifying the current implementation and removing multi-thread support from metric pushing altogether.
Hi! Thank you for implementing the Datadog adapter for Yabeda!
Why did you decide to implement the dogstatsd-based solution?
I wonder whether the overhead of setting up the dogstatsd daemon is worth it? Users will need to deploy it into their infrastructure (kinda complicated, especially for container-based deployments).
I can see that you're already using a mix of dogapi calls and statsd reporting. Moreover, you are manually creating histograms from sub-metrics (sum, avg, percentiles), which shouldn't be needed when using the dogstatsd-ruby gem, because it already seems to have a histogram method for simply recording a new value into the histogram.
With best regards, Andrey.