Closed: JHK closed this issue 1 year ago.
Thanks for asking. And also for using this gem in racecar! :-)
I have considered integrating ActiveSupport (AS) instrumentation, but given the nature of the underlying C lib, I don't see a way to make that approach work well. Did you see the statistics callback we added? https://github.com/appsignal/rdkafka-ruby/pull/40
I just noticed that the docs on RubyDoc have not been regenerated properly for some reason, so you might have missed it.
Some callbacks would also definitely make sense to add, especially for partition assignment changes.
It would be really good if the instrumentation engine were not based on AS Notifications but rather compatible with them, so that other engines can be plugged in (like dry-monitor, which we use in Karafka).
@mensfeld I updated the ticket description to make it clearer that it should not rely on ActiveSupport, but rather use the same interface for instrumentation.
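To make "compatible with, but not based on, AS Notifications" concrete: the only contract the library would need is an object responding to instrument(name, payload) that yields to the block and returns its result, as ActiveSupport::Notifications.instrument does. A minimal sketch under that assumption; SimpleMonitor, InstrumentedProducer and the event name produce.rdkafka are hypothetical, the listener signature is simplified compared to AS subscribers, and the actual Kafka call is stubbed out.

```ruby
# Hypothetical minimal monitor with the same instrument(name, payload) { }
# shape as ActiveSupport::Notifications. Any engine with that method
# (ActiveSupport, dry-monitor, ...) could be passed in its place.
class SimpleMonitor
  def initialize
    @listeners = Hash.new { |hash, key| hash[key] = [] }
  end

  # Registers a listener block for one event name.
  def subscribe(event_name, &listener)
    @listeners[event_name] << listener
  end

  # Runs the block, records its duration in the payload (our own addition),
  # notifies the listeners, and returns the block's result unchanged.
  def instrument(event_name, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield(payload) if block_given?
    payload[:duration] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @listeners[event_name].each { |listener| listener.call(event_name, payload) }
    result
  end
end

# Hypothetical wrapper: depends only on the duck-typed #instrument method,
# never on a concrete instrumentation engine.
class InstrumentedProducer
  def initialize(monitor)
    @monitor = monitor
  end

  def produce(topic:, message:)
    @monitor.instrument("produce.rdkafka", topic: topic, bytes: message.bytesize) do
      # The real rdkafka produce call would go here; stubbed for the sketch.
      :enqueued
    end
  end
end
```

Because the producer only calls #instrument, swapping SimpleMonitor for ActiveSupport::Notifications or a dry-monitor instance would not require any change on the library side.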
The statistics endpoint goes in the right direction, but it is not what I meant with this issue. This is about being able to connect the instrumentation to, e.g., the Datadog agent in order to introspect what happened on each and every (recorded) request. There it is quite handy to know which branch the code took, how often, and how long it took.
I've been thinking about this quite a bit, especially since I work on a monitoring product all day.
The thing is that I'm not sure there actually is something to measure. Librdkafka does a lot of buffering in the background. Consuming a message from Ruby just pops something off an internal buffer, which is always super fast. I think what you're talking about mainly happens inside librdkafka, and the stats for that are present in the statistics callback.
Can you give an example of where you'd like to see hooks? What would these hooks really allow you to measure?
Looking at the instrumentation of ruby-kafka: it provides a notification one can subscribe to whenever produce is called for a message, with some metadata in the payload (code).
This can then be used, for example, in the Datadog agent or (as in my case) with time_bandits to determine the call frequency per request or similar metrics.
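To sketch the time_bandits-style use case: ActiveSupport::Notifications passes callable subscribers the event name, start time, finish time, a unique id and the payload. The plain-Ruby object below has that call signature and accumulates the number of produce calls, their total duration and a per-topic count, and can be reset at the start of each request. The event name produce_message.producer.kafka and the :topic payload key are modeled on ruby-kafka's producer instrumentation and should be treated as assumptions; ProduceStats is a hypothetical name.

```ruby
# Hypothetical per-request accumulator. Its #call signature matches the
# arguments ActiveSupport::Notifications passes to callable subscribers:
# (name, started, finished, unique_id, payload).
class ProduceStats
  attr_reader :calls, :total_seconds, :per_topic

  def initialize
    reset
  end

  # Call at the beginning of each request (e.g. from a Rack middleware).
  def reset
    @calls = 0
    @total_seconds = 0.0
    @per_topic = Hash.new(0)
  end

  # Counts one produce call and adds its duration to the running total.
  def call(_name, started, finished, _unique_id, payload)
    @calls += 1
    @total_seconds += finished - started
    @per_topic[payload[:topic]] += 1
  end
end
```

Wiring it up would then be a matter of subscribing one instance to the produce event (for ActiveSupport, ActiveSupport::Notifications.subscribe("produce_message.producer.kafka", stats)) and reading calls, total_seconds and per_topic at the end of each request.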
Right, I think I understand the use case better. You're not so much interested in the performance of the produce call. But you do want to get hooks and see the volume?
@thijsc I am interested in the produce performance. Having instrumentation for it would also give me the volume, at least in Datadog, by incrementing a counter for the messages sent to a particular topic.
What do you see yourself measuring exactly?
How many messages I can send per second depending on the ack level, plus where they go (to which topic).
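Per-topic throughput may already be derivable from the statistics callback. The sketch below assumes the callback receives librdkafka's statistics JSON parsed into a Hash and uses two fields documented in librdkafka's STATISTICS.md: the top-level ts (librdkafka's internal monotonic clock, in microseconds) and the per-partition txmsgs counter (messages transmitted to brokers). txmsgs_per_topic and topic_rates are hypothetical helpers, and the snapshots used to check them are hand-made, not real output.

```ruby
# Sums the per-partition "txmsgs" counters for every topic in one
# librdkafka statistics snapshot (a Hash parsed from the stats JSON).
def txmsgs_per_topic(stats)
  stats.fetch("topics", {}).to_h do |topic, topic_stats|
    partitions = topic_stats.fetch("partitions", {})
    [topic, partitions.values.sum { |partition| partition.fetch("txmsgs", 0) }]
  end
end

# Messages per second per topic between two snapshots. The counters are
# cumulative, so the rate is the delta divided by the elapsed time.
# "ts" is librdkafka's monotonic clock in microseconds.
def topic_rates(previous, current)
  elapsed = (current.fetch("ts") - previous.fetch("ts")) / 1_000_000.0
  return {} unless elapsed.positive?

  before = txmsgs_per_topic(previous)
  txmsgs_per_topic(current).to_h do |topic, total|
    [topic, (total - before.fetch(topic, 0)) / elapsed]
  end
end
```

Keeping the previous snapshot between callback invocations is enough to feed topic_rates. Comparing ack levels would then mean running the producer with different acks (request.required.acks) settings and comparing the resulting rates.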
@thijsc is there any reason for the statistics_callback to be global? What if I wanted different callback handling in various consumers/producers?
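Until the callback can be configured per client, one workaround is to keep the single global callback and dispatch on the "name" field, which librdkafka's STATISTICS.md documents as the handle instance name (something like rdkafka#producer-1). A minimal sketch of such a router, assuming the global callback accepts any object that responds to call and receives the parsed statistics Hash; StatisticsRouter is hypothetical, and how you obtain each client's name up front depends on your rdkafka-ruby version.

```ruby
# Hypothetical router: one globally registered callback that forwards each
# statistics Hash to the handler registered for the emitting client.
class StatisticsRouter
  def initialize
    @handlers = {}
    @mutex = Mutex.new # callbacks may fire from a background thread
  end

  # client_name is the librdkafka handle name, e.g. "rdkafka#producer-1".
  def register(client_name, &handler)
    @mutex.synchronize { @handlers[client_name] = handler }
  end

  def unregister(client_name)
    @mutex.synchronize { @handlers.delete(client_name) }
  end

  # Receives the parsed statistics Hash; stats from unknown clients are ignored.
  def call(stats)
    handler = @mutex.synchronize { @handlers[stats["name"]] }
    handler&.call(stats)
  end
end
```

The design keeps the library-level callback global while letting each consumer or producer own its handling logic.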
I'm trying to get this done, but I'm not making much progress because I don't have a clear picture of what this should look like. I can see how events for assignment changes and so forth could work.
I can also see how emitting an event for producing a message could work. I don't see how emitting an event for a delivered message would be useful, though. AS notifications assume that things happen synchronously, and that's not going to be the case here. I think you're going to get a lot of out-of-order events.
I also don't see how we can do hooks for message delivery. The C lib pops them off a buffer, so the time they arrive on the Ruby side says little about, for example, how the network is doing. The stats in the statistics callback do tell us that. Maybe I'm missing a useful use case here?
I think we need to spend some time coming up with a spec of which events should be emitted and write up some use cases showing how one would benefit from them. That will make this a more manageable project.
@JHK and @mensfeld which events do you think should be emitted and could you write up a short description of when they would trigger and which information they would emit?
I can't say exactly what needs to be in such a message, but have a look at what racecar already provides.
Those instrumentation hooks were built out of the need to measure details within racecar. The statistics callback already provides a lot of that information, but not the hook itself. So I'd suggest including whatever makes sense to you in that hook. If someone needs more, we can still extend it in individual PRs. But the general idea of hooks would be in place by then, and the parameters can be discussed on a case-by-case basis.
We have a pretty clear need to measure the number of successful / failed message deliveries per producer process.
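Counting successful and failed deliveries per producer process maps naturally onto a delivery callback. A minimal sketch under two assumptions: the callback is invoked once per delivery report, possibly from a background polling thread (hence the Mutex), and each report exposes a topic and an error that is nil on success. Because the real rdkafka-ruby report class and its fields depend on the gem version, the sketch uses a hypothetical Report struct as a stand-in; DeliveryCounters is hypothetical too.

```ruby
# Stand-in for a delivery report: error is nil on success. The real
# rdkafka-ruby report class and its fields depend on the gem version.
Report = Struct.new(:topic, :partition, :offset, :error)

# Hypothetical per-process delivery counters fed by a delivery callback.
class DeliveryCounters
  def initialize
    @mutex = Mutex.new
    @delivered = Hash.new(0)
    @failed = Hash.new(0)
  end

  # Callable, so an instance can be used wherever a callback lambda is expected.
  def call(report)
    @mutex.synchronize do
      bucket = report.error.nil? ? @delivered : @failed
      bucket[report.topic] += 1
    end
  end

  # Returns copies so readers never see a hash that is being mutated.
  def snapshot
    @mutex.synchronize { { delivered: @delivered.dup, failed: @failed.dup } }
  end
end
```

A reporter thread could read snapshot periodically and push the numbers to a metrics backend such as Datadog.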
@dasch but you can do that yourself now: https://github.com/karafka/waterdrop/pull/106/files#diff-d179c7dee2064c1622d2d3da2b03c44dR32
Thanks all for the input! I'm going to work on it.
Hello! I'm curious what became of this work. We're currently going through the process of updating Racecar and we've been leveraging the consumer heartbeat instrumentation for monitoring our consumer health. Are there any plans to implement something similar? If not we would love to see it!
@emersonpriceiv the current API allows you to do that. Please see the WaterDrop PR above, which has full instrumentation support.
Closing this one. I think it's not clear how we can improve on rdkafka's internal capabilities.
To do some deeper introspection on what is going on when receiving or publishing messages, it would be useful to have an instrumentation interface compatible with Active Support Instrumentation. The default might just be a NullInstrumenter, which simply discards the information. For an idea of what might actually be useful to instrument, take inspiration from ruby-kafka.
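A minimal sketch of the proposed NullInstrumenter, assuming the same instrument(name, payload) { ... } shape as ActiveSupport::Notifications.instrument: it yields to the block and returns its result but records nothing, so it could serve as the default until a real engine is configured. The Client class, its instrumenter: keyword and the event name publish.client are hypothetical.

```ruby
# Default instrumenter: same call shape as
# ActiveSupport::Notifications.instrument, but it discards everything.
class NullInstrumenter
  def instrument(_name, payload = {})
    yield(payload) if block_given?
  end
end

# Hypothetical client that accepts any compatible instrumenter and falls
# back to the null one when none is given.
class Client
  def initialize(instrumenter: NullInstrumenter.new)
    @instrumenter = instrumenter
  end

  def publish(topic, message)
    @instrumenter.instrument("publish.client", topic: topic) do
      # Real publishing would happen here; stubbed for the sketch.
      "#{topic}:#{message}"
    end
  end
end
```

With the null default, instrumented code paths cost one extra method call and behave exactly as uninstrumented ones, so users who never configure an engine are unaffected.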