twmb / franz-go

franz-go contains a feature complete, pure Go library for interacting with Kafka from 0.8.0 through 3.8+. Producing, consuming, transacting, administrating, etc.
BSD 3-Clause "New" or "Revised" License

Request Errors metric support #830

Open asg0451 opened 2 months ago

asg0451 commented 2 months ago

I'm trying to implement a metric that counts errors coming from Kafka. kgo's metric hooks let me count broker connect errors and network read and write errors. However, I don't see a way to get a count of all errors that occur.

My motivation is that I'd like to be able to answer the question "is the Kafka cluster I'm trying to produce to struggling, misconfigured, or generally not behaving?", and I don't believe the currently exposed errors do that justice. Am I correct in this thinking?

Proposal: maybe add another hook exposing RequestErrors, and call it here

twmb commented 1 month ago

To confirm: you'd like a way to inspect the ErrorCode field within all responses?

asg0451 commented 1 month ago

That would be great, yeah

twmb commented 3 weeks ago

What would this API look like? One option is to just give you the raw message, but then you would need to know where the error fields are, and error fields sometimes change location as the API evolves in Kafka. Some messages also have multiple error fields in different locations.
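
To make the problem concrete, here is a minimal Go sketch. Every type in it is a hypothetical stand-in, not a franz-go or kmsg type: one response carries a single top-level error code, the other nests a code under each partition, and an assumed `errorCoder` interface hides that difference from the code doing the counting.

```go
package main

// All types here are hypothetical illustrations, not franz-go or kmsg types.

// topLevelResponse models a response with one top-level error code.
type topLevelResponse struct {
	ErrorCode int16
}

// partitionResult and topicResult model a response whose error codes
// are nested per partition instead of sitting at the top level.
type partitionResult struct {
	ErrorCode int16
}

type topicResult struct {
	Partitions []partitionResult
}

type nestedResponse struct {
	Topics []topicResult
}

// errorCoder is an assumed interface: each response type knows where its
// own error fields live and returns only the non-zero codes.
type errorCoder interface {
	ErrorCodes() []int16
}

func (r topLevelResponse) ErrorCodes() []int16 {
	if r.ErrorCode == 0 {
		return nil
	}
	return []int16{r.ErrorCode}
}

func (r nestedResponse) ErrorCodes() []int16 {
	var codes []int16
	for _, t := range r.Topics {
		for _, p := range t.Partitions {
			if p.ErrorCode != 0 {
				codes = append(codes, p.ErrorCode)
			}
		}
	}
	return codes
}

// countErrors adds every non-zero error code in resp to counts,
// keyed by the code's numeric value.
func countErrors(counts map[int16]int, resp errorCoder) {
	for _, code := range resp.ErrorCodes() {
		counts[code]++
	}
}
```

A metrics hook built this way would only need `countErrors`, but someone still has to write and maintain an `ErrorCodes` method for every response type and version.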

asg0451 commented 3 weeks ago

I didn't know that; that's annoying. Glad you handle that complexity for us. A hook that only gets called on failed messages would fit this use case, even though it's an inelegant solution. Here I just care about diagnosing whether poor conditions upstream of this are caused by instability in the Kafka cluster.

twmb commented 2 weeks ago

Do you have a proposal for the hook? With failed messages, I'd think you could use the existing promise error field in the callback, so I'm not sure what a request here could be.
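
For reference, a minimal sketch of counting failures through the promise. In kgo, the promise passed to `Produce` receives the record and an error; the `record` type and `errorCounter` below are hypothetical stand-ins so the sketch compiles without the library. Note that this approach only sees produce failures, not error codes in other responses.

```go
package main

import "sync/atomic"

// record is a hypothetical stand-in for kgo.Record.
type record struct {
	Topic string
	Value []byte
}

// errorCounter is a hypothetical helper that counts promises which
// completed with a non-nil error. It is safe for concurrent use.
type errorCounter struct {
	failed atomic.Int64
}

// wrap returns a promise that increments the counter on failure and then
// calls next, if next is non-nil.
func (c *errorCounter) wrap(next func(*record, error)) func(*record, error) {
	return func(r *record, err error) {
		if err != nil {
			c.failed.Add(1)
		}
		if next != nil {
			next(r, err)
		}
	}
}

// Failed reports how many wrapped promises have seen an error so far.
func (c *errorCounter) Failed() int64 {
	return c.failed.Load()
}
```

With the real client, the wrapped callback would be passed as the promise argument to `cl.Produce(ctx, rec, ...)`, using `*kgo.Record` in place of `record`.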