`redis.exception.WatchError` is not really an error

chrisguidry commented 4 days ago

What problem do you want to solve?

When upgrading opentelemetry-instrumentation-redis from 0.39b0 to 0.46b0, we picked up the (very nice) fixes to instrumentation of the async redis client. However, all of our arq (https://github.com/samuelcolvin/arq/) workers started marking tons of traces as errors because they got a WatchError exception.

The redis WatchError is not really an error, it's a flow-control and concurrency-management mechanism (kind of like a LockError, etc). The fact that we're marking all of those spans as errored introduces a lot of noise to something that's a totally reasonable pattern and use of redis.

Here's an example of how it's used for concurrency control: https://github.com/samuelcolvin/arq/blob/main/arq/connections.py#L178C20-L180

Describe the solution you'd like

I'd recommend that we adjust the instrumentation to not mark spans as failed when they encounter WatchError (or to at least make this behavior configurable). It does make sense to still mark them as events on the current span, though.

Describe alternatives you've considered

No response

Additional Context

No response

Would you like to implement a fix?

None

xrmx commented 4 days ago

Yeah, makes sense.

Charlie-lizhihan commented 2 days ago

May I take this one?

xrmx commented 2 days ago

@Charlie-lizhihan sure

open-telemetry / opentelemetry-python-contrib