open-telemetry / opentelemetry-python-contrib

OpenTelemetry instrumentation for Python modules
https://opentelemetry.io
Apache License 2.0
647 stars 535 forks source link

`redis.exception.WatchError` is not really an error #2639

Open chrisguidry opened 4 days ago

chrisguidry commented 4 days ago

What problem do you want to solve?

When upgrading opentelemetry-instrumentation-redis from 0.39b0 to 0.46b0, we picked up the (very nice) fixes to instrumentation of the async redis client. However, all of our arq (https://github.com/samuelcolvin/arq/) workers started marking tons of traces as errors because they got a WatchError exception.

The redis WatchError is not really an error, it's a flow-control and concurrency-management mechanism (kind of like a LockError, etc). The fact that we're marking all of those spans as errored introduces a lot of noise to something that's a totally reasonable pattern and use of redis.

Here's an example of how it's used for concurrency control: https://github.com/samuelcolvin/arq/blob/main/arq/connections.py#L178C20-L180

Describe the solution you'd like

I'd recommend that we adjust the instrumentation to not mark spans as failed when they encounter WatchError (or to at least make this behavior configurable). It does make sense to still mark them as events on the current span, though.

Describe alternatives you've considered

No response

Additional Context

No response

Would you like to implement a fix?

None

xrmx commented 4 days ago

Yeah, makes sense.

Charlie-lizhihan commented 2 days ago

May I take this one?

xrmx commented 2 days ago

@Charlie-lizhihan sure