What problem do you want to solve?
When upgrading `opentelemetry-instrumentation-redis` from 0.39b0 to 0.46b0, we picked up the (very nice) fixes to instrumentation of the async redis client. However, all of our `arq` (https://github.com/samuelcolvin/arq/) workers started marking tons of traces as errors because they got a `WatchError` exception.

The redis `WatchError` is not really an error; it's a flow-control and concurrency-management mechanism (kind of like a `LockError`, etc.). The fact that we're marking all of those spans as errored introduces a lot of noise to something that's a totally reasonable pattern and use of redis.

Here's an example of how it's used for concurrency control: https://github.com/samuelcolvin/arq/blob/main/arq/connections.py#L178C20-L180
Describe the solution you'd like
I'd recommend that we adjust the instrumentation to not mark spans as failed when they encounter `WatchError` (or at least make this behavior configurable). It does make sense to still record them as events on the current span, though.

Describe alternatives you've considered
No response
Additional Context
No response
Would you like to implement a fix?
None