Closed tedsuo closed 4 years ago
Overall looks good to me.
FYI I have updated this proposal based on feedback. Rather than adding additional status codes to represent user overrides, we add a user_override
boolean which indicated that the end user has confirmed that the status code is correct. The expected behavior is that the status code on the span should take precedence over any further analysis.
Please let me know if this solves our remaining issues with error reporting.
Also, we will be discussing this proposal at the Errors Working Group tomorrow (thursday) at 8am Pacific. Please join if you can make it.
Details: https://docs.google.com/document/d/1HgUI69rBridFzCXxXuTjQrkG6jb-YcFQnZjPcyBcK1U/
I beg your pardon. Do you I read the current proposal correctly, that all it does is adding new error_override
flag to Span? I don't see how this solves any problem we tried to address.
It does not address open-telemetry/opentelemetry-specification#706. It does not address motivational part of #134 . It does not address part of its own motivation:
However, there is confusion over the mapping of semantic conventions to status codes, and concern over the subjective nature of errors. Which network failures count as an error? Are 404s an error? The answer is often dependent on the situation, but without even a baseline of suggested status codes for each convention, the instrumentation author is placed under the heavy burden of making the decision. Worse, the decisions will not be in sync across different instrumentation packages.
I think the initial suggestion of this OTEP of reducing/renaming the enum of Span.Status
made sense. Current form does not.
Ok, third draft is now available. This draft splits the difference between the first and second draft.
Changes:
Thank you all who have been contributing feedback! I am hopeful that this is the final iteration.
@tedsuo please fix markdownlint errors.
Thanks @arminru. I adjusted the language to clarify the points you raised. Is this good enough to merge?
Merging this as we have enough votes and don't want to decrease the velocity - and as @arminru said, we will iron out the actual details in the Specification.
cc @bogdandrutu
@carlosalberto not that I am totally against this (I would probably request changes in that case) but as mentioned multiple times, we cannot use the velocity argument to merge changes before GA because that may cost us a lot in the future if we merge half baked changes now to rush things.
@bogdandrutu Oh, definitely. I merged cause this is 'merely' an OTEP, but let's definitely find full agreement when the actual changes hit the Specification (as long as they take).
@bogdandrutu: I agree, but to be clear, this was merged because it had more than enough votes, was open for an acceptable amount of time, and there were no further objections.
This proposal attempts to address error handling while making the least number of breaking changes to our current system. It retains SpanStatus. Other proposals we essentially reinventing it, only in a way that would introduce a number of breaking changes. It allows current instrumentation to continue to work as-is, while providing guidance to instrumentation authors. It includes a way to differentiate between the default errors provided by instrumentation and intentional error flagging provided by the end user.
If accepted, this proposal would replace OTEP #134, and addresses issues #559 and #706.