open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

Semantic Convention for Feature Flagging #2532

Closed beeme1mr closed 1 year ago

beeme1mr commented 2 years ago

Proposal

I would like to propose that OTel defines a semantic convention for feature flagging. This will provide the ability to split traces and metrics by the attributes for use cases like A/B testing, feature rollouts, and similar (as mentioned by @dyladan here).

There was an active discussion on this topic here, but I'll summarize the findings below:

| Attribute | Type | Description | Examples | Required |
|---|---|---|---|---|
| feature_flag.flag_key | string | The unique identifier of the feature flag. | show-new-logo | Yes |
| feature_flag.provider_name | string | The name of the provider performing the flag evaluation. | Flag Manager | No |
| feature_flag.evaluated_variant | string | The name associated with the evaluated value. [1] | reverse | See below |
| feature_flag.evaluated_value | string | A string representation of the evaluated value. [2] | true | See below |

[1]: A variant should be used if it is available. If the variant is present, feature_flag.evaluated_value should be omitted.

[2]: The value should only be used if the variant is not available. How the value is represented as a string should be determined by the implementer.
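
The variant/value rule in notes [1] and [2] can be sketched as follows. This is a minimal illustration of the proposal as written above, not an official implementation; the function name `feature_flag_attributes`, the helper `stringify`, and the choice to render booleans as lowercase `true`/`false` are assumptions made for the example.

```python
from typing import Dict, Optional


def stringify(value: object) -> str:
    """String form of an evaluated value (an implementer's choice, per note [2])."""
    # Assumption: booleans are rendered in lowercase, matching the "true" example.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def feature_flag_attributes(
    flag_key: str,
    provider_name: Optional[str] = None,
    variant: Optional[str] = None,
    value: Optional[object] = None,
) -> Dict[str, str]:
    """Build the proposed feature_flag.* attributes for one flag evaluation."""
    if not flag_key:
        raise ValueError("feature_flag.flag_key is required")

    attributes = {"feature_flag.flag_key": flag_key}

    # The provider name is optional in the proposal.
    if provider_name is not None:
        attributes["feature_flag.provider_name"] = provider_name

    if variant is not None:
        # Note [1]: prefer the variant and omit the evaluated value.
        attributes["feature_flag.evaluated_variant"] = variant
    elif value is not None:
        # Note [2]: use the value only when no variant is available.
        attributes["feature_flag.evaluated_value"] = stringify(value)

    return attributes


with_variant = feature_flag_attributes("show-new-logo", variant="reverse", value=True)
without_variant = feature_flag_attributes(
    "show-new-logo", provider_name="Flag Manager", value=True
)
```

The resulting dictionary could then be attached to whatever span or event carries the flag evaluation.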

Precedents

There are a few examples available on GitHub that show how this could be done as a plugin or using the OTel API. These were used as inspiration for the proposal above.

Example in Zipkin

In this example, the flag value is the Fibonacci algorithm that runs. In this case, it's recursive, which ends up having a major impact on performance.

image

dyladan commented 2 years ago

I've participated in other discussions around this, but for visibility's sake I'll put my 👍 here. It is worth mentioning that you should think about #2522 here, since I believe it will merge soon. I would suggest making provider name an opt-in attribute, as I believe most services will only use a single feature flag provider. It is less obvious whether the management URL should be opt-in or opt-out. For most services the identifier will be enough to find the flag in their system, so this is really just a convenience feature, and I would lean toward opt-in.

justinabrahms commented 2 years ago

Just a 👍🏻 from the eBay side.

weyert commented 2 years ago

Looks good, but I think it would be worthwhile to also include the distinct user identifier the flag was evaluated against.

toddbaert commented 2 years ago

> Looks good, but I think it would be worthwhile to also include the distinct identifier the flag was evaluated against

Do you mean an identifier for the subject of flag evaluation? As in the end-user or machine client?

If so, I think that might be problematic, as often PII is used for such things (as much as we'd recommend people use an opaque UUID or hash for such purposes).
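
For illustration, the "opaque hash" idea could look something like the sketch below. It swaps in a keyed hash (HMAC-SHA-256) rather than a plain hash, so the digest cannot be recomputed without the key; that choice is an assumption of this example. The function name `opaque_subject_id` and the attribute name `example.subject_id_hash` are hypothetical and are not part of the proposal.

```python
import hashlib
import hmac


def opaque_subject_id(raw_identifier: str, secret_key: bytes) -> str:
    """Return a hex-encoded HMAC-SHA-256 digest of an identifier."""
    digest = hmac.new(secret_key, raw_identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical attribute name, used only for illustration.
attributes = {
    "feature_flag.flag_key": "show-new-logo",
    "example.subject_id_hash": opaque_subject_id("user@example.com", b"demo-secret"),
}
```

The same input and key always give the same digest, so evaluations for one subject can still be grouped without recording the raw identifier.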

weyert commented 2 years ago

Yes. Sometimes you might check an ops flag, a tenant-level flag, or an authenticated-user-level flag, and it would then be useful to see which user or target audience was checked without needing to dive into the code to find out.

I think it could be an optional attribute that is specced out but not required to be part of a span.

weyert commented 2 years ago

Besides that, I am happy to get this semantic convention for feature flags merged :)

beeme1mr commented 2 years ago

Hi @jmacd, do you have any questions or concerns regarding this proposal? It's worth mentioning that the official OTel demo app is planning on using this proposal. You can see the issue here.

weyert commented 2 years ago

FWIW, I am using these attributes in a preproduction environment at work as of today. It would be great if they could get accepted.

tigrannajaryan commented 2 years ago

We discussed this in the Spec SG meeting today.

We believe that we need more information about this issue to understand whether it belongs in the OTel spec. In particular, we would like to know how much interest there is in the industry and in the OTel community in having these semantic conventions in OTel.

So, if you are interested in this or know others who are interested, please either +1 the issue or comment on it if you want to add something.

justinabrahms commented 2 years ago

To add more info to my generic +1 from above, eBay is pursuing the implementation of the OpenFeature spec and plans to integrate with OTel as well for traces. The information above is important for us and we'll have to do it either way. It's just a question of whether we can do this in a way that is aligned with the broader industry or have to go off on our own.

beeme1mr commented 2 years ago

Dynatrace is also planning on supporting feature flag semantics. It would be nice to have them officially defined in the OTel spec.

toddbaert commented 2 years ago

@tigrannajaryan

The conventions suggested here map directly to terminology defined in the draft OpenFeature specification, which represents an industry consensus. We've had broad support from end-users and vendors. We have a doc specifically dedicated to enumerating all the interested leaders in this area: https://github.com/open-feature/community/blob/main/interested-parties.md. At this point most of the vendors represented here have either contributed to our specification itself or one of our language SDKs. If this doesn't constitute a consensus in this space, please let us know what else might convince you we have one.

Additionally, we've just been accepted as a CNCF sandbox project.

I'm certainly willing to entertain a discussion on the particular conventions suggested above, but up to this point the problem seems to be a lack of confidence in industry consensus or in the need for these conventions. I hope my points above help in that regard.

tigrannajaryan commented 2 years ago

@toddbaert thanks for the info and the link, this is very helpful. I believe we have sufficient evidence that these semantic conventions are wanted.

@open-telemetry/specs-approvers I am marking this as triage-accepted so that further discussion can happen on the PR that is already open. If you think this needs to be triaged differently, please comment; otherwise, please review the PR.

beeme1mr commented 2 years ago

Thanks @tigrannajaryan, I'll close this issue and update the PR.

tigrannajaryan commented 2 years ago

Let's keep the issue open until the PR that resolves it is merged.

hannahchan commented 1 year ago

Hi all,

Sorry I'm late to the discussion. I'm still catching up on what's happened, so forgive me if I've misunderstood something. If this discussion is still going on, how can I get involved?

I've been thinking about OpenTelemetry and feature management a lot lately and recently built a library for fun after some inspiration from what I've seen at work. I was also thinking about defining some semantic conventions for feature management around metrics and traces.

The library I wrote is located here: https://github.com/hannahchan/FeatureGates.NET. Its aim is to get features to generate and emit RED (Rates, Errors and Duration) metrics so that reusable dashboards and other tooling can be built around feature management.

In my library, I use the term Feature Gate deliberately to distinguish it from a Feature Flag. A feature flag may control multiple feature gates. Feature flags are evaluated by a Feature Manager such as a LaunchDarkly client.
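
To make the RED idea concrete, here is a rough sketch of recording request count, error count, and duration per feature gate. It is written in Python for illustration only and is not the FeatureGates.NET API; the `GateMetrics` class, the `observe` method, the `state` label, and the gate key `new-checkout` are all hypothetical names.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class GateMetrics:
    """In-memory RED counters keyed by (gate_key, state)."""

    def __init__(self):
        self.requests = defaultdict(int)    # Rate: executions per label set
        self.errors = defaultdict(int)      # Errors: executions that raised
        self.durations = defaultdict(list)  # Duration: seconds per execution

    @contextmanager
    def observe(self, gate_key: str, state: str):
        labels = (gate_key, state)
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.errors[labels] += 1
            raise  # re-raise so the caller still sees the failure
        finally:
            self.requests[labels] += 1
            self.durations[labels].append(time.perf_counter() - start)


metrics = GateMetrics()

# One successful execution behind the gate.
with metrics.observe("new-checkout", "opened"):
    pass

# One failing execution behind the same gate.
try:
    with metrics.observe("new-checkout", "opened"):
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

In a real setup the counters and durations would presumably be emitted through a metrics API instead of being kept in dictionaries.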

From what I've seen in this conversation and in the PR, we're talking about span attributes that are emitted by the Feature Manager? As in the client that connects to the Feature Management service?

I'm wondering whether I need to modify my library, add to this spec, or create an entirely different one for Feature Gates.