[FEATURE] Rename metrics name to comply with feature flag semconv

Kavindu-Dodan commented 1 year ago

Requirements

Background

flagd comes with out of the box metrics support [1]. It currently expose the following metric names,

http_request_duration_seconds
http_response_size_bytes
impressions
reasons

While these metrics serve a good purpose, the names do not comply with semantic conventions. For comparison, see how metrics are named for the metrics hook [2] (ex:- feature_flag.evaluation_requests_total, feature_flag.evaluation_success_total)

Proposal (updated 27/06/2023)

This issue focuses on renaming flagd metrics to comply with semantic conventions.

Common rules

Instrumentation Scope: flagd
Version: current flagd version
Feature flag specific metric name must start with feature_flag prefix so that monitoring systems can aggregate feature flag related metrics
Feature flag specific metric name must contain flagd phrase so that it is clear the metric belongs to flagd system

With this background, consider the following proposals,

Old	New	Unit	Refer
http_request_duration_seconds	http.server.duration	S	[3]
http_response_size_bytes	http.server.response.size	By	[4]
impressions	feature_flag.flagd.impression	{impression}	[5]
reasons	feature_flag.flagd.evaluation.reason	{reason}	[5]

[1] - https://github.com/open-feature/flagd/blob/main/core/pkg/telemetry/metrics.go [2] - https://github.com/open-feature/go-sdk-contrib/blob/main/hooks/open-telemetry/README.md#metric-hook [3] - https://github.com/open-telemetry/semantic-conventions/blob/bef5a68c2f580e48334bfb2a118345808b2e2125/specification/metrics/semantic_conventions/http-metrics.md#metric-httpserverduration [4] - https://github.com/open-telemetry/semantic-conventions/blob/bef5a68c2f580e48334bfb2a118345808b2e2125/specification/metrics/semantic_conventions/http-metrics.md#metric-httpserverrequestsize [5] - https://github.com/open-telemetry/semantic-conventions/tree/bef5a68c2f580e48334bfb2a118345808b2e2125/specification/metrics/semantic_conventions#pluralization

Kavindu-Dodan commented 1 year ago

@thisthat @toddbaert @beeme1mr appreciate feedback on this proposal

thisthat commented 1 year ago

Great summary @Kavindu-Dodan - the proposal makes a lot of sense. However, OTel suggests adding count to the namespace for pluralization, e.g., feature_flag.flagd.impression.count. This would require also to adapt the hooks tho 🤔

pirgeo commented 1 year ago

Hi! Here are a few thoughts, mainly looking at the spec you linked above:

This is not a "hard" requirement, but personal taste: I think I prefer featureflag. over feature_flag., but I couldn't find any guidance in the Otel spec. All the other top-level namespaces are a single word (or even abbreviations, like db.).
For HTTP, the semantic conventions use the dot instead of the underscore, e.g., http.server.response.size. Maybe moving to feature_flag.flagd.http.response.size would be more in line with existing semantic conventions.
Here is a PR that intends to only add .count for UpDownCounters. Not sure if the counters in question are UpDownCounters or "just up" Counters, but it might be worth following that guidance there. The fact that the type of the metric is "Count" already lets us determine that this metric is a counter, we don't necessarily need to pack it into the name. The whole point of this discussion was Prometheus compatibility, if you add .count out of the box you might end up with _count_total for the Prometheus exporter, which is... not ideal.

beeme1mr commented 1 year ago

Thanks @pirgeo! I would prefer to use feature_flag as a prefix to match what has been defined in the semantic convention for traces but I agree with the other points.

pirgeo commented 1 year ago

Sounds good! I neither found any guidance on the spec nor do I have a strong opinion on this one.

joaopgrassi commented 1 year ago

Apart from what @pirgeo already mentioned, here are my suggestions:

About the units: It's a bit hidden but there's some extra guidance here: https://github.com/open-telemetry/semantic-conventions/blob/main/specification/metrics/semantic_conventions/README.md#instrument-units

feature_flag.flagd.http_request_time: Should be s
feature_flag.flagd.impression_count and feature_flag.flagd.evaluation_reason_count: I think those are under the "default" unit 1:

Instruments for utilization metrics (that measure the fraction out of a total) are dimensionless and SHOULD use the default unit 1 (the unity).

If you want to capture more meaning, you could use {impression} and {reason} as stated in the spec:

Instruments that measure an integer count of something SHOULD only use annotations with curly braces to give additional meaning without the leading default unit (1). For example, use {packet}, {error}, {fault}, etc.

feature_flag.flagd.http_request_time

Is there a reason why the metric changed to "request_time"? I think the old name actually aligned better with the current HTTP semconv, such as http.server.duration.

Kavindu-Dodan commented 1 year ago

@thisthat @pirgeo @beeme1mr & @joaopgrassi for the feedback Thank you very much for taking the time to give your opinions.

The fact that the type of the metric is "Count" already lets us determine that this metric is a counter, we don't necessarily need to pack it into the name

Thank you for this input

Is there a reason why the metric changed to "request_time"? I think the old name actually aligned better with the current HTTP semconv, such as http.server.duration.

Should be an oversight and the original description is meant to be changed.

If you want to capture more meaning, you could use {impression} and {reason} as stated in the spec:

Good point. I assume this allows us to remove the count suffix

To summarise, can we agree on the following updated proposal?

A new addition to the common rule,

Adopt dot(.) notation instead of underscore (_)

And updated metrics chart,

Old	New	Unit
http_request_duration_seconds	feature_flag.flagd.http.request.duration	s
http_response_size_bytes	feature_flag.flagd.http.response.size	By
impressions	feature_flag.flagd.impression	{impression}
reasons	feature_flag.flagd.evaluation.reason	{reason}

Let me know your thoughts and if we agree, I will update the issue description and continue with the implementation.

beeme1mr commented 1 year ago

Are the metrics feature_flag.flagd.http.request.duration and feature_flag.flagd.http.response.size only captured during flag evaluation? If so, then this looks good to me.

pirgeo commented 1 year ago

Just for clarification: Do feature_flag.flagd.http.request.duration and feature_flag.flagd.http.response.size refer to the requests that flagd sends out in order to get feature flag settings?

Will the feature_flag.flagd.evaluation.reason have an attribute that states the actual reason? Like: feature_flag.flagd.evaluation.reason with an attribute reason=user-changed-flag or something? In that case, I wonder if dropping the reason from the metric name would be a good idea, since you are basically counting evaluations, and can then split by reason. With how it is currently, you are counting reasons. This might be intended, but I know too little about the underlying data. If the reason is something that is highly changeable and for a user to set, keeping it as is would be good. If there is a predefined set of valid reasons, going with the dimension might be a better idea (but again: take it with a pinch of salt, my feature flag knowledge is shallow).

Finally: What are the Otel datatypes of these? I assume histograms for the first two, and counts for the last two? Both of these should only ever go up, I assume.

joaopgrassi commented 1 year ago

feature_flag.flagd.impression -> I think this can now be pluralized feature_flag.flagd.impressions, since you have the unit {impression} From the spec:

Metric names SHOULD NOT be pluralized, unless the value being recorded represents discrete instances of a countable quantity. Generally, the name SHOULD be pluralized only if the unit of the metric in question is a non-unit (like {fault} or {operation}).

Same for the other (depending on how you address @pirgeo comment above)

About the request.duration and response.size I'm not sure if we need to redefine them under the feature_flag namespace. Couldn't Flagd simply adopt the HTTP semantic conventions for these, since they are basically the same? That's what HTTP client instrumentations for example do.

Kavindu-Dodan commented 1 year ago

Just for clarification: Do feature_flag.flagd.http.request.duration and feature_flag.flagd.http.response.size refer to the requests that flagd sends out in order to get feature flag settings?

No, this correlates to the flag evaluation time of flagd & the size of the payload of the evaluation

And regarding reason vs impression, I think they have value on their own. impression represents successful falg evaluations where reason represent anything (both success or error) situations. Besides, OpenFeature define a set of reasons to correlate with this metric [1].

What are the Otel datatypes of these? I assume histograms for the first two, and counts for the last two? Both of these should only ever go up, I assume.

Yes, this is correct. http dureation & response size use histograms and reason & impression use counters

feature_flag.flagd.impression -> I think this can now be pluralized feature_flag.flagd.impressions, since you have the unit {impression} From the spec:

Thank you, I will update my proposal

About the request.duration and response.size I'm not sure if we need to redefine them under the feature_flag namespace. Couldn't Flagd simply adopt the HTTP semantic conventions for these, since they are basically the same? That's what HTTP client instrumentations for example do.

Good suggesion, are you proposing to use flagd namespace and use something like flagd.http.server.duration ? If this is semconv compliant, I think we can choose this naming convension.

[1] - https://github.com/open-feature/spec/blob/main/specification/types.md#resolution-details

joaopgrassi commented 1 year ago

Good suggesion, are you proposing to use flagd namespace and use something like flagd.http.server.duration ? If this is semconv compliant, I think we can choose this naming convension.

Ah no, I'm proposing that Flagd re-uses the HTTP metrics defined in https://github.com/open-telemetry/semantic-conventions/blob/main/specification/metrics/semantic_conventions/http-metrics.md#semantic-conventions-for-http-metrics.

Not sure if that would be client or server but the goal is to use these metrics for all HTTP communications. So, flagd would emit http.server.duration and http.server.response.size. That is what all other http servers will be reporting in their own implementations. For ex, web frameworks (Spring, ASP.NET, Gin, Laravel, etc) would report these.

Essentially then you end up with only flagd specifc metrics such as impressions and reasons

Kavindu-Dodan commented 1 year ago

Not sure if that would be client or server but the goal is to use these metrics for all HTTP communications. So, flagd would emit http.server.duration and http.server.response.size

Wouldn't this create metric name conflicts at OTel backend (ex:- Prometheus) 🤔 Or do they differentiate metrics based on service naming and other attributes ?

joaopgrassi commented 1 year ago

When converting from OTLP to Prometheus, the spec offers guidance that should help in this case. Instrumentation scope and version become labels while resource attributes (e.g., service.name) become this target_info metric

So it should work just as fine - they will split by say instrumentation scope = flagd and get the metrics they are after. @pirgeo maybe you have more input here, in case I missed something?

For Dynatrace for example, instrumentation scope+version and a few resource attributes (like service.name) are also added automatically to all ingested metrics as dimensions.

Kavindu-Dodan commented 1 year ago

@joaopgrassi thank you for the explanation. I think my doubt is clear and I learned a few more OTel details thanks to this discussion.

For the reference of others, Instrumentation Scope [1] is already set to flagd and this is done at the meter setup phase [2]. We can further add the version as part of this improvement.

With this agreement and understanding, let's agree on the following metric naming & unit changes,

Instrumentation Scope: flagd

Version: current flagd version

Old	New	Unit
http_request_duration_seconds	http.server.duration	s
http_response_size_bytes	http.server.response.size	By
impressions	feature_flag.flagd.impression	{impression}
reasons	feature_flag.flagd.evaluation.reason	{reason}

[1] - https://opentelemetry.io/docs/specs/otel/glossary/#instrumentation-scope [2] - https://github.com/open-feature/flagd/blob/main/core/pkg/telemetry/metrics.go#L126

open-feature / flagd

[FEATURE] Rename metrics name to comply with feature flag semconv #712

Requirements