Closed Kavindu-Dodan closed 1 year ago
@thisthat @toddbaert @beeme1mr appreciate feedback on this proposal
Great summary @Kavindu-Dodan - the proposal makes a lot of sense. However, OTel suggests adding count
to the namespace for pluralization, e.g., feature_flag.flagd.impression.count
. This would require also to adapt the hooks tho 🤔
Hi! Here are a few thoughts, mainly looking at the spec you linked above:
featureflag.
over feature_flag.
, but I couldn't find any guidance in the Otel spec. All the other top-level namespaces are a single word (or even abbreviations, like db.
).http.server.response.size
. Maybe moving to feature_flag.flagd.http.response.size
would be more in line with existing semantic conventions..count
for UpDownCounters. Not sure if the counters in question are UpDownCounter
s or "just up" Counter
s, but it might be worth following that guidance there. The fact that the type of the metric is "Count" already lets us determine that this metric is a counter, we don't necessarily need to pack it into the name. The whole point of this discussion was Prometheus compatibility, if you add .count
out of the box you might end up with _count_total
for the Prometheus exporter, which is... not ideal.Thanks @pirgeo! I would prefer to use feature_flag
as a prefix to match what has been defined in the semantic convention for traces but I agree with the other points.
Sounds good! I neither found any guidance on the spec nor do I have a strong opinion on this one.
Apart from what @pirgeo already mentioned, here are my suggestions:
About the units: It's a bit hidden but there's some extra guidance here: https://github.com/open-telemetry/semantic-conventions/blob/main/specification/metrics/semantic_conventions/README.md#instrument-units
feature_flag.flagd.http_request_time
: Should be s
feature_flag.flagd.impression_count
and feature_flag.flagd.evaluation_reason_count
: I think those are under the "default" unit 1
:Instruments for utilization metrics (that measure the fraction out of a total) are dimensionless and SHOULD use the default unit 1 (the unity).
If you want to capture more meaning, you could use {impression}
and {reason}
as stated in the spec:
Instruments that measure an integer count of something SHOULD only use annotations with curly braces to give additional meaning without the leading default unit (1). For example, use {packet}, {error}, {fault}, etc.
feature_flag.flagd.http_request_time
Is there a reason why the metric changed to "request_time"? I think the old name actually aligned better with the current HTTP semconv, such as http.server.duration.
@thisthat @pirgeo @beeme1mr & @joaopgrassi for the feedback Thank you very much for taking the time to give your opinions.
The fact that the type of the metric is "Count" already lets us determine that this metric is a counter, we don't necessarily need to pack it into the name
Thank you for this input
Is there a reason why the metric changed to "request_time"? I think the old name actually aligned better with the current HTTP semconv, such as http.server.duration.
Should be an oversight and the original description is meant to be changed.
If you want to capture more meaning, you could use {impression} and {reason} as stated in the spec:
Good point. I assume this allows us to remove the count
suffix
To summarise, can we agree on the following updated proposal?
A new addition to the common rule,
.
) notation instead of underscore (_
) And updated metrics chart,
Old | New | Unit |
---|---|---|
http_request_duration_seconds | feature_flag.flagd.http.request.duration | s |
http_response_size_bytes | feature_flag.flagd.http.response.size | By |
impressions | feature_flag.flagd.impression | {impression} |
reasons | feature_flag.flagd.evaluation.reason | {reason} |
Let me know your thoughts and if we agree, I will update the issue description and continue with the implementation.
Are the metrics feature_flag.flagd.http.request.duration
and feature_flag.flagd.http.response.size
only captured during flag evaluation? If so, then this looks good to me.
Just for clarification: Do feature_flag.flagd.http.request.duration
and feature_flag.flagd.http.response.size
refer to the requests that flagd sends out in order to get feature flag settings?
Will the feature_flag.flagd.evaluation.reason
have an attribute that states the actual reason? Like: feature_flag.flagd.evaluation.reason
with an attribute reason=user-changed-flag
or something? In that case, I wonder if dropping the reason
from the metric name would be a good idea, since you are basically counting evaluations, and can then split by reason. With how it is currently, you are counting reasons. This might be intended, but I know too little about the underlying data. If the reason is something that is highly changeable and for a user to set, keeping it as is would be good. If there is a predefined set of valid reasons, going with the dimension might be a better idea (but again: take it with a pinch of salt, my feature flag knowledge is shallow).
Finally: What are the Otel datatypes of these? I assume histograms for the first two, and counts for the last two? Both of these should only ever go up, I assume.
feature_flag.flagd.impression
-> I think this can now be pluralized feature_flag.flagd.impressions
, since you have the unit {impression}
From the spec:
Metric names SHOULD NOT be pluralized, unless the value being recorded represents discrete instances of a countable quantity. Generally, the name SHOULD be pluralized only if the unit of the metric in question is a non-unit (like {fault} or {operation}).
Same for the other (depending on how you address @pirgeo comment above)
About the request.duration
and response.size
I'm not sure if we need to redefine them under the feature_flag
namespace. Couldn't Flagd simply adopt the HTTP semantic conventions for these, since they are basically the same? That's what HTTP client instrumentations for example do.
Just for clarification: Do feature_flag.flagd.http.request.duration and feature_flag.flagd.http.response.size refer to the requests that flagd sends out in order to get feature flag settings?
No, this correlates to the flag evaluation time of flagd & the size of the payload of the evaluation
And regarding reason
vs impression
, I think they have value on their own. impression
represents successful falg evaluations where reason
represent anything (both success or error) situations. Besides, OpenFeature define a set of reasons to correlate with this metric [1].
What are the Otel datatypes of these? I assume histograms for the first two, and counts for the last two? Both of these should only ever go up, I assume.
Yes, this is correct. http dureation & response size use histograms and reason & impression use counters
feature_flag.flagd.impression -> I think this can now be pluralized feature_flag.flagd.impressions, since you have the unit {impression} From the spec:
Thank you, I will update my proposal
About the request.duration and response.size I'm not sure if we need to redefine them under the feature_flag namespace. Couldn't Flagd simply adopt the HTTP semantic conventions for these, since they are basically the same? That's what HTTP client instrumentations for example do.
Good suggesion, are you proposing to use flagd
namespace and use something like flagd.http.server.duration
? If this is semconv compliant, I think we can choose this naming convension.
[1] - https://github.com/open-feature/spec/blob/main/specification/types.md#resolution-details
Good suggesion, are you proposing to use flagd namespace and use something like flagd.http.server.duration ? If this is semconv compliant, I think we can choose this naming convension.
Ah no, I'm proposing that Flagd re-uses the HTTP metrics defined in https://github.com/open-telemetry/semantic-conventions/blob/main/specification/metrics/semantic_conventions/http-metrics.md#semantic-conventions-for-http-metrics.
Not sure if that would be client
or server
but the goal is to use these metrics for all HTTP communications. So, flagd would emit http.server.duration
and http.server.response.size
. That is what all other http servers will be reporting in their own implementations. For ex, web frameworks (Spring, ASP.NET, Gin, Laravel, etc) would report these.
Essentially then you end up with only flagd specifc metrics such as impressions
and reasons
Not sure if that would be client or server but the goal is to use these metrics for all HTTP communications. So, flagd would emit http.server.duration and http.server.response.size
Wouldn't this create metric name conflicts at OTel backend (ex:- Prometheus) 🤔 Or do they differentiate metrics based on service naming and other attributes ?
When converting from OTLP to Prometheus, the spec offers guidance that should help in this case. Instrumentation scope and version become labels while resource attributes (e.g., service.name
) become this target_info metric
So it should work just as fine - they will split by say instrumentation scope = flagd and get the metrics they are after. @pirgeo maybe you have more input here, in case I missed something?
For Dynatrace for example, instrumentation scope+version and a few resource attributes (like service.name
) are also added automatically to all ingested metrics as dimensions.
@joaopgrassi thank you for the explanation. I think my doubt is clear and I learned a few more OTel details thanks to this discussion.
For the reference of others, Instrumentation Scope
[1] is already set to flagd
and this is done at the meter setup phase [2]. We can further add the version as part of this improvement.
With this agreement and understanding, let's agree on the following metric naming & unit changes,
flagd
Version: current flagd version
Old | New | Unit |
---|---|---|
http_request_duration_seconds | http.server.duration | s |
http_response_size_bytes | http.server.response.size | By |
impressions | feature_flag.flagd.impression | {impression} |
reasons | feature_flag.flagd.evaluation.reason | {reason} |
[1] - https://opentelemetry.io/docs/specs/otel/glossary/#instrumentation-scope [2] - https://github.com/open-feature/flagd/blob/main/core/pkg/telemetry/metrics.go#L126
Requirements
Background
flagd comes with out of the box metrics support [1]. It currently expose the following metric names,
While these metrics serve a good purpose, the names do not comply with semantic conventions. For comparison, see how metrics are named for the metrics hook [2] (ex:- feature_flag.evaluation_requests_total, feature_flag.evaluation_success_total)
Proposal (updated 27/06/2023)
This issue focuses on renaming flagd metrics to comply with semantic conventions.
Common rules
flagd
feature_flag
prefix so that monitoring systems can aggregate feature flag related metricsflagd
phrase so that it is clear the metric belongs to flagd systemWith this background, consider the following proposals,
[1] - https://github.com/open-feature/flagd/blob/main/core/pkg/telemetry/metrics.go [2] - https://github.com/open-feature/go-sdk-contrib/blob/main/hooks/open-telemetry/README.md#metric-hook [3] - https://github.com/open-telemetry/semantic-conventions/blob/bef5a68c2f580e48334bfb2a118345808b2e2125/specification/metrics/semantic_conventions/http-metrics.md#metric-httpserverduration [4] - https://github.com/open-telemetry/semantic-conventions/blob/bef5a68c2f580e48334bfb2a118345808b2e2125/specification/metrics/semantic_conventions/http-metrics.md#metric-httpserverrequestsize [5] - https://github.com/open-telemetry/semantic-conventions/tree/bef5a68c2f580e48334bfb2a118345808b2e2125/specification/metrics/semantic_conventions#pluralization