Open austindrenski opened 6 months ago
I fully endorse moving the metrics, traces and logs into this repo. The consumers later have the chance to include or exclude the data that we collect. If this is accepted, I can work on moving it here.
1 - @open-feature/sdk-dotnet-approvers & @open-feature/sdk-dotnet-maintainers: if this makes sense, I'll abandon https://github.com/open-feature/dotnet-sdk-contrib/pull/122 and move the OTEL stuff here. Anyone against?
2 - What are the steps to deprecate the library in case of approval?
Hooks were introduced to OpenFeature to support use cases such as telemetry without bloating the SDK with dependencies that not everyone needs. In this case, Microsoft has added native support for OTel within the runtime, so dependency bloat isn't a concern. I think it's worth looking into.
In terms of the implementation, are you proposing that the hook is bundled with the rest of the SDK or would OTel be automatically configured within the SDK without requiring a hook at all? Bundling the hook seems like the most straightforward approach to me. The automatic approach is compelling but I think it would introduce additional challenges we would have to consider.
What do you think @askpt @austindrenski @toddbaert?
@beeme1mr wrote https://github.com/open-feature/dotnet-sdk/issues/175#issuecomment-1900634701:
In terms of the implementation, are you proposing that the hook is bundled with the rest of the SDK or would OTel be automatically configured within the SDK without requiring a hook at all? Bundling the hook seems like the most straightforward approach to me. The automatic approach is compelling, but I think it would introduce additional challenges we would have to consider.
When I opened this issue last week, I was just proposing the minimally invasive approach of lifting the hooks as-is into the SDK, but with another week of using the SDK in production, I'm finding the lack of complete telemetry from the SDK to be a real peeve.
I'm personally finding it challenging to answer questions about perf, overhead, etc., related to the impact of using the OpenFeature SDK on a range of luke-warn to scorchingly-hot paths.
So, I'm in favor of fully instrumenting the OpenFeature SDK without hooks:
System.Net.Http
have to resort to with separate instrumentation packages that require close coordination with the upstream library and hard-coding event names to try and listen for (in their case, for fair reasons, but we're in our infancy compared to them):
There's prior art for doing this without taking any direct dependencies from the MassTransit project (see: MassTransit/MassTransit@7e9ec245a37745e30c7ba16bf5c04c5f0c3554ea, open-telemetry/opentelemetry-dotnet-contrib#788), but while impressive that they were able to do this (and laudable from the as-few-dependencies-as-possible-please perspective), I would prefer that we just accept a direct dependency on OpenTelemetry.Api
.
My reasoning for this is that, given the dotnet/runtime
team's efforts to make System.Diagnostics.DiagnosticSource
first-class compatible with the OTel spec (+/- some legacy method names), the broader .NET ecosystem is seeing widespread adoption and integration with OTel, and IMO, there will be very few consumers of the OpenFeature SDK that will not already have taken their own dependency on OpenTelemtry.Api
elsewhere in their package graph.
I'm open to hearing quibbles with this^ opinion, but as stated elsewhere, I have confidence in the open-telemetry/opentelemetry-dotnet
team to maintain OpenTelemetry.Api
in line with its stated goals of being a minimal abstractions library with a sole dependency on System.Diagnostics.DiagnosticSource
.
The downside to going the MassTransit route is two-fold:
As a library consumers, I will always have to provide my own registration extension which will both hurt discoverability and diminish the first-time, works-out-of-the-box experience:
namespace OpenTelemetry.Trace;
/// <summary>
/// Provides extension methods to enable OpenFeature tracing instrumentation on a <see cref="TracerProviderBuilder"/>.
/// </summary>
public static class OpenFeatureTracerProviderBuilderExtensions
{
public static TracerProviderBuilder AddOpenFeatureInstrumentation(this TracerProviderBuilder builder)
=> builder.AddSource("OpenFeature");
}
namespace OpenTelemetry.Metrics;
/// <summary>
/// Provides extension methods to enable OpenFeature metrics instrumentation on a <see cref="MeterProviderBuilder"/>.
/// </summary>
public static class OpenFeatureMeterProviderBuilderExtensions
{
public static MeterProviderBuilder AddOpenFeatureInstrumentation(this MeterProviderBuilder builder)
=> builder.AddMeter("OpenFeature");
}
I'm not saying that either of these^ are outright blockers. The MassTransit folks have shown that its more than possible to do this in a way that produces excellent telemetry. But as a consumer of MassTransit, I often wish they would've just taken the dependency on OpenTelemetry.Api
so I didn't have to keep copy/pasta'ing these^ snippets into each new project (thus requiring me to take on the OpenTelemetry.Api
dependency anyways).
So, I'm open to hearing other opinions, but I'm just highly skeptical that avoiding OpenTelemetry.Api
(especially as a fellow CNCF project...) provides much real-world benefit beyond the theater of "reducing dependencies" which the consumer will end up with anyways.
(Note: none of what I've written here is in anyway advocating for a batteries-included, opinionated approach to adding dependencies. I just view OpenTelemetry.Api
as having a unique role in the future of the .NET ecosystem.)
tldr; my vote is to instrument the SDK directly without hooks (leaving hooks to contribs and users)
I'm personally finding it challenging to answer questions about perf, overhead, etc., related to the impact of using the OpenFeature SDK on a range of luke-warn to scorchingly-hot paths.
Could you elaborate on this a bit more? What would you like to capture that can't be collected by a hook? It's also worth noting that a provider can include hooks, so that could be a nice way to ensure consistency across implementations.
Just to be clear, I'm not opposed to including native OTel support. I actually think it may be a good idea since the runtime has such good support for it. I just want to make sure we consider the impact and alternatives before committing to the change.
@beeme1mr wrote https://github.com/open-feature/dotnet-sdk/issues/175#issuecomment-1904124472:
@austindrenski wrote https://github.com/open-feature/dotnet-sdk/issues/175#issuecomment-1900858323:
I'm personally finding it challenging to answer questions about perf, overhead, etc., related to the impact of using the OpenFeature SDK on a range of luke-warn to scorchingly-hot paths.
Could you elaborate on this a bit more? What would you like to capture that can't be collected by a hook? It's also worth noting that a provider can include hooks, so that could be a nice way to ensure consistency across implementations.
Just to be clear, I'm not opposed to including native OTel support. I actually think it may be a good idea since the runtime has such good support for it. I just want to make sure we consider the impact and alternatives before committing to the change.
[!NOTE]
Disclosure up-front:
There are a couple of camps out there w.r.t. how much trace telemetry is too much trace telemetry, but I tend to land toward the right-hand tail of it. My rough rule-of-thumb is
if you're tracing _every_ method, you've gone too far
, but anything short of that and I'm pretty open to hearing a good argument for it.If you take this too far (e.g. as I've done on occasion), you end up with a real-world app where a root trace might contain dozens of thousands of child spans.
Personally, I love having that much data (especially when the app is a background service, not serving direct user traffic, and I can live with a little tracing overhead in exchange for deep, exhaustive performance telemetry). But this has many, many downsides, from the sheer cost of collecting that much data, to the potential runtime overhead, to the fact that major telemetry SaaS providers (e.g. Datadog) have a tendency to crash when loading a trace with more than
N
spans.
Obviously, this^ would be going wayyyy overboard for a library like the OpenFeature SDK, and that's not what I'm suggesting. (Among other things, I'm only happy with that much telemetry when I'm in control of it, i.e., if a third-party library produced that much telemetry on its own, I'd be peeved.)
At the same time, in order to get to a place where I, as an app owner, can choose to collect an excessive amount of telemetry, I need the libraries that I consume to produce at least some traces of their own.
So my ideal outcome from this issue would be for the OpenFeature SDK to produce at least the following traces:
OpenFeatureClient.Get{*}Details()
OpenFeatureClient.Trigger*Hooks()
callsthe effect being something like this (where the number of traces we produce is 3 + the number of before hooks + the number of after hooks:
get_boolean_details |-------------------------------------|
trigger_before_hooks |--------|
trigger_before_hook |--|
trigger_before_hook |--|
trigger_before_hook |--|
evaluate |----------------|
trigger_after_hooks |--------|
trigger_after_hook |--|
trigger_after_hook |--|
trigger_after_hook |--|
and potentially a few more spans if we make them sufficiently opt-in (e.g. feature flags for feature flags telemetry, lol):
get_boolean_details |-------------------------------------|
+calculate_context |-|
+calculate_hooks |-|
trigger_before_hooks |--------|
trigger_before_hook |--|
trigger_before_hook |--|
trigger_before_hook |--|
evaluate |----------------|
trigger_after_hooks |--------|
trigger_after_hook |--|
trigger_after_hook |--|
trigger_after_hook |--|
What I'm looking to get out of this is to set consistent telemetry expectations for consumers of the OpenFeature SDK, irrespective of what an individual contrib package chooses to do on their own.
For example, I want to be able to add a new contrib hooks package, and if it causes perf regressions, I want the base OpenFeature SDK to be sufficiently instrumented that I can determine that even if the contrib package provides none of their own telemetry.
This example^ is where my idea about the trigger_{before,after}_hooks
and trigger_{before,after}_hook
spans come in:
If we handle starting those spans here in the SDK, then a hook provider (whether from the contrib repo, or a one-off implemented in the consumer app) can add tags, record exceptions, and set the status on the ambient activity (i.e. Activity.Current?.Set*()
) without the consumer app having to opt-into additional telemetry from each contrib package (i.e. .AddSource("OpenFeature")
gets you everything you want from an OpenFeatureClient
without knowing to also add .AddSource("OpenFeature.Contrib.Hooks.OTelHooks")
).
I'm not entirely sure this^ will read as convincing/easy-consensus as I think it will look once we put together some draft code.
Could you elaborate on this a bit more? What would you like to capture that can't be collected by a hook? It's also worth noting that a provider can include hooks, so that could be a nice way to ensure consistency across implementations.
I (hopefully) answered this above, but just to bottom-line it:
tldr; I'm proposing that the OpenFeature SDK should itself wrap hooks in telemetry spans so that consumers can observe the impact of:
I'm personally finding it challenging to answer questions about perf, overhead, etc., related to the impact of using the OpenFeature SDK on a range of luke-warn to scorchingly-hot paths.
Could you elaborate on this a bit more? What would you like to capture that can't be collected by a hook? It's also worth noting that a provider can include hooks, so that could be a nice way to ensure consistency across implementations.
Just to be clear, I'm not opposed to including native OTel support. I actually think it may be a good idea since the runtime has such good support for it. I just want to make sure we consider the impact and alternatives before committing to the change.
I have been exploring better the current otel-contrib implementation. There are a few points which I would like to discuss.
RecordException
that can be easily "moved" if we want to remove any external dependency. Even though I trust the Otel team. đŸ˜„ P.S.: I think the point 3 is the same as @austindrenski pointed out but he types way faster than me đŸ¤£
Hey @askpt @austindrenski, thanks for your input. I can see how that could be valuable in some situations, but it would likely be overkill most of the time. If we went that route, I would recommend not capturing at that granularity by default.
To provide a bit more context behind the current implementation, please check out the feedback we received when initially adding feature flag semantics to the OTel spec. The TL;DR is the OTel team felt that a feature flag evaluation should have a negligible impact on performance in most situations. In OpenFeature, some providers may perform a remote evaluation, but those calls can typically be captured using out-of-the-box instrumentation (e.g. HTTP, gRPC).
https://github.com/open-telemetry/opentelemetry-specification/pull/2529
Hey @askpt @austindrenski, thanks for your input. I can see how that could be valuable in some situations, but it would likely be overkill most of the time. If we went that route, I would recommend not capturing at that granularity by default.
Absolutely. At a minimum, any telemetry we build into the library at this layer will be gated behind the usual OTel opt-in, i.e., consumer apps will need to explicitly call .AddSource("OpenFeature")
(+/- a helper method) when configuring their app.
So by default, adding a package reference for OpenFeature
!= opting into any telemetry.
Taking it a step further, I described two groups (i.e. at least this
and also maybe these
) in my earlier post, but I'd support going even more granular, especially in a post-#181 world, e.g.:
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddOpenTelemetry()
.WithMetrics(static builder => builder
// exporters
.AddOtlpExporter()
// instrumentation
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddOpenFeature())
.WithTracing(static builder => builder
// exporters
.AddOtlpExporter()
// instrumentation
.AddAspNetCoreInstrumentation(static options => options.RecordException = true)
.AddHttpClientInstrumentation(static options => options.RecordException = true)
.AddOpenFeature(static options =>
{
options.IncludeContexts = true;
options.IncludeFlags = true;
options.IncludeHooks = true;
}));
I want granular telemetry, but I'm incredibly sensitive to my position being an outlier/very much related to the specific workloads I'm building/maintaining.
So very committed to finding the right balance of out-of-the-box telemetry support, without burdening consumers with unwanted noise.
To provide a bit more context behind the current implementation, please check out the feedback we received when initially adding feature flag semantics to the OTel spec. The TL;DR is the OTel team felt that a feature flag evaluation should have a negligible impact on performance in most situations. In OpenFeature, some providers may perform a remote evaluation, but those calls can typically be captured using out-of-the-box instrumentation (e.g. HTTP, gRPC).
That's very interesting to hear, will review the earlier discussion, thanks for the xref!
I think this is worth exploring a bit more. I really like the idea. The only downside I see is that a provider would no longer be able to ship with OpenTelemetry support pre-configured, unless I'm missing something. Perhaps there's an elegant way to support that use case too.
Requirements
Following open-feature/dotnet-sdk-contrib#132, the
OpenFeature.Contrib.Hooks.Otel
package now only depends onOpenFeature
andOpenTelemetry.Api
, the latter of which in turn has just a single dependency onSystem.Diagnostics.DiagnosticSource
.I'm proposing that we obsolete the
OpenFeature.Contrib.Hooks.Otel
package and incorporate its contained code into the mainOpenFeature
project to provide first-class telemetry support in line with other popular projects in the .NET ecosystem.Is this a good idea?
From
open-telemetry/opentelemetry-dotnet
(emphasis added):From
open-telemetry/opentelemetry-dotnet
(emphasis added):What if we think
OpenTelemetry.Api
is just too scary?I personally have a lot of faith in the
open-telemetry/opentelemetry-dotnet
team, and thus place a lot of trust in their promise to maintainOpenTelemetry.Api
as a lightweight abstractions package, but for good measure, I do want to call out that we could accomplish this proposed in-the-box'ing without taking a dependency onOpenTelemetry.Api
.To do this we would instead take a direct dependency on
System.Diagnostics.DiagnosticSource
, and then reimplement-the-wheel forActivityExtensions.RecordException(...)
,ActivityExtensions.SetStatus(...)
, etc.Again, I'm not endorsing this, but it's an option (e.g. among many other reasons, it would be exceptionally peevish if our impl drifts from the upstream impl and exceptions recorded by another library have different formats in Datadog/Dynatrace/etc).
See: open-feature/dotnet-sdk-contrib#132