open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

Capture policy regarding non-exposure of generated Protobuf structs in the end user API #3420

Closed tigrannajaryan closed 2 months ago

tigrannajaryan commented 1 year ago

This is related to https://github.com/open-telemetry/opentelemetry-go/issues/3455 which requests Go SDK to stop exposing the generated code in the API.

We don't have a policy or clarification anywhere that explains why this is prohibited.

My initial thought is that allowing this has one benefit:

- It is convenient and avoids the creation of wrappers that often mirror the Protobuf structs one-to-one.

Downsides:

@bogdandrutu I would like to hear your thoughts on this topic since you have a strong opinion about this.

bogdandrutu commented 1 year ago

Another thing is that adding new methods on a service will generate backwards incompatible code for languages like golang where that means adding a new func to an interface.
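The interface-breakage point can be sketched in a few lines of Go. The types below are hypothetical stand-ins for protoc-generated service interfaces, not real OTLP code; they only illustrate why adding an RPC to a service is a compile-time break for existing Go implementations.

```go
package main

import "fmt"

// Hypothetical v1 of a generated service interface.
type TraceServiceV1 interface {
	Export(data string) error
}

// A user's implementation compiled against v1.
type myServer struct{}

func (myServer) Export(data string) error { return nil }

// Hypothetical v2: the service gained an RPC, so the generated interface
// grew a method. myServer no longer satisfies it, which breaks every
// existing implementation at compile time.
type TraceServiceV2 interface {
	Export(data string) error
	Flush() error
}

func main() {
	var _ TraceServiceV1 = myServer{} // compiles fine
	// var _ TraceServiceV2 = myServer{} // would fail: myServer lacks Flush
	fmt.Println("v1 still satisfied")
}
```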

tigrannajaryan commented 1 year ago

Another thing is that adding new methods on a service will generate backwards incompatible code for languages like golang where that means adding a new func to an interface.

I think this is an argument in favour of prohibiting addition of new methods to the services since it breaks existing implementations. I think this prohibition is necessary regardless of what we decide about this non-exposure policy. I think it can be discussed separately.

tigrannajaryan commented 1 year ago

@open-telemetry/specs-approvers please comment.

yurishkuro commented 1 year ago

It is convenient and avoids the creation of wrappers that often mirror the Protobuf structs one-to-one.

Not sure I buy this argument. Do we really have such complex data structures exposed via APIs where this duplication matters?

Another strike against proto is it bloats the library right at the API level. What if your company does not use protobuf at all? Why should that be a dependency just to use the API?

tigrannajaryan commented 1 year ago

Not sure I buy this argument. Do we really have such complex data structures exposed via APIs where this duplication matters?

Yes, we do. You can potentially expose the entire Protobuf definition. This is exactly what the Collector codebase does. It wraps all of the OTLP Protobuf definitions in a set of new data structures called pdata. pdata is semantically equivalent to OTLP, but exposes it in a way that is more ergonomic to use than the bare Protobuf-generated structs. This is complex enough that we don't do it manually, since that is error prone. We created a special generator inside the Collector codebase that generates the wrappers.

My argument is that doing this (wrapping) should be a choice and not a forced policy, since it is a non-trivial amount of work.
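A minimal sketch of the wrapping pattern described above, in the spirit of the Collector's pdata. The types here are invented for illustration (a real generated struct would come from a protobuf package); the point is that the public API exposes only the hand-written wrapper, so changes in the generated code stay internal.

```go
package main

import "fmt"

// Hypothetical stand-in for a protoc-generated struct (the real one would
// live in a generated package such as the OTLP proto module).
type resourceProto struct {
	Attributes map[string]string
}

// Resource is a hand-written wrapper: it hides the generated struct so the
// public API does not change when the generated code does.
type Resource struct {
	orig *resourceProto
}

// NewResource constructs an empty wrapped resource.
func NewResource() Resource {
	return Resource{orig: &resourceProto{Attributes: map[string]string{}}}
}

// Attribute returns the value for key, reporting whether it was present.
func (r Resource) Attribute(key string) (string, bool) {
	v, ok := r.orig.Attributes[key]
	return v, ok
}

// PutAttribute sets an attribute, keeping all mutation behind the wrapper.
func (r Resource) PutAttribute(key, value string) {
	r.orig.Attributes[key] = value
}

func main() {
	r := NewResource()
	r.PutAttribute("service.name", "checkout")
	v, _ := r.Attribute("service.name")
	fmt.Println(v)
}
```

As the comment notes, maintaining such wrappers by hand across a full protocol is enough work that the Collector generates them.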

Another strike against proto is it bloats the library right at the API level. What if your company does not use protobuf at all? Why should that be a dependency just to use the API?

This is a good argument.

tigrannajaryan commented 1 year ago

Another strike against proto is it bloats the library right at the API level. What if your company does not use protobuf at all? Why should that be a dependency just to use the API?

I added this to the Downsides list in the description.

Aneurysm9 commented 1 year ago

Another strike against proto is it bloats the library right at the API level. What if your company does not use protobuf at all? Why should that be a dependency just to use the API?

I'm not sure this is accurate. I'm not aware of any place in the API that would take a dependency on generated protobuf types. This would relate to the SDK and likely only the OTLP exporter.

yurishkuro commented 1 year ago

@Aneurysm9 I'd argue even at the general SDK level this is undesirable. If we have a good SDK design, it should not force people to re-implement the API from scratch just because they want a custom exporter. But I agree that at some point this may no longer apply, and the specific complaint was about the OTLP exporter, which is always coupled with protobuf.

pellared commented 1 year ago

Another downside of exposing proto-generated types without wrapping them is that it couples the API to:

  1. Protobuf library

For instance, in Go there are two popular libraries (packages): github.com/golang/protobuf/proto, which is deprecated, and google.golang.org/protobuf... but who knows whether there won't be a v2 in the future? A breaking change in the library may also be a breaking change in the OTel library (an indirect/transitive API breaking change).

  2. Protocol Buffer Compiler

While most people use the most popular protoc plugins for generating code, nothing prevents them from creating their own plugins to generate code. E.g. https://rotemtam.com/2021/03/22/creating-a-protoc-plugin-to-gen-go-code/ Also, I am not sure the plugins have strict stability guarantees. For instance, check https://github.com/golang/protobuf#compatibility:

This module and the generated code are expected to be stable over time. However, we reserve the right to make breaking changes [...] Any breaking changes outside of these will be announced 6 months in advance to protobuf@googlegroups.com

EDIT: Also I now noticed https://github.com/open-telemetry/opentelemetry-proto/blob/main/README.md#stability-definition

As a rule of thumb, I think that libraries should avoid accepting and returning types that are not "under control" (of course there are exceptions, like standard library types, and... other exceptions 😄). I always felt that this is a "general guideline".
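The guideline can be illustrated with a small Go sketch. All types here are hypothetical (a real generated span would come from a protobuf package): the library exposes only a type it owns and converts to the generated type at the serialization boundary, so a protobuf library upgrade cannot break the public API.

```go
package main

import "fmt"

// generatedSpan stands in for a protoc-generated type that is NOT under
// the library's control (hypothetical; normally imported from a generated
// protobuf package).
type generatedSpan struct {
	TraceId []byte
}

// Span is the library-owned type exposed in the public API instead of the
// generated one.
type Span struct {
	TraceID [16]byte
}

// toProto converts the owned type to the generated one at the
// serialization boundary, keeping the protobuf dependency internal.
func toProto(s Span) *generatedSpan {
	return &generatedSpan{TraceId: s.TraceID[:]}
}

func main() {
	var s Span
	s.TraceID[0] = 0xab
	fmt.Printf("%x\n", toProto(s).TraceId[0])
}
```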

jsuereth commented 1 year ago

I know you've already heard this:

I think the primary issue here is that protocol buffer code generation DOES NOT guarantee stability of the generated code even for changes where the protocol buffer wire format would have been stable.

To @tigrannajaryan's point, pdata is an investment in a "stable" API that can serialize to proto format. Java, e.g., did the same thing, and has its own codegen that basically creates constants usable for writing serializers. (There are actually a couple of reasons to custom-serialize OTLP; particularly important is not having to shuffle your data in memory to create the Resource->Scope->Signal hierarchy.)
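The "no shuffling in memory" point can be sketched as follows. This is not the Java or Collector implementation, just an illustration with invented types: spans are kept flat as the SDK collected them, and the Resource->Span grouping is built only when serializing.

```go
package main

import (
	"fmt"
	"sort"
)

// span is a hypothetical flat record, as an SDK might keep it in memory.
type span struct {
	Resource string // key identifying the owning resource
	Name     string
}

// groupByResource builds the Resource->Span hierarchy only at
// serialization time, instead of forcing the SDK to store the data
// pre-nested the way the OTLP message layout would suggest.
func groupByResource(spans []span) map[string][]string {
	out := map[string][]string{}
	for _, s := range spans {
		out[s.Resource] = append(out[s.Resource], s.Name)
	}
	return out
}

func main() {
	g := groupByResource([]span{
		{"svcA", "a1"},
		{"svcB", "b1"},
		{"svcA", "a2"},
	})
	// Print in deterministic order.
	keys := make([]string, 0, len(g))
	for k := range g {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, g[k])
	}
}
```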

Personally, I'd be ok asking for all SiGs to avoid exposing proto generated classes for two reasons:

  1. We know of the compatibility issue.
  2. I believe the fundamental serialization optimisation is applicable across SIGs, not just a Java or Collector specific thing.

However, I agree with @tigrannajaryan that forcing this seems a bit too much.

MadVikingGod commented 1 year ago

This is unstated, so please correct me if I'm wrong, but this seems to be trying to prevent issues with the proto changing under an exporter. It does not look like any SIG has exposed the proto in the SDK but instead has some internal format that is used. The API that exposes this in go is contained within the OTLP exporter, which is going to have a dependency on the proto in some way, shape, or form, so we will not be able to prevent the inclusion of protobufs.

If that is the goal of this request, I don't think it would be enough to keep the protobuf out of the public API. One way to prevent a recurrence is to ensure the exporter cannot use a different version of the proto than the one it was written for. This would essentially require that the exporter not depend on an external generation of the proto (e.g. go.opentelemetry.io/proto in Go or opentelemetry-proto in Python); it must instead be generated internally and vendored. This is because no matter what we do as developers to try to prevent users from mixing versions, we can't stop them. So if the v1.4 exporter is incompatible with the v1.6+ proto, users will find a way to use them together.

tigrannajaryan commented 1 year ago

@MrAlias I think this policy should only apply to the API package, i.e. to the API that end user programs are calling. There is a good reason there to avoid exposing the generated Protobufs and bringing in the Protobuf library as a transitive dependency.

I think it is extremely important for the API package and the noop implementation to remain very lightweight and to bring very few dependencies. Doing otherwise can be very harmful for adoption of the "OpenTelemetry instrumentation as a first-class feature of all software" philosophy.

I think this same policy must apply to the interface between the API package and the SDK package, otherwise we lose the long-term pluggability of the SDK. A breaking change in the generated proto will result in a breaking change in this interface and will make it impossible to upgrade to newer versions of the SDK. I think this pluggability is also an important feature of OpenTelemetry that we must maintain.

Everything else? I don't think we need the same hard restriction. The SDK's interface that is supposed to be callable by end users for configurability, for example, doesn't need such a restriction. Even if it breaks (very rarely), it is OK to require the user to fix their code. I also do not think this restriction should apply to other helper libraries (e.g. OpAMP Go).

MrAlias commented 1 year ago

@MrAlias I think this policy should only apply to the API package, i.e. to the API that end user programs are calling. There is a good reason there to avoid exposing the generated Protobufs and bringing in the Protobuf library as a transitive dependency.

I think it is extremely important for the API package and the noop implementation to remain very lightweight and to bring very few dependencies. Doing otherwise can be very harmful for adoption of the "OpenTelemetry instrumentation as a first-class feature of all software" philosophy.

I think this same policy must apply to the interface between the API package and the SDK package, otherwise we lose the long-term pluggability of the SDK. A breaking change in the generated proto will result in a breaking change in this interface and will make it impossible to upgrade to newer versions of the SDK. I think this pluggability is also an important feature of OpenTelemetry that we must maintain.

Everything else? I don't think we need the same hard restriction. The SDK's interface that is supposed to be callable by end users for configurability, for example, doesn't need such a restriction. Even if it breaks (very rarely), it is OK to require the user to fix their code. I also do not think this restriction should apply to other helper libraries (e.g. OpAMP Go).

The only place the Go implementation exposes this dependency is in the OTLP trace exporter, in an exported (but only internally used) interface:

https://github.com/open-telemetry/opentelemetry-go/blob/289a612e6a40a7575cd13dbc02ae07f6269e5491/exporters/otlp/otlptrace/clients.go#L51

We have not exposed this for the OTLP metric exporter, and do not plan to. I don't know of any plans to change the trace package, as it is already released as stable. It also seems to be "forward compatible" based on the "everything else" clause you mention.

That said, we could deprecate the interface if we feel it necessary.

tigrannajaryan commented 1 year ago

FYI all and @open-telemetry/specs-approvers: the OpAMP Workgroup chose to guarantee generated-code compatibility for its 1.0, primarily because there is no separation of API and SDK there, and the workgroup believes hiding Protobuf-generated code in OpAMP implementations is not necessary (and would create a ton of unnecessary work to add wrappers).

This is just an FYI. I don't think the OpAMP decision should influence our decision for the OTel API libraries, where we have a different situation (most notably an API that needs to be a lightweight noop).

tigrannajaryan commented 1 year ago

I think this issue belongs to the spec repo, since it is a requirement for Otel API, not for OTLP or for protos. Moving.

dyladan commented 4 months ago

@tigrannajaryan this was fixed in Go and I don't think it's affecting any other languages. Do you think this is still needed?

tigrannajaryan commented 4 months ago

I think we can close it.