open-telemetry / opentelemetry-proto

OpenTelemetry protocol (OTLP) specification and Protobuf definitions
https://opentelemetry.io/docs/specs/otlp/
Apache License 2.0
545 stars 242 forks source link

Document compatibility requirements for profiles #559

Closed tigrannajaryan closed 3 weeks ago

tigrannajaryan commented 1 month ago

Profiles maintainers (@open-telemetry/profiling-maintainers), please take a look at the pprof compatibility criteria added to CONTRIBUTING.md.

tigrannajaryan commented 1 month ago

CC @aalexand

tigrannajaryan commented 1 month ago

I marked this draft for now. We should discuss the options.

tigrannajaryan commented 1 month ago

@open-telemetry/profiling-maintainers I think we need to place a short moratorium on merging changes to profiles until we have a resolution on this PR otherwise we risk making changes that we may need to undo.

If you have any thoughts on alternate definitions of "compatibility" that you would prefer instead please comment and suggest. I do not want to block other PRs for too long, so ideally we should use the time we have until the next SIG meeting for offline discussion.

cc @jsuereth

tigrannajaryan commented 1 month ago

The PR does not explain why I am suggesting to have the compatibility criteria and I think it is important to capture it. This has been discussed verbally but I don't think it is written down anywhere, so here is a summary of what I am aware of.

The Profiling SIG made a decision to derive Otel profiling format from pprof. As far as I remember the primary reason for this decision was to avoid creating yet another new standard and instead to reuse an existing one.

We discussed with the pprof team the option to extend pprof format in a way that is compatible with existing pprof format and can be adopted by pprof tooling in the future. This will ensure that we do not create one more format but instead evolve an existing one.

For this to be achievable obviously the changes we make need to be "compatible", and what exactly that compatibility means needs to be defined precisely so that we can follow that definition when introducing changes.

If we don't define these compatibility requirements and follow them then what we will likely end up with is an Otel profile format that was derived from pprof, but because of changes became incompatible with it and essentially is a fork of pprof, is yet another standard, so we will fail the original goal.

If we didn't have this goal we could have instead decided that we want to start from scratch and create a complete new Otel profiling format. This would be a very different design approach. It would be less restricting and potentially would allow Otel to design a more efficient format due to lifting off this restriction. However, we chose not to do this since we felt that the benefit of having one less standard outweighs the potential benefits that are possible from a greenfield design.

I strongly believe we must continue looking for ways to remain pprof-compatible.

That said, I recognize that there is more than one way to define the compatibility requirements. The version I present in this PR is one of the more restrictive versions. I can envision alternate approaches that are less restrictive. We would like to discuss these different approaches in the next Profiling SIG meeting.

tigrannajaryan commented 3 weeks ago

Closing for now, issue is recorded to figure this out: https://github.com/open-telemetry/opentelemetry-proto/issues/567