Closed tigrannajaryan closed 3 months ago
@open-telemetry/profiling-maintainers has there been any progress on this topic?
I had this action item from last meeting:
(Alexey) Maintainers - work on which changes need to be reverted and which ones do not need to be reverted
I haven't looked into it yet, I was hoping to get more comments on the compatibility doc I prepared earlier in the meantime.
I'll probably spend some time preparing a prototype PR which undoes some of the incompatibility changes in pprofextended and see how other maintainers react to it.
I will not be able to attend today's profiling SIG meeting unfortunately - the time doesn't work out for me.
Logistics-wise FYI, I do not seem to be able to assign this issue to myself, no permission apparently.
Thanks @aalexand, I assigned it to you.
All, this issue is a blocker for new PRs to the profiling proto format. We need a decision and a compatibility definition in place before we can continue making changes to the proto.
CC @open-telemetry/profiling-maintainers
I have prepared a list of items where I see the major differences between OTel Profiling and pprof (protocol & tool):
Profile.mapping: "mapping[0] will be the main binary" In whole system profiling with stacks that have multiple sources, there can not be a single main binary. https://github.com/open-telemetry/opentelemetry-proto/pull/556
Profile.location: Semantic difference between "referenced by samples via location_indices" (OTel) vs "referenced by samples" (pprof)
Profile.string_table: Special meaning of strings like "pprof::". Could be solved in semantic convention
Sample.location_index: Deprecated in OTel conflicts with pprof; OTel also uses a slightly different comment on this element
Sample.location_index: OTel name conflicts with pprof name Sample.location_id
Sample.label: Deprecated in OTel conflicts with pprof
Sample.label: OTel introduced Sample.attributes. Semantic convention should define where and when to use one over the other or both.
Sample.link: Can this new OTel element be dropped and replaced when using Sample.label with a semantic convention for span_id and trace_id?
message Label: Deprecated in OTel conflicts with pprof
Mapping.id: Deprecated in OTel conflicts with pprof
Mapping.id: pprof expects an id other than 0. Could be solved in semantic convention.
Location.id: Deprecated in OTel conflicts with pprof
Location.id: pprof expects an id other than 0. Could be solved in semantic convention.
Location.mapping_index: OTel name conflicts with pprof name Location.mapping_id
Location.type_index: Introduced by OTel and less defined. Could be solved in semantic convention.
Location.type_index: Should use int64 over uint32 to be pprof compliant
Line.function_index: OTel name conflicts with pprof name Line.function_id
Function.id: Deprecated in OTel conflicts with pprof
Function.id: pprof expects an id other than 0. Cloud be solved in semantic convention.
To resolve the list of these conflicts, a path forward could look like this:
@aalexand I'm happy to help resolving these conflicts.
To update on the status of this:
We discussed this topic with @aalexand and @open-telemetry/profiling-maintainers and decided that we will not aim for strict compatibility with pprof. Instead we will aim for "convertibility", similarly to what we already do for other signals. To quote from log data model spec:
Existing log formats can be unambiguously mapped to this data model. Reverse mapping from this data model is also possible to the extent that the target log format has equivalent capabilities.
We will adopt the same approach for profiling data model / format.
I am closing this issue since there is nothing else needed at the moment. The profiling format will follow the usual maturity/stability requirements.
We had a discussion in the Profiling SIG and believe that it is necessary to document what changes are allowed and what changes are prohibited for profiles.
@open-telemetry/profiling-maintainers please assign this to a person who will own this work. I am going to close my draft https://github.com/open-telemetry/opentelemetry-proto/pull/559 for now and will be waiting for new PR that resolves this issue.