Introduces Profiling Data Model v2

petethepig commented 1 year ago

This is second version of the Profiling Data Model OTEP. After we've gotten feedback from the greater OTel community we went back to the drawing board and came up with a new version of the data model. The main difference between the two versions is that the new version is more similar to the original pprof format, which makes it easier to understand and implement. It also has better performance characteristics. We've also incorporated a lot of the feedback we've gotten on the first PR into this OTEP.

Some minor details about the data model are still being discussed and will be flushed out in the future OTEPs. We intend to finalize these details after doing experiments with early versions of working client + collector + backend implementations and getting feedback from the community. The goal of this OTEP is to provide a solid foundation for these experiments.

So far we've done a number of things to validate it:

we've written a new profiles proto described in this OTEP
we've documented decisions made along the way in a decision log
we've done benchmarking to refine the data representation (see Benchmarking section in a collector PR)
diff between original pprof and the new proto: link

We're seeking feedback and hoping to get this approved.

For (a lot) more details, see:

OTel Profiling SIG Meeting Notes

mtwo commented 12 months ago

@carlosalberto @jack-berg can we mark this as triaged with priority p0?

tigrannajaryan commented 12 months ago

The main difference between the two versions is that the new version is more similar to the original pprof format

@petethepig More similar, but not fully backwards compatible with pprof?

(I haven't had a chance to review the OTEP yet, just wanted to understand what to expect before I started revewing).

[UPDATE]: I have now reviewed the PR, which mostly answers this question. I still have a follow up comment/question on pprof compatiblity, see below.

aalexand commented 11 months ago

How much difference does AnyValue/KeyValue interning make? I would prefer not to have a different type just for profiles. If it is a significant difference we may want to consider adding interning support to existing AnyValue/KeyValue.

One thing is interning, another thing that pprof labels have is unit support. E.g. int64 bytes can be distinguished from int64 microseconds or count of something. This is used by pprof for using appropriate unit suffixes and scaling the label values appropriately.

petethepig commented 11 months ago

@tigrannajaryan

How much difference does AnyValue/KeyValue interning make? I would prefer not to have a different type just for profiles. If it is a significant difference we may want to consider adding interning support to existing AnyValue/KeyValue.

We did some benchmarking for that. Here are the results (it's also in this spreadsheet, "Attribute Represenations Summary" sheet): Screenshot 2023-11-29 at 6 16 09 PM

And here's the difference between the 3 implementations:


// Extended
message Sample {
  repeated opentelemetry.proto.common.v1.KeyValue attributes = 9;
}

// ExtendedInterned
message Sample {
  repeated opentelemetry.proto.common.v1.KeyValueInterned attributes = 6;
}

// ExtendedLookup
message Profile {
  // lookup table
  repeated opentelemetry.proto.common.v1.KeyValue attributes = 9;
}

message Sample {
  repeated uint64 attributes = 10;
}

To be clear, in the OTEP we're using ExtendedLookup represenation. It does not do the same kind of interning as it's done in original pprof (that would be ExtendedInterned).

ExtendedInterned uses a string_table for string values and ExtendedLookup uses an attributes lookup table. Extended uses a schema that's more common for other OTEL signals where the attributes are embedded inside other messages (Sample in our case).

aalexand commented 10 months ago

How much difference does AnyValue/KeyValue interning make? I would prefer not to have a different type just for profiles. If it is a significant difference we may want to consider adding interning support to existing AnyValue/KeyValue.

One thing is interning, another thing that pprof labels have is unit support. E.g. int64 bytes can be distinguished from int64 microseconds or count of something. This is used by pprof for using appropriate unit suffixes and scaling the label values appropriately.

We still need units for labels I think.

tigrannajaryan commented 10 months ago

If label/attribute units are critical I suggest that we start this discussion early.

I looked at other signal types and there is a few attribute conventions that would benefit from units (but not many):

Span attributes (all bytes): message.compressed_size message.uncompressed_size messaging.message.body.size messaging.message.envelope.size faas.max_memory

No many metric attributes that can have units, I was only able to find one (WAh): hw.battery.capacity

Similarly just one resource attribute with unit (bytes): host.cpu.cache.l2.size

There may be others that I missed.

You can file an issue in https://github.com/open-telemetry/opentelemetry-proto to begin the discussion.

petethepig commented 10 months ago

I made a few changes to this OTEP after our discussion today:

added a new attribute_units field to add support for units (Alexey & Tigran, curious what you think about this solution, it ended up being less ugly than I thought)
introduced new BuildIdKind enum to handle different types of build_ids (need feedback from Elastic folks) ( brought back Link structure for trace_id / span_id linking (cc Alexey & Tigran)
added a section describing “Open Questions” — things we intend to address in the experimentation phase

cc @aalexand @tigrannajaryan @thomasdullien

aalexand commented 10 months ago

I made a few changes to this OTEP after our discussion today:

added a new attribute_units field to add support for units (Alexey & Tigran, curious what you think about this solution, it ended up being less ugly than I thought)

introduced new BuildIdKind enum to handle different types of build_ids (need feedback from Elastic folks) ( brought back Link structure for trace_id / span_id linking (cc Alexey & Tigran)

added a section describing “Open Questions” — things we intend to address in the experimentation phase

cc @aalexand @tigrannajaryan @thomasdullien

Some assorted comments :-) :

RE Link message - is this name standard in OTel and we borrow / reuse it? I found it overly generic. I'd name this SpanLink or something like that.

Can we have single link per sample instead of a repeated field? I think dealing with the semantics of linking to multiple spans is going to be trickier (tooling-wise, for example) so I want to make sure it's not a YAGNI. Documenting the motivation would be great.

For attribute units, I was envisioning something more like a single global table that would map attribute name to a unit and thus require that for all samples the unit is the same. With the current approach conflicts are possible. Plus I think the proposed approach can be expensive - we add a slice for each sample which means one more memory allocation. As a parallel, we've been discussing why we reference locations via subslicing rather than via a separate CallStack / Callchain entity and the answer there was to reduce memory allocations, but here we add a slice per sample which is even worse I think than the CallStack table since there is generally should be less call stacks than samples but here it's clearly 1:1 with samples.

RE Tigran - pprof definitely needs units for labels, the CLI uses them today. One thing is that if we introduce them as a field to OTel label without figuring out the interning then the space usage for the case of high-cardinality label would get even worse since we won't just duplicate the label name but also the unit string...

Maybe it's worth leaving out the units change completely for now and discuss it in follow-up PRs or a dedicated issue. Maybe same about Link? Independent focused discussions are kind of nice.

I would suggest organizing fields logically, not by proto ID simply. E.g. I'd put build_id_kind after build_id field.

For build_id, a couple of comment change suggestions:

// Indicates the semantics of the build_id field.
enum BuildIdKind {
  // Linker-generated build ID, stored in the ELF binary notes.
  BUILD_ID_LINKER = 0;
  // Build ID based on the content hash of the binary. Currently no particular
  // hashing approach is standardized, so a given producer needs to define it
  // themselves and thus unlike BUILD_ID_LINKER this kind of hash is producer-specific.
  // We may choose to provide a standardized stable hash recommendation later.
  BUILD_ID_BINARY_HASH = 1;
}

"See BuildIdKind enum for more detaiks" has a typo.

Nit nit: We should make headings consistent in the OTEP text on whether we follow "Sentence case" or "Title Case" in them. There is a mix right now. I'd take a look at what OTel does as a whole.

Nit nit: Does OTel have any guidelines of comment width? I know Go recommends keeping comments in 80 columns even though it recommends not wrapping code. I'm ambivalent on this, just want to make sure we follow local conventions.

petethepig commented 10 months ago

@aalexand Addressed many of your comments, more details below:

RE Link message - is this name standard in OTel and we borrow / reuse it? I found it overly generic. I'd name this SpanLink or something like that.

Link comes from traces spec: https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto#L238-L242

I agree SpanLink sounds a bit better, but feel strongly about this. @tigrannajaryan what do you think?

Can we have single link per sample instead of a repeated field? I think dealing with the semantics of linking to multiple spans is going to be trickier (tooling-wise, for example) so I want to make sure it's not a YAGNI. Documenting the motivation would be great.

Agreed, addressed in the latest version of the PR.

For attribute units, I was envisioning something more like a single global table that would map attribute name to a unit and thus require that for all samples the unit is the same.

Agreed, I like your way more. I addressed in the latest version of the PR, please take a look.

I would suggest organizing fields logically, not by proto ID simply. E.g. I'd put build_id_kind after build_id field.

Agreed, changed the order in other messages too.

For build_id, a couple of comment change suggestions:

Thanks for writing these up, addressed in the latest version of the PR.

"See BuildIdKind enum for more detaiks" has a typo.

fixed

Nit nit: We should make headings consistent in the OTEP text on whether we follow "Sentence case" or "Title Case" in them. There is a mix right now. I'd take a look at what OTel does as a whole.

fixed

Nit nit: Does OTel have any guidelines of comment width? I know Go recommends keeping comments in 80 columns even though it recommends not wrapping code. I'm ambivalent on this, just want to make sure we follow local conventions.

I could not find anything specific. cc @tigrannajaryan

aalexand commented 10 months ago

I'd probably prefer a single enum...

On Mon, Jan 22, 2024, 18:48 Dmitry Filimonov @.***> wrote:

@.**** commented on this pull request.

In text/profiles/0239-profiles-data-model.md https://github.com/open-telemetry/oteps/pull/239#discussion_r1462664968:

+} + +// A pointer from a profile Sample to a trace Span. +// Connects a profile sample to a trace span, identified by unique trace and span IDs. +message Link {

// A unique identifier of a trace that this linked span is part of. The ID is a

// 16-byte array.

bytes trace_id = 1;

// A unique identifier for the linked span. The ID is an 8-byte array.

bytes span_id = 2; +}

+// Specifies the method of aggregating metric values, either DELTA (change since last report) +// or CUMULATIVE (total since a fixed start time). +enum AggregationTemporality {

That's a good point. For reference, The way it's handled in metrics is by using two separate types of messages: gauges and sums https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/metrics/v1/metrics.proto#L193-L217, where gauges don't have aggregation temporality, but sums do, so you end up with 3 choices:

gauge

sum delta

sum cumulative

I imagine this is not going to work in our case — let me think about this some more.

— Reply to this email directly, view it on GitHub https://github.com/open-telemetry/oteps/pull/239#discussion_r1462664968, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALS3QD2DGMFTREJ4QZRU4TYP4QI7AVCNFSM6AAAAAA62KJTCKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTQMZXG43TCMRQGM . You are receiving this because you were mentioned.Message ID: @.***>

tigrannajaryan commented 10 months ago

@open-telemetry/specs-approvers please review.

brancz commented 9 months ago

So ultimately we don’t care we’ll always implement support, and implement a much deeper integrated custom format between our server and agent. However, I frankly don’t think this format is thought through enough nor has the space had enough time to innovate so I think it will stifle innovation in the space as enterprises are going to want exactly this format, it’s unsaid but to me it seems this is the primary motivation for this: to have a standard. And in my opinion this is not good for the greater community and ecosystem long term. Besides that I think we’re ignoring concerns from long term industry experts (eg. Russ Cox). It feels like we’re just trying to push a standard through to push a standard through.

I realize this sounds harsh, but I don’t think it’s the right thing to do.

tigrannajaryan commented 9 months ago

I think we’re ignoring concerns from long term industry experts

@brancz what concerns are you talking about? Please elaborate or provide some links.

mtwo commented 9 months ago

Following up from today's meeting:

@petethepig mentioned that he's going to take one last look at the documentation / rules for attribute encoding.
We discussed two (minor) areas that are still under consideration (timestamps and aggregation temporality), and whether these should be included or not. The decision was to mark them as experimental so that we leave the door open to changes, them becoming 1.0, or their removal in the future. These are not a 1.0 requirement and implementations will not depend upon these.

We'll still need at least two more TC members to review and sign off before merging this. I'll let them know that this is ready for broad review.

tigrannajaryan commented 9 months ago

@open-telemetry/specs-approvers we need more reviews from official approvers on this OTEP.

I see significant consensus and approval from the Profiling SIG members but we can't merge the OTEP until we have 4 green marks.

open-telemetry / oteps

Introduces Profiling Data Model v2 #239

@.**** commented on this pull request.