open-telemetry / opentelemetry-proto

OpenTelemetry protocol (OTLP) specification and Protobuf definitions
https://opentelemetry.io/docs/specs/otlp/
Apache License 2.0
545 stars 242 forks source link

profiles: Add normalized address field #553

Closed brancz closed 2 weeks ago

brancz commented 1 month ago

As discussed in the last otel profiling call. The comment should have all the context for why this is needed, if not then let me know and I'll add more detail.

@felixge @florianl @petethepig @tigrannajaryan @jack-berg @simonswine

(feel free to tag anyone else)

athre0z commented 2 weeks ago

I discussed this internally with the profiling folks at Elastic and we agree that a dedicated field isn't really necessary.

I suspect that what lead @brancz to suggest this field is that in the donated OTel profiling agent we incorrectly populated the Mapping record fields memory_start, memory_limit, and file_offset. We send out all sample addresses rebased into the corresponding ELF's VA space (just like @aalexand describes is the case for pprof), but left the previously mentioned fields either uninitialized (0) or at bogus values (in case of file_offset).

Instead of defining a complex normalization schema here, I started working on a PR that correctly initializes the fields, just like pprof tooling does. With these fields initialized properly, any consumer should have enough information to convert the sample addresses into whatever address space they wish using the schema described by @aalexand here.

aalexand commented 2 weeks ago

I discussed this internally with the profiling folks at Elastic and we agree that a dedicated field isn't really necessary.

I suspect that what lead @brancz to suggesting this field is that in the donated OTel profiling agent we incorrectly populated the Mapping record fields memory_start, memory_limit, and file_offset. We send out all sample addresses rebased into the corresponding ELF's VA space (just like @aalexand describes is the case for pprof), but left the previously mentioned fields either uninitialized (0) or at bogus values (file_offset).

Instead of defining a complex normalization schema here, I started working on a PR that correctly initializes the fields, just like pprof tooling does. With these fields initialized properly, any consumer should have enough information to convert the sample addresses into whatever address space they wish using the schema described by @aalexand here.

Great, good to hear! We can probably improve the docs to describe better recommendations to collector authors / owners. There are only two hard problems about building a profiler after all: stack unwinding and dealing with ELF offsets.

brancz commented 2 weeks ago

I'm perfectly happy with fully populating locations since that's what we eventually also opted to doing in Parca Agent (specifically to be fully pprof compatible, we originally also stabilized addresses). However, I think having the stabilized address optionally could be interesting, as the whole reason was to be able to combine/use it with stack IDs that would be stable across hosts even in the presence of ASLR so not the whole stack would need to be sent across the wire.

aalexand commented 2 weeks ago

I'm perfectly happy with fully populating locations since that's what we eventually also opted to doing in Parca Agent (specifically to be fully pprof compatible, we originally also stabilized addresses). However, I think having the stabilized address optionally could be interesting, as the whole reason was to be able to combine/use it with stack IDs that would be stable across hosts even in the presence of ASLR so not the whole stack would need to be sent across the wire.

Even if a collector decides to put in elf-header-normalized addresses into profiles, I don't think the proposed field is needed. The profile would just record adjusted sample and mapping addresses. Consumers shouldn't care.

brancz commented 2 weeks ago

Perfectly happy with that but how to consumers know not to double apply then?

aalexand commented 2 weeks ago

Perfectly happy with that but how to consumers know not to double apply then?

Attempting to apply it again will detect that the mapping load address already matches the load address in the ELF segment header, so the adjustment will be zero and there is nothing to apply.

athre0z commented 2 weeks ago

I'm perfectly happy with fully populating locations since that's what we eventually also opted to doing in Parca Agent (specifically to be fully pprof compatible, we originally also stabilized addresses). However, I think having the stabilized address optionally could be interesting, as the whole reason was to be able to combine/use it with stack IDs that would be stable across hosts even in the presence of ASLR so not the whole stack would need to be sent across the wire.

Hmm, I think there might be some confusion with what ELF address space means here. An address in ELF address space is stabilized and generalizes across hosts. It's the address that is used in the executable itself before relocations are applied, and the addresses that all common executable analysis tooling (binutils, llvm-dwarfdump, Ghidra, IDA, ...) will display. It's stable across hosts.

What you seem to be referring to here would be the mapped / process address which will vary depending on ASLR bias. I'm not proposing to send samples in this address space in the OTel profiler and I also wouldn't recommend doing that to any other implementation. You'd essentially be leaking the address layout in your production cluster to whomever has access to your profiling data; it could be considered an exploitable information leak (circumvents ASLR).

However, technically with the schema where we send the memory_start, memory_limit, and file_offset fields along with the address, in terms of the ability to normalize across hosts it really doesn't matter what address space an implementation chooses to send samples with, because you have all the required information to transform it into another one on the collector side.

Perhaps we should simply change documentation and strongly encourage implementations to transform addresses into some form of stable address space?

aalexand commented 2 weeks ago

Perhaps we should simply change documentation and strongly encourage implementations to transform addresses into some form of stable address space?

This SGTM. The per-file/mapping ELF address space is probably the most practical choice as you said.

brancz commented 2 weeks ago

Got it, that works. I’d say that should be a separate PR then.