signalfx / splunk-otel-js

Splunk Distribution of OpenTelemetry JavaScript
https://docs.splunk.com/Observability/gdi/get-data-in/application/nodejs/get-started.html
Apache License 2.0
23 stars 13 forks source link

Audit Compliance GDI specification v1.6.0: Behaviors #977

Open pellared opened 8 hours ago

pellared commented 8 hours ago

Review compliance against https://github.com/signalfx/gdi-specification/blob/v1.6.0/specification/behaviors.md

pellared commented 7 hours ago

An instrumentation library that has profiling capabilities MUST be able to sample call stacks at a fixed interval.

https://github.com/signalfx/splunk-otel-js/blob/main/src/profiling/index.ts#L248-L250

This default interval MUST default to 10 seconds.

❌ Node.js samples at 1 second intervals due to only single thread being used. The 10 second interval was meant for cases where threads might exist to limit the amount of data.

When a language runtime supports threading, stacks MUST be sampled across all process threads.

✅ Not applicable.

The samples for all threads SHOULD be taken instantaneously and, in the event that this is not feasible, MUST be taken as close together as possible.

✅ The sampling code path is minimal.

If the samples are taken consecutively, then the profiler MUST use a consistent ordering strategy (such as thread name or ID) when sampling all threads.

✅ Not applicable.

Various runtimes MAY contain internal and other threads that are undesirable to include for profiling. This could include threads that are internal to runtime behavior or instrumentation library internal workings. The choice of which threads are undesirable is implementation specific and not defined.

✅ Not applicable.

The profiler SHOULD NOT collect call stacks from undesirable threads. If this is not possible, the profiler SHOULD filter these out afterward and omit these call stacks from ingest.

✅ Not applicable.

When a call stack is sampled during the execution of a span scope, the profiler MUST be able to associate the call stack to the span.

https://github.com/signalfx/splunk-otel-js/blob/main/src/native_ext/profiling.cpp#L575-L581

This association SHOULD happen as close to the sampling point as feasible, but MAY occur later in a processing pipeline.

✅ Association is timestamp based, we track span activations.

After the association has been made, the TraceId and SpanId fields of the LogRecord message MUST be populated (see here).

❌ This doesn't really make sense as a log record contains the whole serialized protobuf, which contains multiple samples, each has their association in the labels.

Call stacks MUST be ingested as OpenTelemetry Logs.

https://github.com/signalfx/splunk-otel-js/blob/main/src/profiling/OtlpHttpProfilingExporter.ts#L111

The logs containing profiling data MUST be sent via OTLP.

https://github.com/signalfx/splunk-otel-js/blob/main/src/profiling/OtlpHttpProfilingExporter.ts#L76

Instrumentation libraries SHOULD reuse persistent OTLP connections from other signals (traces, metrics).

✅ Connections are handled by the JS SDK, not applicable to our distro.

seemk commented 7 hours ago

An instrumentation library that has profiling capabilities MUST be able to sample call stacks at a fixed interval.

https://github.com/signalfx/splunk-otel-js/blob/main/src/profiling/index.ts#L248-L250

This default interval MUST default to 10 seconds.

❌ Node.js samples at 1 second intervals due to only single thread being used. The 10 second interval was meant for cases where threads might exist to limit the amount of data.

When a language runtime supports threading, stacks MUST be sampled across all process threads.

🟨 Not applicable.

The samples for all threads SHOULD be taken instantaneously and, in the event that this is not feasible, MUST be taken as close together as possible.

✅ The sampling code path is minimal.

If the samples are taken consecutively, then the profiler MUST use a consistent ordering strategy (such as thread name or ID) when sampling all threads.

🟨 Not applicable.

Various runtimes MAY contain internal and other threads that are undesirable to include for profiling. This could include threads that are internal to runtime behavior or instrumentation library internal workings. The choice of which threads are undesirable is implementation specific and not defined.

🟨 Not applicable.

The profiler SHOULD NOT collect call stacks from undesirable threads. If this is not possible, the profiler SHOULD filter these out afterward and omit these call stacks from ingest.

🟨 Not applicable.

When a call stack is sampled during the execution of a span scope, the profiler MUST be able to associate the call stack to the span.

https://github.com/signalfx/splunk-otel-js/blob/main/src/native_ext/profiling.cpp#L575-L581

This association SHOULD happen as close to the sampling point as feasible, but MAY occur later in a processing pipeline.

✅ Association is timestamp based, we track span activations.

After the association has been made, the TraceId and SpanId fields of the LogRecord message MUST be populated (see here).

❌ This doesn't really make sense as a log record contains the whole serialized protobuf, which contains multiple samples, each has their association in the labels.

Call stacks MUST be ingested as OpenTelemetry Logs.

https://github.com/signalfx/splunk-otel-js/blob/main/src/profiling/OtlpHttpProfilingExporter.ts#L111

The logs containing profiling data MUST be sent via OTLP.

https://github.com/signalfx/splunk-otel-js/blob/main/src/profiling/OtlpHttpProfilingExporter.ts#L76

Instrumentation libraries SHOULD reuse persistent OTLP connections from other signals (traces, metrics).

🟨 Connections are handled by the JS SDK, not applicable to our distro.

pellared commented 6 hours ago

❌ Node.js samples at 1 second intervals due to only single thread being used. The 10 second interval was meant for cases where threads might exist to limit the amount of data.

GDI spec issue created:

❌ This doesn't really make sense as a log record contains the whole serialized protobuf, which contains multiple samples, each has their association in the labels.

GDI spec issue created: