open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

Remote SDK Management #2207

Open tigrannajaryan opened 2 years ago

tigrannajaryan commented 2 years ago

The Agent Management Workgroup is currently working on a protocol specification and a prototype implementation of remote agent management capabilities. This will likely be used with the OpenTelemetry Collector and possibly with other data collection agents. One of these capabilities is the ability for Agents to receive their configuration from a remote server.

A few people have expressed interest in having a similar capability for Otel SDKs, i.e. for the SDK to receive its configuration from a remote server.

We could potentially implement this in a way that uses the same communication protocol between the Otel Collector and the Management Server and between Otel SDKs and the Management Server. The benefit of such an approach would be a single view of all SDK-instrumented applications and all Otel Collectors, all communicating with and receiving their (different) configurations from one Management Server.

If we decide that we want such a capability for SDKs, we will likely need to come up with a config file format for SDKs.

This issue is a request for comments. I would like to understand if there is sufficient interest in this so that the Agent Management Workgroup can take this into account in our designs.

Please comment or upvote the issue.

Relevant issue in OpAMP repo: https://github.com/open-telemetry/opamp-spec/issues/16

Aneurysm9 commented 2 years ago

I think having a standard configuration for SDKs would be wonderful. As we consider how to migrate the collector to use the Go SDK for self metrics we are confronted with the need to configure that SDK and the desire to expose options for that configuration to the end-user alongside the rest of the collector configuration. I'd really rather not define a configuration mechanism that is only useful to the collector.

Once we have a configuration structure defined it seems like a logical step to enable retrieving (and perhaps updating) that configuration from a remote source. If we can leverage the work already being done to make that work for agents then I'm fully supportive of doing so.

meastp commented 2 years ago

Would changing the trace sampler for specific components(/services) (e.g. from probability sampler to always on temporarily) or changing the log level (for a period) for all components be a use case? (If the SDKs/components could be set up to periodically retrieve the configuration from a remote source)

tigrannajaryan commented 2 years ago

> Would changing the trace sampler for specific components(/services) (e.g. from probability sampler to always on temporarily) or changing the log level (for a period) for all components be a use case? (If the SDKs/components could be set up to periodically retrieve the configuration from a remote source)

Yes, provided that these values are settings in a configuration file that can be retrieved from a remote source.

Generally, the management solution / protocol we are working on doesn't care which individual settings are manageable. From the protocol's perspective the configuration is an opaque stream of bytes. The protocol's only concern is to deliver that configuration; it is then up to the Agent (or the SDK in this case) to interpret and apply it.

Whether a particular value (sampling rate or log level) is updatable at runtime is up to the SDK implementation. (I would argue that they should be updatable.)
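
To illustrate the split described above (the protocol delivers opaque bytes, the SDK interprets and applies them), here is a minimal sketch in Python. It is not part of OpAMP or any OpenTelemetry SDK. The `RuntimeSettings` class, the JSON encoding, and the `sampling_ratio` and `log_level` keys are all assumptions made for the example, since no SDK config format is defined in this thread.

```python
import json
import threading

VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


class RuntimeSettings:
    """Hypothetical holder for settings that can change while the app runs."""

    def __init__(self, sampling_ratio=1.0, log_level="INFO"):
        self._lock = threading.Lock()
        self._sampling_ratio = sampling_ratio
        self._log_level = log_level

    def snapshot(self):
        """Return a consistent copy of the current settings."""
        with self._lock:
            return {
                "sampling_ratio": self._sampling_ratio,
                "log_level": self._log_level,
            }

    def apply(self, payload: bytes) -> bool:
        """Interpret an opaque payload delivered by some transport.

        Returns False and keeps the previous settings if the payload is
        not valid. The JSON layout is an assumption for this sketch.
        """
        current = self.snapshot()
        try:
            doc = json.loads(payload.decode("utf-8"))
            if not isinstance(doc, dict):
                return False
            ratio = float(doc.get("sampling_ratio", current["sampling_ratio"]))
            level = str(doc.get("log_level", current["log_level"])).upper()
        except (ValueError, TypeError):
            return False
        if not 0.0 <= ratio <= 1.0 or level not in VALID_LEVELS:
            return False
        # Swap both values together so readers never see a half-applied update.
        with self._lock:
            self._sampling_ratio = ratio
            self._log_level = level
        return True
```

A sampler or logger would read `snapshot()` each time it makes a decision, so a new payload takes effect without a restart. Rejecting an invalid payload as a whole keeps the last known good settings in place.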

tigrannajaryan commented 2 years ago

@open-telemetry/specs-approvers @open-telemetry/technical-committee I would love to know what you think about this.

tigrannajaryan commented 2 years ago

I added a draft proposal to support plain HTTP transport in OpAMP: https://github.com/open-telemetry/opamp-spec/pull/70

The primary use case I have is for Otel SDK remote configuration, so it would be great to know what folks here think about it.

gouthamve commented 1 year ago

We are looking for something similar. We would like the SDKs to pick up the Metrics Reader Interval from the Collector or a central point: https://opentelemetry.io/docs/reference/specification/sdk-environment-variables/#periodic-exporting-metricreader

I think this would be perfect. Has there been any movement here?
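
As a rough sketch of the use case above, the function below picks the export interval for a periodic metric reader. `OTEL_METRIC_EXPORT_INTERVAL` and its 60000 ms default come from the SDK environment variable specification linked above. The `resolve_export_interval_ms` helper and the rule that a remotely supplied value takes precedence over the environment variable are assumptions for illustration only.

```python
import os

# Default for OTEL_METRIC_EXPORT_INTERVAL, in milliseconds.
DEFAULT_INTERVAL_MS = 60000


def resolve_export_interval_ms(remote_value=None, environ=None):
    """Pick the export interval: remote value, then env var, then default.

    `remote_value` stands for whatever a Collector or central server sent
    (hypothetical). Invalid or non-positive candidates are skipped.
    """
    env = os.environ if environ is None else environ
    for candidate in (remote_value, env.get("OTEL_METRIC_EXPORT_INTERVAL")):
        try:
            value = int(candidate)
        except (TypeError, ValueError):
            continue  # missing or malformed, try the next source
        if value > 0:
            return value
    return DEFAULT_INTERVAL_MS
```

For example, `resolve_export_interval_ms(1000, {"OTEL_METRIC_EXPORT_INTERVAL": "5000"})` selects the remote value, while passing `None` as the remote value falls back to the environment variable.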

tigrannajaryan commented 1 year ago

> I think this would be perfect, has there been any movement here?

Yes, two things have happened since my last comment:

  1. Work is in progress on defining a config file format for SDKs, see https://github.com/open-telemetry/oteps/blob/main/text/0225-configuration.md and https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/file-configuration.md
  2. OpAMP now has plain HTTP transport support, which should make it simpler to implement in the SDKs.

Once the config file format is finalized, a decision needs to be made on whether we want a remote management capability. If we do, we can work on defining exactly how it can use OpAMP to fetch the config file remotely.
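
For a sense of what the SDK side of "fetch the config file remotely" could look like, here is a minimal polling sketch. It does not implement OpAMP or its wire format. `ConfigWatcher`, the `fetch` callable, and the `on_change` callback are hypothetical names, and comparing SHA-256 digests is just one simple way to avoid re-applying an unchanged file.

```python
import hashlib


class ConfigWatcher:
    """Hypothetical helper that applies a fetched config only when it changes."""

    def __init__(self, fetch, on_change):
        self._fetch = fetch          # callable returning the config as bytes
        self._on_change = on_change  # callable invoked with the new bytes
        self._last_digest = None

    def poll_once(self) -> bool:
        """Fetch the config once; return True if it was new and got applied."""
        payload = self._fetch()
        digest = hashlib.sha256(payload).digest()
        if digest == self._last_digest:
            return False
        self._on_change(payload)
        # Record the digest only after on_change succeeds, so a failed
        # apply is retried on the next poll.
        self._last_digest = digest
        return True
```

In a real SDK, `fetch` would wrap a request to the management server and `poll_once` would run on a timer or in response to a server message. Both details are left out here because the thread does not define them.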