open-telemetry / opentelemetry-configuration

JSON Schema definitions for OpenTelemetry file configuration
Apache License 2.0
41 stars 17 forks source link

Simplify endpoint #103

Open marcalff opened 4 months ago

marcalff commented 4 months ago

The existing environment variables used in the spec to describe an endpoint:

are the result of incremental design, due to history.

The current design for config.yaml derives from this existing specification, and currently is:

otlp:
  # http/protobuf, http/json, or grpc
  protocol:

  # http endpoint, https endpoint, or no scheme
  endpoint:

  # true or false, only used for grpc protocol with no scheme endpoint
  insecure: 

Exposing the same fields to match the existing environment variables has some merits, but also carries over existing issues.

Existing configuration

  1. Protocol

The protocol to use should be in the endpoint URL. Instead, when using grpc, the "protocol" is set to "grpc", while the endpoint may contain "http" or "https". This makes no sense.

  1. Message format

The message format, "protobuf" or "json", is mixed with the protocol. I don't think it belongs there.

  1. Security

The "insecure" field is a last resort configuration parameter for grpc when not used with http/https. This is too convoluted.

  1. Lack or orthogonality

With the existing configuration, if someone changes the protocol between http and grpc, then the following fields must be adjusted as well:

Proposed configuration

otlp:
  # Valid values are "protobuf" or "json".
  content_type: protobuf

  # Full URL with:
  # - a scheme ("http", "https", "grpc", "grpcs")
  # - a host name
  # - a port number
  # - an optional path, for http
  endpoint:

Examples of configuration:

otlp:
  content_type: protobuf
  endpoint: "http://localhost:4318/v1/traces"
otlp:
  content_type: json
  endpoint: "https://localhost:4318/v1/traces"
otlp:
  content_type: protobuf
  endpoint: "grpc://localhost:4317"
otlp:
  content_type: protobuf
  endpoint: "grpcs://localhost:4317"

Note how this simplifies things, as the endpoint URL specifies:

all this in the same field.

It then becomes much easier to change from:

  endpoint: "grpc://localhost:4317"

to

  endpoint: "https://localhost:4318/v1/traces"

when editing a config file, or when providing substitution variables.

General considerations

Note that I am NOT asking to change existing environment variables in the specs, as this will have some global impact on all SIG and all existing deployments.

This ship has sailed.

What I am asking to consider, is to have a different design for config.yaml only, which I think makes more sense and will be more practical to work with in the long term.

Given that there are no existing deployments using config.yaml, there is no impact if done at this stage.

The migration to config.yaml will not be any more complex, mapping to the new content_type and endpoint fields is trivial.

I think we should take this opportunity to simplify.

marcalff commented 4 months ago

Filed separately from comments on #76, so this can be discussed without affecting the merge of that PR.

jack-berg commented 4 months ago

Couple of thoughts:

marcalff commented 4 months ago

Couple of thoughts:

These two:

will no longer be used.

  • This adds interpretation requirements (something discussed in this convo), since code has to detect the grpc / grpcs protocols and strip the protocol and configure the corresponding gRPC exporter according to the host and TLS from the endpoint.

Nothing changed from today. When the protocol is grpc, the code has to parse a hostname and port number from the endpoint http://host:port, while taking the http or https scheme, if present, as an indication for INSECURE.

  • Not sure, but are grpc / grpcs official values for scheme?

I have seen some references to both in the wild, for example:

https://github.com/postmanlabs/postman-app-support/issues/11489

However, grpc is not even in the iana registry:

https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

The point is, whatever bit dialect an exporter sends on a socket on the wire, there should be a scheme to name that dialect, and the scheme with / without TLS should have a different name, since the bits are really different.

tsloughter commented 4 months ago

I don't think grpc:// should be used.

jack-berg commented 3 months ago

Discussed in the 8/19/24 SIG.

I think we should take this opportunity to simplify.

This takes further away from compatibility with the environment variable scheme. We could solve this by having two sets of properties: one for interop with existing environment variable scheme, a second for improved use experience. But this adds confusion and the question is whether the current scheme had bad enough ergonomics to justify the additional cognitive load.

Didn't reach an answer, but we need to address this with a yes or no before stability.

jack-berg commented 4 days ago

I was thinking about this more and came up with a slightly different direction we could go: What if we followed the the collector, and introduced separate exporters for [grpc](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlphttpexporter] and HTTP. We could then eliminate the protocol property, and instead have an encoding property for the http variant.

The result would be something like:

tracer_provider
  processors:
    - batch:
         exporter:
           otlphttp:
             endpoint: http://localhost:4318/v1/traces
             encoding: proto # or json
    - batch:
         exporter:
           otlpgrpc:
             endpoint: http://localhost:4317

One issue I see is that the collector uses otlp to refer to the gRPC variant, and otlphttp to refer to the HTTP variant. The fact that the HTTP variant has a *http suffix while the gRPC variant does not implies a high level of priority to the gRPC variant. If I had to guess, I'd say that this naming decision pre-dated otel's shift to prefer http/protobuf as the default over grpc.

So if we went in this direction, we'd need to decide whether to follow the now suboptimal naming of the collector, or introduce more explicit names like otlphttp and otlpgrpc as I've shown above.

I think this solves the problems you outline while avoiding inventing anything new (i.e. leaning on prior art in the collector).