open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
4.41k stars 1.46k forks source link

How to declare optional parts of the config? #10266

Open yurishkuro opened 5 months ago

yurishkuro commented 5 months ago

Slightly related to the discussion in open-telemetry/opentelemetry-collector-contrib#9478 (cc @mx-psi), but more about the conflict between a desire to have optional configs which nonetheless have defaults. Most obvious example is OTLP Receiver, which is often configured like this:

receivers:
  otlp:
    protocols:
      grpc:
      http:

Here the grpc/http subsections do not have any fields, in particular port numbers, because they are provided by createDefaultConfig(). However, it makes it look like one could simple remove e.g. http to disable creating an HTTP server, which is not actually true - because the default config still populates http even if it's absent in the config.

Now if there were no defaults, this would work fine - both http and grpc fields are declared as pointers and thus could be nil, indicating an absence. But this is completely circumvented by the createDefaultConfig() mechanism.

Motivation

  1. In case of OTLP exporter I might not want to support HTTP endpoint. Right now there's no way to disable it afaiu.
  2. In Jaeger components we often have a need to define alternate sub-sections of the config, e.g. remote sampling extension can be backed by a file or by an adaptive calculator. Each of the options requires its own configuration with defaults (e.g. file refresh interval).

Solution?

I don't have a clear one, but it would be nice to have more granular createDefault functions that would only be called once the config is read and a given section is seen as non-empty in the config. As I understand the first pass of the config parsing is by reading YAML into nested maps, which are then decoded into the objects.

mx-psi commented 5 months ago

However, it makes it look like one could simple remove e.g. http to disable creating an HTTP server, which is not actually true - because the default config still populates http even if it's absent in the config.

The way we work around this on the OTLP receiver is https://github.com/open-telemetry/opentelemetry-collector/blob/5ba6000d4a0efd95229f4fc7e22c51717306ac0c/receiver/otlpreceiver/config.go#L68-L73

I don't love that approach, but it has worked so far

yurishkuro commented 5 months ago

Thanks for the pointer. Indeed, I was wrong in the issue description - OTLP receiver does disable http if the key is removed from the config. We will try to use the same approach in the Jaeger components.

If there are no plans or possibilities to have a better solution, feel free to close this issue.

evan-bradley commented 5 months ago

I would prefer to solve this with a declarative solution instead of requiring manual Unmarshal implementations to do this. I see two possible approaches here:

  1. Implement a confmap.Optional type that implements its own Unmarshal method that checks the presence of the field. This would work similar to optional types in other languages and would provide methods for checking both field presence and getting the value of the underlying field.
  2. Put a struct field tag on optional fields and have the unmarshaler check for these when unmarshaling the struct.

I think both of these have trade-offs and I'm not sure which one is the better approach.

yurishkuro commented 5 months ago

@evan-bradley re (2), how would that be different than just having a pointer? I think we always need a structural distinction between required and optional sub-configs, which is achieved via Optional type in (1).

(1) sounds like a good path. I supposed the components would still need to define the complete default configuration, but the confmap.Optional[SubConfig] points would erase those if not set in the file.

evan-bradley commented 5 months ago

re (2), how would that be different than just having a pointer?

The problem is that we can also have pointers to config structs that are intended to be references and not just optional values. Combined with default values, this makes it ambiguous to the unmarshaler whether the value is required but has a default, or is truly optional and the pointer should be set to nil when the key isn't present. We would need to find a way to describe that extra bit of information.

Overall I like (1) more since the handling is more explicit, but it may come with trade-offs that make (2) a better overall solution.

mx-psi commented 5 months ago

@yurishkuro If you are willing to own this, I think we would be happy to review a PR that adds a wrapper like this to opentelemetry-collector (the concrete package where this would live is to be decided I guess, but we can discuss this on the PR).

evan-bradley commented 5 months ago

Not sure if it helps, but OTTL has an Optional type we use, just as an example of how something like this may work. You can see it here.

yurishkuro commented 5 months ago

I fiddled with this a bit in https://github.com/open-telemetry/opentelemetry-collector/pull/10260

BTW, don't know why I opened the issue in contrib, we should move it to collector.

The issue I bumped into is that I could not find a good place to execute an override to erase default values if the input config omits the corresponding config section. The main confmap code uses decoder hooks, which fire at two relevant occasions

  1. first at the overall struct, e.g. otlpreceiver.Protocols which may have a field HTTP: Optional[HTTPConfig]
  2. they they fire for each field. This is where the custom Optional.Unmarshal is called, which is needed to delegate parsing to the underlying struct

The issue with (2) is that it only fires when there is a corresponding entry in the input config, so cannot be used to unset the default values already in the struct.

(1) is equivalent to the custom unmarshaler https://github.com/open-telemetry/opentelemetry-collector/issues/10266. This could be the hook that we need, but it needs to:

  1. loop through all fields of a struct via reflection (including flattened/embedded fields)
  2. find those that are Optional
  3. find the corresponding key in the map (which can come from field tags)
  4. if the key is not set then erase the Optional

All of that is of course doable, but manually / from scratch (especially step 3), as all the same functionality done by mapstructure is private.

I also looked into using Metadata that can be populated by mapstructure.Decoder. It provides the name of the fields that were "unset" in the config. However, those keys are still the YAML names, so it's not easy to map them back to the corresponding field in the struct.

mx-psi commented 5 months ago

The issue with (2) is that it only fires when there is a corresponding entry in the input config, so cannot be used to unset the default values already in the struct.

One option is to put it on the configuration consumer to check if the value is none. I think this is what we currently do on the fields where we use 'pointer to T'.

Another option is to make a type that stores the default value (so, the same as your Optional but with a default value if unset). I don't have a good name for that (WithDefault[T]? WithFallback[T]?), but it would be simple to implement.

yurishkuro commented 3 months ago

@mx-psi WithDefault[T]() was a good idea, but it didn't work either. When an item is defined in YAML but without any value, i.e. item: with no content, then mapstructure treats the value as nil, and does not perform any processing. As a result, the custom Unmarshal() is still not being called.

See test https://github.com/open-telemetry/opentelemetry-collector/pull/10260/files#diff-24c5961a7871021bd9801008aa5f1fad72efd43f8783a00c7d09dc52c4c64e0dR81