Improve otel collector configuration w/ JSON schema

LBF38 commented 6 months ago

Is your feature request related to a problem? Please describe.

For better defining the otel-collector-config.yml, a JSON schema would be great to validate the configuration before using it in the collector itself.

Describe the solution you'd like

A JSON Schema for validating the configuration when writing it, similarly to Docker compose files or Kubernetes manifest files, ...

Describe alternatives you've considered

Reading the docs multiple times, and checking config by running it on local cluster w/ Docker compose or Kubernetes.

TylerHelmuth commented 6 months ago

@TylerHelmuth If I understand json schema correctly you could write a schema file and use it to validate the config externally (not inside the collector) correct?

LBF38 commented 6 months ago

Yes, the configuration file in .yml will be statically checked against the JSON Schema, therefore providing an easier way to configure all the possible aspects of the otel collector.

You can try this out with a compose.yml file for defining Docker compose stacks, for example. And you will see that you get automatic intellisense on defining the important parts of the configuration.

And, in your schema, you can define if a property is required or not and his types.

TylerHelmuth commented 6 months ago

I recommend you write and maintain your own file specific to your collector build.

Since the collector is composable from components from anywhere, and those components can use any configuration they want, it isn't possible for us to provide a single, wholistic, schema file for any collector build.

We could maybe provide one with each distro we provide, but even that would be hard to maintain.

yurishkuro commented 6 months ago

@TylerHelmuth I disagree with your DIY position. The advantage of OTEL collector for custom distros is in providing a reusable baseline. From my perspective the discoverability of configuration of different components is very bad today, 2 out of 5. A DIY approach cannot solve for the lack of schema / documentation of existing components, it needs to be solved centrally in the main collector repo. JSON schema is one option, I personally would prefer protobuf schema (either would need a prototype to iron out kinks), as I mentioned in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/27003#issuecomment-1778587538

TylerHelmuth commented 6 months ago

@yurishkuro I agree that we could improve the consistency and quality of component documentation - being able to auto-generate standard docs from component config would be awesome.

What I'm arguing is that the Collector Core repo cannot supply a single schema file that could be used to statically validate any collector build.

Each distribution at https://github.com/open-telemetry/opentelemetry-collector-releases could maybe supply a schema that knows about each component in its manifest. Maybe the collector could add an extra command to generate a schema file based on the supplied config.

yurishkuro commented 6 months ago

What I'm arguing is that the Collector Core repo cannot supply a single schema file that could be used to statically validate any collector build.

agreed, but if there was a standard mechanism defined & used in Core it would be much more likely that components and contrib plugins would start using it to declare their configuration, improving the overall ecosystem.

cforce commented 6 months ago

With schema support issue https://github.com/open-telemetry/opentelemetry-collector/issues/9707 would be based upon

cmgriffing commented 6 months ago

I think this would improve the experience quite a bit.

As further info, if the JSON Schema were uploaded to SchemaStore.org, various tools and editors would get automatic support in their various plugins. https://www.schemastore.org/json/#editors

As an example, this VSCode YAML extension would be able to consume it: https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml

While I understand the concerns for extensibility, a JSON schema can allow arbitrary fields in addition to the official fields. So there is nothing lost if a different tool wants to add custom config.

LBF38 commented 6 months ago

Hi everyone,

After exploring more related issues and work on this topic, I will try to summarize my findings in this comment.

https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/27003
- Looks like this is a first work on automating the generation of YAML schema for the collector config file.
- Could be a solution to this issue if merged ?
- A related proposal defines the steps and configuration schema to generate here: https://docs.google.com/document/d/15SMsJAdE1BHAmwf8cLXBvsHDrezXOHniaRZafjO-5e4/edit
https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/24189
- This is also a related issue that has been created based on/related to the latter PR.
- would this be a suitable solution for adding jsonschema/YAML to the Collector config file ?
https://github.com/open-telemetry/opentelemetry-configuration
- It seems like there is a repo for opentelemetry configuration, but, as I'm unfamiliar with all those components, I don't know if it is related to the collector config or other components.
- I include it for reference.
https://github.com/open-telemetry/community/issues/1610
- this issue also seems to be on the same topic.
- I include it for reference too.
https://github.com/splunk/collector-config-tools/tree/main/cfg-metadata
- it seems like splunk has already made some work on this and generate some metadata files for community exporters, extensions, processors and receivers.
- that's one of their motivation to upstream this work for making it available for the whole community.

Moreover, here are some Go libs for generating jsonschema files based on Go types:

Thus, what are the next steps to implement this ? I'm willing to contribute if necessary.

LBF38 commented 5 months ago

Hi, A temporary alternative would be to use https://github.com/dash0hq/otelbin

This tool allows to validate the otel collector config against different schemas. And it also provides an easy way to visualise the collector flow.

See the live tool here also : https://www.otelbin.io/

jpkrohling commented 4 months ago

this could be useful to the operator as well, perhaps for the auto-upgrade mechanism

cc @yuriolisa

yurishkuro commented 4 months ago

@jpkrohling @TylerHelmuth I am noticing that OTEL is often not participating in LFX mentorship programs, wouldn't this be a good project for an intern?

NB: the deadline for summer term is today 5pm PT.

ptodev commented 4 months ago

Hello, I'd also be happy to work on this in the next few weeks/months and I'd be happy to collaborate with anyone else who is interested. I could make it one of my priorities for the next 3-4 months, as long as we have some certainty on what an acceptable solution would look like.

I'm especially interested in autogenerating documentation. The Collector docs are sometimes missing docs for certain config arguments, and sometimes the arguments are not documented in a consistent way.

This automation would also make it easier for maintainers of Collector distributions to keep their code and docs accurate and up to date. For example, when updating the OTel dependencies of Grafana Alloy we have a few manual steps:

Sync the Collector's config arguments with Alloy's.
Keep Alloy's docs up to date with Collector's.

I'm not sure what would be good next steps though? Should we discuss this in more detail on the CNCF slack? Or should we come up with a proposal and update this issue? Would @LBF38 be interested in collaborating over the next few weeks/months?

yurishkuro commented 4 months ago

@ptodev a proposal would be great. There is a fair amount of design exploration in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/27003, it's worth learning from that and making sure the proposal covers the requirements raised.

LBF38 commented 4 months ago

Hi @ptodev , I'm actively working around OpenTelemetry during the next few months so I'll be happy to help where I can, depending on the scope and depth of what's needed.

Let me know where I can best contribute. I would love to !

jpkrohling commented 4 months ago

@yurishkuro, absolutely! We had only one recent participation so far (it's still happening!), but we are often lacking mentors. Would you like to be the mentor for this feature? Although we are late for this round, I can see if we can get in there if you can be a mentor.

yurishkuro commented 4 months ago

@jpkrohling I am mentoring 3 projects in Jaeger in each of LFX terms, so I don't have capacity to take more. Since this specific issue will be important to Jaeger-v2 I can consider it for the Fall term, if not solved by then.

TBH, I find it surprising that the project the size of OTEL is lacking mentorship engagements. I rather suspect this is due to lack of focus on this area.

ptodev commented 4 months ago

@yurishkuro I'm not sure how we could generate sufficient docs using the protobuf approach. For example, where would the description string of each config attribute go inside the protobuf?

ptodev commented 4 months ago

Maybe the descriptions of the arguments could go into comments inside the proto file, and then they can be parsed using something like protoc-gen-doc? It seems like protoc-gen-doc can also output json, which could help folks who are bundling their own OTel distributions and want to have more control over how the documentation is displayed.

ptodev commented 4 months ago

I personally like the protobuf approach:

It's easier to read than json.
I suppose it'll work well with config arguments which are shared across components (e.g. HTTP server settings). Those arguments can be defined in a single proto file and imported in many places.

However, I don't know how we would indicate what the default value of a config argument is? Where would that go inside the proto file, and how would it be translated to Go code?

yurishkuro commented 4 months ago

@ptodev I have some pointers in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/27003#issuecomment-1778587538

TLDR: protobuf allows adding arbitrary annotations to IDL declarations (e.g. Collector is already using gogoprotobuf annotations to influence code generation). I don't know if having default values for fields is actually a requirement, but as long as there is a generic mechanism to add proto annotations that will inject struct tags (e.g. https://github.com/favadi/protoc-go-inject-tag) the solution can be built using those tags (lots of validation solutions out there already, maybe they do support default values too).

ptodev commented 4 months ago

I don't know if having default values for fields is actually a requirement

I believe it is, because the docs need to include it.

as long as there is a generic mechanism to add proto annotations that will inject struct tags the solution can be built using those tags

The problem is that mapstructure doesn't have a way to set a default value through a tag.

lots of validation solutions out there already, maybe they do support default values too

I hope so! I'll have a look to see if I can find any.

ptodev commented 3 months ago

@yurishkuro Unfortunately, I found it difficult to find a clean solution using protobuf. Using Pkl seems more promising to me, because it contains a lot of the features we need off the shelf. I opened #10376 to show how using Pkl in the batch processor could look like - please feel free to take a look :)

yurishkuro commented 3 months ago

Unfortunately, I found it difficult to find a clean solution using protobuf.

can you summarize your findings / difficulties?

ptodev commented 3 months ago

The schema could be useful for generating code and docs both in the main Collector repo, and in its distributions. For example:

Creating an alternative config.go in an OTel distribution where the mapstructure tags are not sufficient.
Generating markdown which list all config arguments, whether they are optional, what their default values are, and whether there are any constraints on them.

Protobuf v3 has a few notable limitations:

No optional (nullable) values.
No default values.
No "constraints" on arguments.

There are various solutions which exist to mitigate those issue individually to some extent. However, because they are all independent of each other, I don't think they would work very well together. For example, one could use protovalidate for validation, protoc-gen-doc for generating docs, and gnostic for generating schema, but since all of those are independent solutions, the validation rules won't be present in the docs and in the schema. In addition, I don't know how to work around the lack of optional arguments and default values.

On the other hand, Pkl seems pretty much designed for this use case and it is a single tool with almost everything we need right now. It was open-sourced by Apple a few months ago and is picking up popularity. Using the right tool for the job would look cleaner, be easier to maintain, and will cost less in the long term since hopefully the upstream project will maintain all those features as one cohesive whole, rather than having to rely on a set of independent products like in the Protobuf ecosystem.

theletterf commented 3 months ago

Adding myself to the party here, after opening https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/24189 a year ago. +CC @atoulme.

The stake for Splunk, and for OpenTelemetry.io docs, is having reliable information in YAML format as to what metrics and configs are available for each component.

ptodev commented 3 months ago

Hi, @theletterf! In order for you to generate docs, I suppose you need information such as:

Whether a given config argument is optional or mandatory.
What the default value is.
Whether there are constraints on the argument. E.g. a log_level string argument which can only have a value of either "info" or "debug". Or a send_batch_max_size_bytes argument which has to be greater than "0".

Is this a correct assumption? Do you need any other information in the schema?

theletterf commented 3 months ago

For settings, this is the info we gather using cfgschema:

https://github.com/splunk/collector-config-tools/blob/main/cfg-metadata/exporter/otlp.yaml

For metrics:

https://github.com/splunk/collector-config-tools/blob/main/metric-metadata/elasticsearchreceiver.yaml

You might want to take a look at the contraption we built for this:

https://github.com/splunk/collector-config-tools/tree/main/cfgschema

ptodev commented 3 months ago

Thanks, @theletterf! I will keep your requirements in mind.

I know the the Pkl PR is a bit disruptive, mostly due to the Pkl Duration type and the Pkl-specific validation rules. As a next step, I will try to propose something which doesn't disrupts the existing code so much. I may update the Pkl PR and/or revisit a solution such as the one that uses go-jsonschema.

I can only spend about 1 day a week on this project, so it might take me a few weeks to get back to you with updated proposals.

theletterf commented 3 months ago

Considering how long we've been chasing this, all activity is welcome. :)

tsloughter commented 6 days ago

agreed, but if there was a standard mechanism defined & used in Core it would be much more likely that components and contrib plugins would start using it to declare their configuration, improving the overall ecosystem.

Right. The SDK configuration file schema is the same way but still seen as a benefit to be able to validate what it can and hope that others will extend it.

open-telemetry / opentelemetry-collector

Improve otel collector configuration w/ JSON schema #9769