open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.08k stars 2.37k forks source link

Automate reference documentation as YAML or JSON #24189

Closed theletterf closed 9 months ago

theletterf commented 1 year ago

Reference documentation for each component is hard to come by, update, and produce. Settings and metrics are, by far, the user-facing elements that might change more often between releases. This poses significant overhead on anyone trying to keep documentation up-to-date both upstream and downstream.

As suggested, hinted, or tried in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/23054, https://github.com/open-telemetry/opentelemetry-collector/pull/7679, https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/20509, https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/15233, or https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/22962, I think we could generate reference documentation for each component as YAML files, using mdatagen (?) and pulling description and other information from the code.

The YAML reference could then be used to generate Markdown files or even the README file itself. It could also be used downstream by distributions, thus faithfully passing on documentation on each component. Elements that could be automated include:

An example is what Splunk has been doing here: https://github.com/splunk/collector-config-tools/tree/main/cfg-metadata

atoulme commented 1 year ago

As it stands, I think we are in a slow process to automate whatever parts of components we can generate using templates. Ideally, we would do the following:

Ideally, with this yaml fragment, we can drive the config.go file, and the README, to contain a complete table of all config options. This comment is the closest to have with latest progress.

theletterf commented 1 year ago

That sounds great, @atoulme. Would this apply to metrics as well?

For settings, I was thinking of a schema like this:

name: <name_of_entity>
fields:
- name: <field_name>
  value_type: <data_type>
  default: <default_value>
  description: |
    <description>

For metrics, thinking also of APM instrumentations, it'd be something like:

name: <name_of_component_or_instrumentation>

metrics:
   <name_of_metric>:
     status: <default|custom|arbitrary_vendor_value>
     enabled: <true|false>
     type: <sum|gauge|counter|histogram|others>
       value_type: <int|string|...>
       monotonic: <true|false>
       aggregation: <cumulative|...>
     unit: <unit_of_measurement>
     description: <metric_description>
     attributes: [list_of_attributes]

dimensions:
   <name_of_dimension>:
      description: <description>
      properties:

resource_attributes:
  <name_resource_attribute>:
    description: <description>
    enabled: <true|false>
    value_type: <data_type>

attributes:
  <name_attribute>:
    description: <description>
    value_type: <data_type>
    enum: [possible_values_list]
mx-psi commented 1 year ago

It's unclear to me how to model the value_type. In principle, the value type may be any Go type at all with appropriate marshal and unmarshal functions. For example, how would we model the following?

It's also unclear to me how do we model Validate and Unmarshal while allowing for code generation. Do we generate a struct that is embedded in the actual configuration?

kevinslin commented 1 year ago

I recently used mdatagen to create jsonschemas of all the otel components in order to add intellisense support for otel configs in vscode.

One limitations of the current generated metadata is that it does not capture whether certain fields are optional or required. It also is unable to handle custom validation logic.

On the validation front - because ConfigValidator can contain arbitrary go code, its infeasible to convert it to a more declarative representation like jsonschema.

Apologies if this has been brought up in the past but curious if there was discussion in flipping the order and authoring configuration in something like jsonschema and generating go code based on it? Json schema seems to cover most of the component configurations I've seen in the wild, including the validations. If components really need arbitrary validation, that can still be kept as an escape hatch and documented in the jsonschema.

theletterf commented 1 year ago

Trying to galvanize this one a bit more. @chalin WDYT would be required to steer this initiative forward? Is there anything the docs SIG could do here?

kevinslin commented 1 year ago

I've done this sort of work in past projects (jsonschema -> code). If we want to move forward with this, happy to step in

theletterf commented 1 year ago

@kevinslin That sounds fantastic. What do you suggest?

kevinslin commented 1 year ago

High level flow:

In terms of arbitrary validation rules:

Ideally, we'd keep the exact same interfaces as exist today. Just generated via jsonschema instead of manually written.

Recently discovered that the SDK team is starting to do similar work for generating SDK configuration from jsonschema. see https://github.com/open-telemetry/opentelemetry-go-contrib/pull/4228 and https://github.com/open-telemetry/opentelemetry-java/pull/5399

mx-psi commented 1 year ago

@kevinslin would be great if you can bring this up to one of the Collector SIG meetings (see here for the current times). Just add it to the agenda of a meeting you can attend and we can help you discuss the plan to ensure we are all aligned.

I am not familiar enough with jsonschema to answer this but I would like to see the questions I asked on https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/24189#issuecomment-1634193595 resolved before moving forward to avoid being blocked mid-way

codeboten commented 1 year ago

I think there was some effort somewhat along the line of this in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/13384

djaglowski commented 1 year ago

It's also unclear to me how do we model Validate and Unmarshal while allowing for code generation. Do we generate a struct that is embedded in the actual configuration?

As a data point, it appears we have roughly 20 custom unmarshal functions.

dashpole commented 1 year ago

One other note: Some components have configuration structs that live in another repository. E.g.:

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/56ddbc37dfb1ffc4688fbd0ba31ca9d0ceceb7d2/exporter/googlecloudexporter/config.go#L6-L15

bryan-aguilar commented 1 year ago

High level flow:

* create jsonschemas for all existing components (using `mdatagen` and some manual work)

* create a spec and CLI tool for converting jsonschema to go config

at this point, minus custom validation logic, the jsonschema should generate the same config code that is manually written today

* version both the jsonschema and the generated structs

* update documentation that go over the new authoring process

In terms of arbitrary validation rules:

* for simple validation rules, convert them to json schema (eg. check if field exists or some simple comparison option)

* for complex validation that cannot be encoded in jsonschema, have an extra boolean field (eg. `dynamicValidation: true`) that indicates this

Ideally, we'd keep the exact same interfaces as exist today. Just generated via jsonschema instead of manually written.

Recently discovered that the SDK team is starting to do similar work for generating SDK configuration from jsonschema. see open-telemetry/opentelemetry-go-contrib#4228 and open-telemetry/opentelemetry-java#5399

I was thinking about this a bit more after the collector sig this morning. By chance I was listening to a podcast that had some talking points on markup languages at the end and it made me wonder.

Do we really care what markup language is format is used to generate the config struct and thus the documentation? Do we only have to pick one, whether it be jsonschema, yaml (plz no)? There are other markup languages that exist, toml or cue, for example that could also work.

Do we want to take a higher level approach to this problem and instead make the source of truth configurable? The source of truth should be owned by the codeowners. As long as source of truth format -> code is possible do we need to standardize? Or can we choose a solution that looks something like the confmap provider package and allows us to translate from one format to another?

For example, component owner A wants to use JsonSchema and component owner b wants to use cue. It's the format they and their team are most familiar with. Should we cater to that scenario?

Ignoring my big comment that just asks a lot of questions and doesn't propose much other than a scope explosion to a small problem....I do like jsonschema and think it can be quite powerful.

kevinslin commented 1 year ago

Created an initial proposal for authoring component configuration using declarative schema. Would love to get any feedback, either in this issue or on the doc šŸ™

ā˜ļø @bryan-aguilar @dashpole @djaglowski @codeboten @mx-psi @theletterf

theletterf commented 1 year ago

@kevinslin I love it. Just curious: how hard would it be to extend this model to trace instrumentation and other projects?

mx-psi commented 1 year ago

@kevinslin I love it. Just curious: how hard would it be to extend this model to trace instrumentation and other projects?

@theletterf See https://github.com/open-telemetry/opentelemetry-configuration for that :)

theletterf commented 1 year ago

@mx-psi Looks great, but how would Kevin's proposal intersect with the above? Would it be building upon it? Would then be the responsibility of each project maintainer to adopt those conventions and generate the files?

mx-psi commented 1 year ago

Looks great, but how would Kevin's proposal intersect with the above?

AIUI these are independent:

The only point of overlap when it comes to the Collector is configuring the Collector's own telemetry. We are working on that on open-telemetry/opentelemetry-collector/issues/7532

theletterf commented 1 year ago

Stake for tech writers and documentarians collaborating with the OTel projects is having a mostly similar mechanisms for producing and consuming reference documentation. Despite the differences in usage scenarios and stack, I think it'd be great to align as much as possible in the way the output is produced and presentedā€”settings and metrics can be described in very similar ways after all. Not sure if my concern is clear enough though?

mx-psi commented 1 year ago

Stake for tech writers and documentarians collaborating with the OTel projects is having a mostly similar mechanisms for producing and consuming reference documentation.

This makes sense. @codeboten is involved both in the Configuration WG and in the Collector so we can use his input to ensure that we are aligned in that sense.

I think it'd be great to align as much as possible in the way the output is produced and presentedā€”settings and metrics can be described in very similar ways after all. Not sure if my concern is clear enough though?

Not sure I understand this second part. Is this about what way to express things in jsonschema given several options to do so? Is this about having a similar configuration schema for configuration sections configuring the same thing?

theletterf commented 1 year ago

Not sure I understand this second part. Is this about what way to express things in jsonschema given several options to do so? Is this about having a similar configuration schema for configuration sections configuring the same thing?

More like the second. For example, settings: whether they are Collector components's settings or instrumentation settings, they could share the same structure with few non-overlapping extensions:

name: <name_of_entity>
fields:
- name: <field_name>
  value_type: <data_type>
  default: <default_value>
  description: |
    <description>
kevinslin commented 1 year ago

created an initial pr to go over some additional design decisions that have come up while doing the implementation > https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/27003

github-actions[bot] commented 11 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] commented 9 months ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.