Project tooling to generate example property descriptions

jack-berg commented 4 months ago

Problem

This project maintains several examples. These serve multiple purposes

Validate correctness of schema changes
Serve as templates for users
Document the semantics about configuration properties for both users who set them, and maintainers who implement the interpretation of these properties.

The last point is poorly served. The comments on these examples contain important information, like the priority between .attribute_limits and .tracer_provider.limits, the set of recognized values for samplers, the set of recognized values for view stream aggregations, and more. Trying to ensure that comments are maintained consistently across examples is time consuming and error prone.

Solution

This PR introduces new project tooling to generate the comments consistently across example files.

How it works:

The ./schema/type_descriptions.yaml file defines descriptions for each of the properties of each type defines in the ./schema data model.
The ./scripts/generate-comments.js is a script which for a given input configuration file will:
- Parse the YAML.
- Walk through each key / value pair, and for each:
- Compute the JSON dot notation location of the current key / value pair.
- Find the first matching rule in type_description.yaml. Iterate through the rules and evaluate the key / value pair dot notation location against each of the rule's path_patterns.
- Inject / overwrite comments for its properties according to type_descriptions.yaml.
- Write the resulting content to specified output file or to the console.

The make generate-descriptions command runs this process against each file in ./examples and overwrites each file in the process.

Alternatives considered

Can't we just use the JSON schema description field to encode this information?

Not easily, and with some drawbacks. In this PR I use a yaml parsing library to visit each key / value pair, evaluate its comment, and update the yaml parsing libraries in-memory representation of the comment for that pair. If the description information was encoded in the JSON schema, I'd need to combine capabilities from two libraries: one js library for parsing / validating yaml according to the JSON schema, and another js library for visiting each node and updating representation. The JSON schema parsing library doesn't provide tools for mutating the state of the YAML comments. I don't see an obvious way to combine the necessary functionality from each of these libraries.

Additionally, JSON schema is written in JSON, which has poor support for the types of multi-line comments which are going to be common.

This approach has the drawback that the knowledge about a particular type is distributed across two places: the .json JSON schema file and the type_descriptions.yaml file. But in exchange for this, we get a simple script for tooling, and its ready to use today 🙂 .

jack-berg commented 4 months ago

Per our convo in the 7/22/24 config SIG, I've pushed a commit to run make generate-descriptions and check for diff in the build.

jack-berg commented 2 months ago

I've resolved merge conflicts. @open-telemetry/configuration-approvers please take another look. Im happy to iterate as needed, but we do need tooling to enforce consistent property descriptions.

open-telemetry / opentelemetry-configuration