grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

Parsable Metadata File for Component Definitions #1563

Open wildum opened 2 weeks ago

wildum commented 2 weeks ago

Background

A parsable file representing Alloy components would enable the creation of config tools such as autocompletion or visual scripting.

The concept is very similar to the one from @thampiotr that was abandoned: https://github.com/grafana/agent/pull/5863

This proposal focuses on the generated file. Another proposal will address how the file will be generated if this proposal is accepted.

Proposal

I propose to have a generated JSON file representing Alloy components that exposes the following data:

Version: string

Component = {
    name: string,
    description: string,
    stability: string,
    community: boolean,
    requireLabel: boolean,
    arguments: Array<Argument>,
    exports: Array<Export>,
    blocks: Array<Block>
}

Argument = {
    name: string,
    type: string,
    description: string,
    required: boolean,
    default: any
}

Export = {
    name: string,
    type: string,
    description: string
}

Block = {
    name: string,
    description: string,
    required: boolean,
    unique: boolean,
    arguments: Array<Argument>,
    blocks: Array<Block>
}
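
As a rough illustration of how a config tool might consume this shape, here is a minimal TypeScript sketch. The interfaces mirror the definitions above; the top-level `ComponentsFile` layout (a `version` plus components keyed by name) and the `suggest` helper are assumptions made for the example, not part of the proposal.

```typescript
// Types mirroring the shapes described in the proposal.
interface Argument {
  name: string;
  type: string;
  description: string;
  required: boolean;
  default: unknown;
}

interface Export {
  name: string;
  type: string;
  description: string;
}

interface Block {
  name: string;
  description: string;
  required: boolean;
  unique: boolean;
  arguments: Argument[];
  blocks: Block[];
}

interface Component {
  name: string;
  description: string;
  stability: string;
  community: boolean;
  requireLabel: boolean;
  arguments: Argument[];
  exports: Export[];
  blocks: Block[];
}

// Hypothetical top-level layout: the proposal does not fix this.
interface ComponentsFile {
  version: string;
  components: Record<string, Component>;
}

// Suggest argument and block names that are valid at `blockPath` inside a
// component and start with `prefix` (a naive autocompletion lookup).
function suggest(
  file: ComponentsFile,
  componentName: string,
  blockPath: string[],
  prefix: string
): string[] {
  const component = file.components[componentName];
  if (component === undefined) return [];

  // Walk down the nested blocks, one path segment at a time.
  let scope: { arguments: Argument[]; blocks: Block[] } = component;
  for (const segment of blockPath) {
    const matches = scope.blocks.filter((b) => b.name === segment);
    if (matches.length === 0) return [];
    scope = matches[0];
  }

  const names = scope.arguments
    .map((a) => a.name)
    .concat(scope.blocks.map((b) => b.name));
  return names.filter((n) => n.indexOf(prefix) === 0).sort();
}
```

With the `prometheus.exporter.windows` data loaded, `suggest(file, "prometheus.exporter.windows", ["iis"], "app_")` would return the `app_exclude` and `app_include` arguments.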

I tested with a generator based on Alloy documentation and the generated JSON file was 1.1MB (33k lines). I picked JSON for its parsing speed, memory efficiency, and native support in JS (I expect most tools like autocomplete and visual scripting to use JS).

The file will be generated via the make generate command. Tests will be added to check that the file is up to date in a similar way as the related components section generation. The file will be available directly in the GitHub repo for users to download (they can download the different versions on the release branches).
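
The up-to-date check could be as simple as regenerating the data and comparing it with the committed file. The sketch below shows the idea in TypeScript; the function names are hypothetical, and the real check would presumably live next to the existing `make generate` tests rather than look like this.

```typescript
// Recursively sort object keys so that serialization does not depend on
// the order in which the generator happened to insert them.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => sortKeys(item));
  if (value !== null && typeof value === "object") {
    const input = value as Record<string, unknown>;
    const output: Record<string, unknown> = {};
    for (const key of Object.keys(input).sort()) {
      output[key] = sortKeys(input[key]);
    }
    return output;
  }
  return value;
}

// Deterministic text form of the generated data (hypothetical convention:
// two-space indentation and a trailing newline).
function stableStringify(value: unknown): string {
  return JSON.stringify(sortKeys(value), null, 2) + "\n";
}

// True when the committed file content matches what the generator
// would produce right now.
function isUpToDate(committedText: string, generated: unknown): boolean {
  return committedText === stableStringify(generated);
}
```

A test would read the committed file, run the generator in memory, and fail when `isUpToDate` returns `false`.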

The file could additionally be generated via the command line with the Alloy binary.

All components and config blocks should be described in the file.

mattdurham commented 2 weeks ago

Can you post a snippet of one of the more complex components? windows_exporter has to have one of the largest configs.

wildum commented 2 weeks ago

prometheus.exporter.windows

```json
"prometheus.exporter.windows": {
  "name": "prometheus.exporter.windows",
  "doc": "`prometheus.exporter.windows` component embeds\n[windows_exporter](https://github.com/prometheus-community/windows_exporter) which exposes a\nwide variety of hardware and OS metrics for Windows-based systems.",
  "arguments": [
    { "name": "enabled_collectors", "type": "list(string)", "doc": "List of collectors to enable.", "required": false, "default": "[\"cpu\",\"cs\",\"logical_disk\",\"net\",\"os\",\"service\",\"system\"]" },
    { "name": "timeout", "type": "duration", "doc": "Configure timeout for collecting metrics.", "required": false, "default": "4m" }
  ],
  "requireLabel": true,
  "exports": [
    { "name": "targets", "type": "list(map(string))", "doc": "The targets that can be used to collect exporter metrics." }
  ],
  "blocks": [
    {
      "name": "dfsr",
      "doc": "Configures the dfsr collector.",
      "required": false,
      "arguments": [
        { "name": "source_enabled", "type": "list(string)", "doc": "Comma-separated list of DFSR Perflib sources to use.", "required": false, "default": "[\"connection\",\"folder\",\"volume\"]" }
      ],
      "blocks": []
    },
    {
      "name": "exchange",
      "doc": "Configures the exchange collector.",
      "required": false,
      "arguments": [
        { "name": "enabled_list", "type": "string", "doc": "Comma-separated list of collectors to use.", "required": false, "default": "\"\"" }
      ],
      "blocks": []
    },
    {
      "name": "iis",
      "doc": "Configures the iis collector.",
      "required": false,
      "arguments": [
        { "name": "app_exclude", "type": "string", "doc": "Regular expression of applications to ignore.", "required": false, "default": "\"\"" },
        { "name": "app_include", "type": "string", "doc": "Regular expression of applications to report on.", "required": false, "default": "\".*\"" },
        { "name": "site_exclude", "type": "string", "doc": "Regular expression of sites to ignore.", "required": false, "default": "\"\"" },
        { "name": "site_include", "type": "string", "doc": "Regular expression of sites to report on.", "required": false, "default": "\".*\"" }
      ],
      "blocks": []
    },
    {
      "name": "logical_disk",
      "doc": "Configures the logical_disk collector.",
      "required": false,
      "arguments": [
        { "name": "exclude", "type": "string", "doc": "Regular expression of volumes to exclude.", "required": false, "default": "\"\"" },
        { "name": "include", "type": "string", "doc": "Regular expression of volumes to include.", "required": false, "default": "\".+\"" }
      ],
      "blocks": []
    },
    {
      "name": "msmq",
      "doc": "Configures the msmq collector.",
      "required": false,
      "arguments": [
        { "name": "where_clause", "type": "string", "doc": "WQL 'where' clause to use in WMI metrics query.", "required": false, "default": "\"\"" }
      ],
      "blocks": []
    },
    {
      "name": "mssql",
      "doc": "Configures the mssql collector.",
      "required": false,
      "arguments": [
        { "name": "enabled_classes", "type": "list(string)", "doc": "Comma-separated list of MSSQL WMI classes to use.", "required": false, "default": "[\"accessmethods\", \"availreplica\", \"bufman\", \"databases\", \"dbreplica\", \"genstats\", \"locks\", \"memmgr\", \"sqlstats\", \"sqlerrors\", \"transactions\"]" }
      ],
      "blocks": []
    },
    {
      "name": "network",
      "doc": "Configures the network collector.",
      "required": false,
      "arguments": [
        { "name": "exclude", "type": "string", "doc": "Regular expression of NIC:s to exclude.", "required": false, "default": "\"\"" },
        { "name": "include", "type": "string", "doc": "Regular expression of NIC:s to include.", "required": false, "default": "\".*\"" }
      ],
      "blocks": []
    },
    {
      "name": "process",
      "doc": "Configures the process collector.",
      "required": false,
      "arguments": [
        { "name": "exclude", "type": "string", "doc": "Regular expression of processes to exclude.", "required": false, "default": "\"\"" },
        { "name": "include", "type": "string", "doc": "Regular expression of processes to include.", "required": false, "default": "\".*\"" }
      ],
      "blocks": []
    },
    {
      "name": "scheduled_task",
      "doc": "Configures the scheduled_task collector.",
      "required": false,
      "arguments": [
        { "name": "exclude", "type": "string", "doc": "Regexp of tasks to exclude.", "required": false, "default": "\"\"" },
        { "name": "include", "type": "string", "doc": "Regexp of tasks to include.", "required": false, "default": "\".+\"" }
      ],
      "blocks": []
    },
    {
      "name": "service",
      "doc": "Configures the service collector.",
      "required": false,
      "arguments": [
        { "name": "use_api", "type": "string", "doc": "Use API calls to collect service data instead of WMI.", "required": false, "default": "false" },
        { "name": "where_clause", "type": "string", "doc": "WQL 'where' clause to use in WMI metrics query.", "required": false, "default": "\"\"" }
      ],
      "blocks": []
    },
    {
      "name": "smtp",
      "doc": "Configures the smtp collector.",
      "required": false,
      "arguments": [
        { "name": "exclude", "type": "string", "doc": "Regexp of virtual servers to ignore.", "required": false, "default": null },
        { "name": "include", "type": "string", "doc": "Regexp of virtual servers to include.", "required": false, "default": "\".+\"" }
      ],
      "blocks": []
    },
    {
      "name": "text_file",
      "doc": "Configures the text_file collector.",
      "required": false,
      "arguments": [
        { "name": "text_file_directory", "type": "string", "doc": "The directory containing the files to be ingested.", "required": false, "default": "C:\\Program Files\\GrafanaLabs\\Alloy\\textfile_inputs" }
      ],
      "blocks": []
    }
  ]
},
```

wildum commented 2 weeks ago

This generated version does not contain the "unique" field for blocks. I don't know whether we should add it to the format. It's not useful for autocompletion, but it could be useful for validation and visual scripting.
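
To illustrate the validation use case, here is a small TypeScript sketch. It assumes that `unique: true` means "this block may appear at most once in its parent", which is an interpretation rather than something the proposal spells out; `BlockSpec` and `duplicatedUniqueBlocks` are hypothetical names.

```typescript
// The subset of block metadata needed for this check.
interface BlockSpec {
  name: string;
  unique: boolean;
}

// Given the block metadata and the block names a user actually wrote
// (in order, with repeats), return the unique blocks that were repeated.
function duplicatedUniqueBlocks(
  specs: BlockSpec[],
  usedBlockNames: string[]
): string[] {
  // Prototype-free object so names like "constructor" count correctly.
  const counts: Record<string, number> = Object.create(null);
  for (const name of usedBlockNames) {
    counts[name] = (counts[name] || 0) + 1;
  }
  return specs
    .filter((spec) => spec.unique && (counts[spec.name] || 0) > 1)
    .map((spec) => spec.name);
}
```

A visual scripting tool could use the same flag the other way around, disabling the "add block" action once a unique block is already present.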

ptodev commented 2 weeks ago

I'm happy with this, and I'm very excited to see how we can use it for autogenerating docs. I wonder how we are going to treat pointer attributes and blocks. At the moment, if a block is a pointer, then its defaults won't actually be set. They'll only be set if it's not a pointer. We don't document this very clearly, and this is a chance to iron out such issues in the docs.

ptodev commented 2 weeks ago

It might be helpful if we make our schema look like a standard json schema, but it's not a must.

ptodev commented 2 weeks ago

One of the main issues with the current proposal is that it doesn't include any sort of validation for config attributes. This could reduce the usefulness of the schema for validating configs. Should we include such a feature?

For example, JSON Schema has numeric validation keywords such as "multipleOf": 0.01 and "minimum": 0.

wildum commented 2 weeks ago

> One of the main issues with the current proposal is that it doesn't include any sort of validation for config attributes. This could reduce the usefulness of the schema for validating configs. Should we include such a feature?

That's a good point, but I think that for almost all numbers the constraints are obvious (e.g. a timeout should not be negative). Where constraints might help the most is for strings when only specific values are allowed (e.g. the role in discovery.dockerswarm must be either "services", "nodes" or "tasks").

Not sure if this should be part of the first version
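
If constraints were added later, they could be optional fields on an argument. The TypeScript sketch below is only an illustration: `allowedValues` and `minimum` are hypothetical field names that are not in the proposal, and `checkValue` is an invented helper.

```typescript
// An argument description extended with optional, hypothetical constraints.
interface ConstrainedArgument {
  name: string;
  type: string;
  allowedValues?: string[]; // hypothetical: permitted string values
  minimum?: number; // hypothetical: inclusive lower bound for numbers
}

// Returns an error message, or null when the value satisfies the constraints.
function checkValue(arg: ConstrainedArgument, value: unknown): string | null {
  if (
    arg.allowedValues !== undefined &&
    typeof value === "string" &&
    arg.allowedValues.indexOf(value) === -1
  ) {
    return arg.name + " must be one of: " + arg.allowedValues.join(", ");
  }
  if (
    arg.minimum !== undefined &&
    typeof value === "number" &&
    value < arg.minimum
  ) {
    return arg.name + " must be >= " + arg.minimum;
  }
  return null;
}
```

For the discovery.dockerswarm case, an argument described as `{ name: "role", type: "string", allowedValues: ["services", "nodes", "tasks"] }` would accept `"nodes"` and reject any other string.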

wildum commented 2 weeks ago

> It might be helpful if we make our schema look like a standard json schema, but it's not a must.

Would you then use it to check if the generated components file is correct according to the schema? It's an interesting idea but I think that I would only use it for testing or would you use it differently?

wildum commented 2 weeks ago

> I wonder how we are going to treat pointer attributes and blocks. At the moment, if a block is a pointer, then its defaults won't actually be set. They'll only be set if it's not a pointer. We don't document this very clearly, and this is a chance to iron out such issues in the docs.

I think that this does not apply to this file, we should treat pointers the same way as values for attributes and blocks because there is no difference between the two in the config. The generated file should be used for config tools. The underlying logic such as how the defaults are applied for pointers should be described in the documentation. If we generate the doc, there could be a warning for the pointer types but I wouldn't include it in this file

thampiotr commented 2 weeks ago

Couple of points:

  1. Can we add a stability level to the components? Including whether they are community components would also be a good idea, IMO.
  2. You mention that "the generation will be part of the release process" - can we instead make it part of make generate and have a test that fails if the generated JSON is not up to date? This would be similar to the related components section generation in the docs right now.

wildum commented 2 weeks ago

@thampiotr thanks for the suggestions, added everything

ptodev commented 1 week ago

Actually, I think we should evaluate json schema in a bit more detail first. Last week I OK'd the proposal as it is, because I felt pessimistic that upstream's schematisation project would complete soon, due to technical difficulties with generating code that's not too dissimilar. But I've since been able to overcome a few hurdles with the code, and now I feel much more optimistic. Upstream's maintainers are also receptive to having the feature.

> Would you then use it to check if the generated components file is correct according to the schema? It's an interesting idea but I think that I would only use it for testing or would you use it differently?

Using json schema has a few advantages:

It would be nice if we can explore further whether json-schema is not sufficient for our purposes, and if not, to know why.

Note that it's also possible to extend the schema with parameters which are not in the spec. It's a common problem, and it's likely that OTel will do this upstream to support "secret" strings:

password:
  type: string
  alias: configopaque.String

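
A tool could pick up such an extension keyword alongside the standard ones. The TypeScript sketch below assumes the `alias` keyword and the `configopaque.String` value from the snippet above; `PropertySchema`, `isSecret` and `displayValue` are hypothetical names, and whether a given validator tolerates unknown keywords depends on its configuration.

```typescript
// A property schema carrying one standard keyword and one custom keyword.
interface PropertySchema {
  type: string;
  alias?: string; // custom keyword, not part of the JSON Schema spec
}

// Treat properties aliased to configopaque.String as secrets.
function isSecret(property: PropertySchema): boolean {
  return property.alias === "configopaque.String";
}

// Mask secret values before showing them in a UI or a log line.
function displayValue(property: PropertySchema, value: string): string {
  return isSecret(property) ? "********" : value;
}
```
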
wildum commented 1 week ago

I tried generating a JSON Schema from the component example that I put above. Is this what you have in mind?

Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "prometheus.exporter.windows": {
      "type": "object",
      "properties": {
        "name": { "type": "string", "const": "prometheus.exporter.windows" },
        "doc": { "type": "string" },
        "arguments": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": { "name": { "const": "enabled_collectors" }, "type": { "const": "list(string)" }, "doc": { "const": "List of collectors to enable." }, "required": { "const": false }, "default": { "const": "[\"cpu\",\"cs\",\"logical_disk\",\"net\",\"os\",\"service\",\"system\"]" } },
              "required": ["name", "type", "doc", "required", "default"]
            },
            {
              "type": "object",
              "properties": { "name": { "const": "timeout" }, "type": { "const": "duration" }, "doc": { "const": "Configure timeout for collecting metrics." }, "required": { "const": false }, "default": { "const": "4m" } },
              "required": ["name", "type", "doc", "required", "default"]
            }
          ]
        },
        "requireLabel": { "type": "boolean", "const": true },
        "exports": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": { "name": { "const": "targets" }, "type": { "const": "list(map(string))" }, "doc": { "const": "The targets that can be used to collect exporter metrics." } },
              "required": ["name", "type", "doc"]
            }
          ]
        },
        "blocks": {
          "type": "array",
          "items": {
            "type": "object",
            "oneOf": [
              { "$ref": "#/definitions/dfsr" },
              { "$ref": "#/definitions/exchange" },
              { "$ref": "#/definitions/iis" },
              { "$ref": "#/definitions/logical_disk" },
              { "$ref": "#/definitions/msmq" },
              { "$ref": "#/definitions/mssql" },
              { "$ref": "#/definitions/network" },
              { "$ref": "#/definitions/process" },
              { "$ref": "#/definitions/scheduled_task" },
              { "$ref": "#/definitions/service" },
              { "$ref": "#/definitions/smtp" },
              { "$ref": "#/definitions/text_file" }
            ]
          }
        }
      },
      "required": ["name", "doc", "arguments", "requireLabel", "exports", "blocks"]
    }
  },
  "required": ["prometheus.exporter.windows"],
  "definitions": {
    "dfsr": {
      "type": "object",
      "properties": {
        "name": { "const": "dfsr" },
        "doc": { "const": "Configures the dfsr collector." },
        "required": { "const": false },
        "arguments": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": { "name": { "const": "source_enabled" }, "type": { "const": "list(string)" }, "doc": { "const": "Comma-separated list of DFSR Perflib sources to use." }, "required": { "const": false }, "default": { "const": "[\"connection\",\"folder\",\"volume\"]" } },
              "required": ["name", "type", "doc", "required", "default"]
            }
          ]
        },
        "blocks": { "type": "array", "items": {} }
      },
      "required": ["name", "doc", "required", "arguments", "blocks"]
    },
    "exchange": {
      "type": "object",
      "properties": {
        "name": { "const": "exchange" },
        "doc": { "const": "Configures the exchange collector." },
        "required": { "const": false },
        "arguments": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": { "name": { "const": "enabled_list" }, "type": { "const": "string" }, "doc": { "const": "Comma-separated list of collectors to use." }, "required": { "const": false }, "default": { "const": "\"\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            }
          ]
        },
        "blocks": { "type": "array", "items": {} }
      },
      "required": ["name", "doc", "required", "arguments", "blocks"]
    },
    "iis": {
      "type": "object",
      "properties": {
        "name": { "const": "iis" },
        "doc": { "const": "Configures the iis collector." },
        "required": { "const": false },
        "arguments": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": { "name": { "const": "app_exclude" }, "type": { "const": "string" }, "doc": { "const": "Regular expression of applications to ignore." }, "required": { "const": false }, "default": { "const": "\"\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            },
            {
              "type": "object",
              "properties": { "name": { "const": "app_include" }, "type": { "const": "string" }, "doc": { "const": "Regular expression of applications to report on." }, "required": { "const": false }, "default": { "const": "\".*\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            },
            {
              "type": "object",
              "properties": { "name": { "const": "site_exclude" }, "type": { "const": "string" }, "doc": { "const": "Regular expression of sites to ignore." }, "required": { "const": false }, "default": { "const": "\"\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            },
            {
              "type": "object",
              "properties": { "name": { "const": "site_include" }, "type": { "const": "string" }, "doc": { "const": "Regular expression of sites to report on." }, "required": { "const": false }, "default": { "const": "\".*\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            }
          ]
        },
        "blocks": { "type": "array", "items": {} }
      },
      "required": ["name", "doc", "required", "arguments", "blocks"]
    },
    "logical_disk": {
      "type": "object",
      "properties": {
        "name": { "const": "logical_disk" },
        "doc": { "const": "Configures the logical_disk collector." },
        "required": { "const": false },
        "arguments": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": { "name": { "const": "exclude" }, "type": { "const": "string" }, "doc": { "const": "Regular expression of volumes to exclude." }, "required": { "const": false }, "default": { "const": "\"\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            },
            {
              "type": "object",
              "properties": { "name": { "const": "include" }, "type": { "const": "string" }, "doc": { "const": "Regular expression of volumes to include." }, "required": { "const": false }, "default": { "const": "\".+\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            }
          ]
        },
        "blocks": { "type": "array", "items": {} }
      },
      "required": ["name", "doc", "required", "arguments", "blocks"]
    },
    "msmq": {
      "type": "object",
      "properties": {
        "name": { "const": "msmq" },
        "doc": { "const": "Configures the msmq collector." },
        "required": { "const": false },
        "arguments": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": { "name": { "const": "where_clause" }, "type": { "const": "string" }, "doc": { "const": "WQL 'where' clause to use in WMI metrics query." }, "required": { "const": false }, "default": { "const": "\"\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            }
          ]
        },
        "blocks": { "type": "array", "items": {} }
      },
      "required": ["name", "doc", "required", "arguments", "blocks"]
    },
    "mssql": {
      "type": "object",
      "properties": {
        "name": { "const": "mssql" },
        "doc": { "const": "Configures the mssql collector." },
        "required": { "const": false },
        "arguments": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": { "name": { "const": "enabled_classes" }, "type": { "const": "list(string)" }, "doc": { "const": "Comma-separated list of MSSQL WMI classes to use." }, "required": { "const": false }, "default": { "const": "[\"accessmethods\", \"availreplica\", \"bufman\", \"databases\", \"dbreplica\", \"genstats\", \"locks\", \"memmgr\", \"sqlstats\", \"sqlerrors\", \"transactions\"]" } },
              "required": ["name", "type", "doc", "required", "default"]
            }
          ]
        },
        "blocks": { "type": "array", "items": {} }
      },
      "required": ["name", "doc", "required", "arguments", "blocks"]
    },
    "network": {
      "type": "object",
      "properties": {
        "name": { "const": "network" },
        "doc": { "const": "Configures the network collector." },
        "required": { "const": false },
        "arguments": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": { "name": { "const": "exclude" }, "type": { "const": "string" }, "doc": { "const": "Regular expression of NIC:s to exclude." }, "required": { "const": false }, "default": { "const": "\"\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            },
            {
              "type": "object",
              "properties": { "name": { "const": "include" }, "type": { "const": "string" }, "doc": { "const": "Regular expression of NIC:s to include." }, "required": { "const": false }, "default": { "const": "\".*\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            }
          ]
        },
        "blocks": { "type": "array", "items": {} }
      },
      "required": ["name", "doc", "required", "arguments", "blocks"]
    },
    "process": {
      "type": "object",
      "properties": {
        "name": { "const": "process" },
        "doc": { "const": "Configures the process collector." },
        "required": { "const": false },
        "arguments": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": { "name": { "const": "exclude" }, "type": { "const": "string" }, "doc": { "const": "Regular expression of processes to exclude." }, "required": { "const": false }, "default": { "const": "\"\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            },
            {
              "type": "object",
              "properties": { "name": { "const": "include" }, "type": { "const": "string" }, "doc": { "const": "Regular expression of processes to include." }, "required": { "const": false }, "default": { "const": "\".*\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            }
          ]
        },
        "blocks": { "type": "array", "items": {} }
      },
      "required": ["name", "doc", "required", "arguments", "blocks"]
    },
    "scheduled_task": {
      "type": "object",
      "properties": {
        "name": { "const": "scheduled_task" },
        "doc": { "const": "Configures the scheduled_task collector." },
        "required": { "const": false },
        "arguments": {
          "type": "array",
          "items": [
            {
              "type": "object",
              "properties": { "name": { "const": "exclude" }, "type": { "const": "string" }, "doc": { "const": "Regexp of tasks to exclude." }, "required": { "const": false }, "default": { "const": "\"\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            },
            {
              "type": "object",
              "properties": { "name": { "const": "include" }, "type": { "const": "string" }, "doc": { "const": "Regexp of tasks to include." }, "required": { "const": false }, "default": { "const": "\".+\"" } },
              "required": ["name", "type", "doc", "required", "default"]
            }
          ]
        },
        "blocks": { "type": "array", "items": {} }
      },
      "required": ["name", "doc", "required", "arguments", "blocks"]
    },
  }
```

mattdurham commented 1 week ago

I would be for using the JSON schema. It's a bit wordy, but in general this should be consumed by tools and not directly by people.

ptodev commented 1 week ago

> Is this what you have in mind?

@wildum More or less yes. An Alloy block would have to be an object. I'm not sure if the example in your comment is exactly how the schema needs to be, but it certainly looks like a good start.
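
One way to read "an Alloy block would have to be an object" is that each block becomes an object schema whose properties are its arguments and nested blocks. The TypeScript sketch below converts one block from the earlier windows_exporter snippet format (which uses `doc` rather than `description`) into such a schema. The type mapping is an assumption and covers only the type strings that appear in that snippet; anything else falls back to an unconstrained schema.

```typescript
interface ArgumentMeta {
  name: string;
  type: string;
  doc: string;
  required: boolean;
}

interface BlockMeta {
  name: string;
  doc: string;
  required: boolean;
  arguments: ArgumentMeta[];
  blocks: BlockMeta[];
}

type JsonSchema = { [keyword: string]: unknown };

// If `typeName` looks like `wrapper(inner)`, return `inner`, else null.
function unwrap(wrapper: string, typeName: string): string | null {
  const open = wrapper + "(";
  if (typeName.indexOf(open) !== 0) return null;
  if (typeName.charAt(typeName.length - 1) !== ")") return null;
  return typeName.slice(open.length, -1);
}

// Assumed mapping from type strings in the snippet to JSON Schema.
function typeToSchema(typeName: string): JsonSchema {
  const listItem = unwrap("list", typeName);
  if (listItem !== null) {
    return { type: "array", items: typeToSchema(listItem) };
  }
  const mapValue = unwrap("map", typeName);
  if (mapValue !== null) {
    return { type: "object", additionalProperties: typeToSchema(mapValue) };
  }
  if (typeName === "string" || typeName === "duration") {
    return { type: "string" };
  }
  return {}; // unknown type: accept any value
}

// Describe the config a user writes for this block as an object schema.
function blockToSchema(block: BlockMeta): JsonSchema {
  const properties: Record<string, JsonSchema> = {};
  const required: string[] = [];
  for (const arg of block.arguments) {
    properties[arg.name] = { ...typeToSchema(arg.type), description: arg.doc };
    if (arg.required) required.push(arg.name);
  }
  for (const child of block.blocks) {
    properties[child.name] = blockToSchema(child);
    if (child.required) required.push(child.name);
  }
  return { type: "object", description: block.doc, properties, required };
}
```

Unlike the generated schema above, which pins every metadata field with `const`, this form describes the configuration itself, so the same document could in principle be used to validate a user's config.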

wildum commented 4 days ago

I'm not fully sold on the JSON schema idea. Schemas are used to validate files, but that is not what we are looking for in this proposal: we want to provide component metadata to populate config tools.

Although it provides some structure to the JSON file, I feel like we are misusing the concept, making it confusing. As a config tools builder, I would prefer to have a simple interface description such as the one defined in the proposal. It's easy to implement in any language and does not require additional knowledge.

Even if the file should be consumed by tools and not people, it's still useful sometimes to check through the data for debugging purposes and the JSON schema style is harder to read than the other style.

I think that validation is interesting but it's a big topic with a lot of unknowns. I would prefer to keep it out of scope for this proposal.

I don't necessarily want to close the door on the JSON schema. If we go all the way with JSON schema in OTel and Alloy, then it might be worth considering here.

So I suggest that with this proposal we only commit to exposing, in the Git repo, a generated metadata JSON file containing the component data that is described in the PR. The structure of the data (whether it follows JSON schema semantics or not) will be defined when we start discussing how we want to generate the data.