vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Vector YAML configuration - anchors support #20853

Open nazarovkv opened 3 months ago

nazarovkv commented 3 months ago

Use Cases

While configuring a Vector pipeline in YAML, I found that Vector does not "understand" YAML anchors. It would be quite useful to define templates for, e.g., sinks and reuse them. Say I need to route messages to the same Kafka cluster but to different topics: providing the full configuration for each sink, with the topic name as the only difference, makes Vector's config bloated.

Proposal

Figure 1 - Vector config in YAML format with multiple sinks to the same Kafka cluster:

sinks:
  kafka-topic-1:
    type: kafka
    bootstrap_servers: bootstrap-server-1.example.org, bootstrap-server-2.example.org, bootstrap-server-3.example.org
    sasl:
      enabled: true
      mechanism: "PLAIN"
      username: "kafka-username"
      password: "kafka-user-password"
    encoding:
      codec: json
    topic: topic-1
    inputs:
      - my-router.topic-1
  kafka-topic-2:
    type: kafka
    bootstrap_servers: bootstrap-server-1.example.org, bootstrap-server-2.example.org, bootstrap-server-3.example.org
    sasl:
      enabled: true
      mechanism: "PLAIN"
      username: "kafka-username"
      password: "kafka-user-password"
    encoding:
      codec: json
    topic: topic-2
    inputs:
      - my-router.topic-2
  kafka-topic-3:
    type: kafka
    bootstrap_servers: bootstrap-server-1.example.org, bootstrap-server-2.example.org, bootstrap-server-3.example.org
    sasl:
      enabled: true
      mechanism: "PLAIN"
      username: "kafka-username"
      password: "kafka-user-password"
    encoding:
      codec: json
    topic: topic-3
    inputs:
      - my-router.topic-3

Figure 2 - same config as above but with anchors:

templates:
  kafka-sink-tpl: &kafka-sink
    type: kafka
    bootstrap_servers: bootstrap-server-1.example.org, bootstrap-server-2.example.org, bootstrap-server-3.example.org
    sasl:
      enabled: true
      mechanism: "PLAIN"
      username: "kafka-username"
      password: "kafka-user-password"
    encoding:
      codec: json

sinks:
  kafka-topic-1:
    <<: *kafka-sink
    topic: topic-1
    inputs:
      - my-router.topic-1
  kafka-topic-2:
    <<: *kafka-sink
    topic: topic-2
    inputs:
      - my-router.topic-2
  kafka-topic-3:
    <<: *kafka-sink
    topic: topic-3
    inputs:
      - my-router.topic-3

Feel the difference.

nazarovkv commented 3 months ago

Please close. I found a solution: https://vector.dev/docs/reference/configuration/template-syntax/

frankh commented 3 months ago

Templating is clunky at best for this kind of thing; I think this should be reopened.

jszwedko commented 3 months ago

Agreed, anchor support would be useful. Let me reopen this. Unfortunately I also don't expect it to happen anytime soon because the issue on the YAML parsing library that we use has been open since 2022: https://github.com/dtolnay/serde-yaml/issues/317 (and the maintainer has stepped away from that project).

jszwedko commented 3 months ago

As a workaround, the configuration file could be pre-processed to expand the anchors. For example, yq can do this via cat vector.yaml | yq 'explode(.)'.

nazarovkv commented 2 months ago

As a workaround, the configuration file could be pre-processed to expand the anchors. For example, yq can do this via cat vector.yaml | yq 'explode(.)'.

Hi @jszwedko, that can be useful, but there is one thing to take into account. Say the templates are defined under a templates key (just as in my report):

templates:
  kafka-sink-tpl: &kafka-sink
    type: kafka
    bootstrap_servers: bootstrap-server-1.example.org, bootstrap-server-2.example.org, bootstrap-server-3.example.org
    sasl:
      enabled: true
      mechanism: "PLAIN"
      username: "kafka-username"
      password: "kafka-user-password"
    encoding:
      codec: json

sinks:
  kafka-topic-1:
    <<: *kafka-sink
    topic: topic-1
    inputs:
      - my-router.topic-1
  kafka-topic-2:
    <<: *kafka-sink
    topic: topic-2
    inputs:
      - my-router.topic-2
  kafka-topic-3:
    <<: *kafka-sink
    topic: topic-3
    inputs:
      - my-router.topic-3

If I run cat vector.yaml | yq 'explode(.)', the result is:

templates:
  kafka-sink-tpl:
    type: kafka
    bootstrap_servers: bootstrap-server-1.example.org, bootstrap-server-2.example.org, bootstrap-server-3.example.org
    sasl:
      enabled: true
      mechanism: "PLAIN"
      username: "kafka-username"
      password: "kafka-user-password"
    encoding:
      codec: json
sinks:
  kafka-topic-1:
    type: kafka
    bootstrap_servers: bootstrap-server-1.example.org, bootstrap-server-2.example.org, bootstrap-server-3.example.org
    sasl:
      enabled: true
      mechanism: "PLAIN"
      username: "kafka-username"
      password: "kafka-user-password"
    encoding:
      codec: json
    topic: topic-1
    inputs:
      - my-router.topic-1
  kafka-topic-2:
    type: kafka
    bootstrap_servers: bootstrap-server-1.example.org, bootstrap-server-2.example.org, bootstrap-server-3.example.org
    sasl:
      enabled: true
      mechanism: "PLAIN"
      username: "kafka-username"
      password: "kafka-user-password"
    encoding:
      codec: json
    topic: topic-2
    inputs:
      - my-router.topic-2
  kafka-topic-3:
    type: kafka
    bootstrap_servers: bootstrap-server-1.example.org, bootstrap-server-2.example.org, bootstrap-server-3.example.org
    sasl:
      enabled: true
      mechanism: "PLAIN"
      username: "kafka-username"
      password: "kafka-user-password"
    encoding:
      codec: json
    topic: topic-3
    inputs:
      - my-router.topic-3

So the templates key is kept in the resulting config, and Vector will not start because it knows nothing about a templates key in vector.yaml. To work around this, I need to run cat vector.yaml | yq 'explode(.), del(.templates)' to remove the unused key from the output.
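For anyone who wants to script this, here is a rough sketch of the preprocessing step, not a definitive recipe. It assumes mikefarah's yq v4 is on the PATH; the file names vector.yaml and vector.expanded.yaml are placeholders, and the sample config is a trimmed version of the one in this issue. It chains the two operations with a pipe, `explode(.) | del(.templates)`, on the assumption that they should apply in sequence to a single document.

```shell
#!/bin/sh
# Sketch: expand YAML anchors and drop the helper key before handing the file to Vector.
# Assumes mikefarah's yq v4; file names are placeholders chosen for this example.
set -eu

src=vector.yaml
out=vector.expanded.yaml

# A trimmed version of the config from this issue.
cat > "$src" <<'EOF'
templates:
  kafka-sink-tpl: &kafka-sink
    type: kafka
    bootstrap_servers: bootstrap-server-1.example.org
    encoding:
      codec: json

sinks:
  kafka-topic-1:
    <<: *kafka-sink
    topic: topic-1
    inputs:
      - my-router.topic-1
EOF

# Only run the expansion when the Go implementation of yq is available;
# other tools named "yq" do not provide explode().
if command -v yq >/dev/null 2>&1 && yq --version 2>/dev/null | grep -q mikefarah; then
  # explode(.) resolves anchors, aliases and merge keys;
  # del(.templates) then removes the helper key Vector does not know about.
  yq 'explode(.) | del(.templates)' "$src" > "$out"
  echo "expanded config written to $out"
else
  echo "mikefarah yq not found; skipping expansion" >&2
fi
```

The expanded file can then be passed to Vector with --config in place of the original. Writing to a separate file keeps the anchored source as the version you edit.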

But anyway, I think the original request would be useful; I still need this feature in some cases. 😅

jszwedko commented 2 months ago

I think that behavior holds either way: templates would be in the resulting config regardless of whether Vector handles the anchors or you preprocess with yq. It sounds like what you want is a way to have a dummy key that Vector ignores, which you could use for declaring YAML aliases. I could see that being useful.

nazarovkv commented 2 months ago

It sounds like what you want is a way to have a dummy key that Vector ignores, which you could use for declaring YAML aliases. I could see that being useful.

Yeah, having such a special key for templating could solve the issue.