vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Optional inputs for transforms and sinks #21872

Open tmccombs opened 6 days ago

tmccombs commented 6 days ago

Use Cases

I have a Vector configuration that is shared across multiple instances. In particular, it includes a sink configuration.

In some instances I want to add an additional source and some transforms by including another YAML file.

However, that poses the problem: how do I include the final transform in the inputs of the sink in the shared configuration file?
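
As a minimal sketch of the setup described above: the file names, component IDs, and component types below are illustrative assumptions, not the actual configuration from this report.

```yaml
# shared.yaml (hypothetical name): loaded by every instance
sources:
  app_logs:
    type: stdin

sinks:
  out:
    type: console
    inputs:
      - app_logs
      # - extra_parsed   # only wanted when extra.yaml is also loaded;
      #                  # listing it here fails on instances without that file
    encoding:
      codec: json
---
# extra.yaml (hypothetical name): loaded only on some instances
sources:
  extra_in:
    type: file
    include:
      - /var/log/extra/*.log

transforms:
  extra_parsed:          # the "final transform" that should feed the shared sink
    type: remap
    inputs:
      - extra_in
    source: |
      .origin = "extra"
```

An instance that needs the extra components would then load both files, for example by passing `--config` once per file, while the other instances load only the shared one.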

Attempted Solutions

Simply including the name of the transform in the inputs doesn't work, because then I get an error that Input "x" for sink "y" doesn't match any components. If I try using a wildcard that would only match the new transform, I get a similar error.

What does work is using an environment variable to specify the additional inputs, but then you have to set that environment variable as well as include the additional file, and it just feels kind of hacky.

Another option is to use a wildcard that also matches something that is always available. But that may or may not be reasonable to do, depending on what the other inputs are. In particular, it may be difficult to do if the only shared input is a route transform, and the added input isn't.
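
A sketch of that wildcard workaround, assuming the components can be renamed to share a prefix (the `to_out_` prefix and all component IDs here are hypothetical):

```yaml
# Shared file: the sink subscribes by prefix instead of by exact ID.
sources:
  to_out_app:            # always present, so the wildcard always has a match
    type: stdin

sinks:
  out:
    type: console
    inputs:
      - "to_out_*"       # also picks up e.g. to_out_extra when another file defines it
    encoding:
      codec: json
```

This only works when every intended input can follow the naming convention, which is the limitation noted above for outputs of a route transform.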

You could use some kind of programmatic generation to generate the config, so that you can optionally include the desired components and inputs, but that would add significant complexity if you don't already need that.

You could use some kind of dummy source that doesn't produce any events, but there isn't a clear way to do that either. And even if there were, you would need a separate file for the cases that don't need the extra components.

Proposal

I can think of a few ways that this could be addressed:

  1. Add a field on sinks and transforms for optional inputs that don't produce an error if the input component doesn't exist
  2. Change wildcards to not produce an error if they don't match anything
  3. Have a way to specify on a source or transform that its output should be sent to a different transform or sink. Like an inverse of inputs.
  4. Have a way to set environment variables (or just variables) in one file, that can be referenced in another, so that you can add a file, and automatically set an environment variable that can be used in inputs for other components.

Version

vector 0.42.0 (x86_64-unknown-linux-gnu 3d16e34 2024-10-21 14:10:14.375255220)

pront commented 3 days ago

Hi @tmccombs, this is a reasonable idea. IMO proposal (2) makes sense.

My only concern is that someone might depend on the existing behavior, i.e. the process stopping if there is no match. The proper way to do this is to introduce an opt-in global option that relaxes the wildcard matching. Optionally, we could make this the default behavior, but we would have to go through a deprecation phase first.

If the new option is ON and there's no match, we should log a user-friendly warning e.g. like this.

P.S.

> Have a way to specify on a source or transform that its output should be sent to a different transform or sink. Like an inverse of inputs.

Ideally, I would like all components (sources, transforms, sinks) to be just nodes in the pipeline graph. But that is not directly related to this issue.