vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
17.32k stars 1.51k forks

outputs as an alternative to inputs #6652

Open karlseguin opened 3 years ago

karlseguin commented 3 years ago

I'm having a hard time organizing my configurations. I think I have a common scenario, so maybe there's an existing feature/pattern that I'm not familiar with that would help address this.

Basically, our deploys are made up of independent units, e.g. postgresql, nginx. We don't know, ahead of time, which unit is going to be deployed where.

One of our units is "vector". It is installed on every server, installs Vector, and defines some common outputs in vector.toml:

[transforms.error_count_metric]
  type = "log_to_metric"
[[transforms.error_count_metric.metrics]]
  type = "counter"
  field = "app"
  name = "errors"
  tags.app = "{{app}}"
  tags.env = "{{env}}"

[sinks.elasticsearch_errors]
  type = "elasticsearch"
  index = "error-%Y-%m-%d"
  ...

[sinks.prometheus]
  type = "prometheus_exporter"

Ideally, my nginx and postgresql units would be self-contained. Which is to say that I'd like my vector configuration for these units to exist within these units. So, for nginx, we could imagine:

[sources.nginx_errors]
  type = "file"
  include = ["/data/log/nginx.error"]

[sources.nginx_access]
  type = "file"
  include = ["/data/log/nginx.access"]

[sinks.nginx_access_elastic]
  inputs = ["nginx_access"]
  type = "elasticsearch"
  index = "access-%Y-%m-%d"

So you can see that the nginx_access logs get sent to Elasticsearch via the nginx_access_elastic sink. But how do I get nginx_errors sent via elasticsearch_errors?

Currently, I believe the only option is for the nginx unit to sed vector.toml and inject itself into the elasticsearch_errors inputs list (which is what we do). What I'm proposing is that nginx_errors could be given an outputs = ["elasticsearch_errors"] value.
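To make the contrast concrete, here is a rough sketch of the two approaches side by side. The `outputs` key is the proposal from this issue and is not an existing Vector option; the `inputs` line in the first half is simply what the sed workaround would leave behind in vector.toml.

```toml
# --- vector.toml after the nginx unit has patched it (current workaround) ---
[sinks.elasticsearch_errors]
  type = "elasticsearch"
  inputs = ["nginx_errors"]            # injected by the nginx unit's sed step
  index = "error-%Y-%m-%d"
  # remaining sink options are elided in the original post

# --- nginx unit's own file under the proposal (hypothetical syntax) ---
[sources.nginx_errors]
  type = "file"
  include = ["/data/log/nginx.error"]
  outputs = ["elasticsearch_errors"]   # proposed key; Vector does not accept this today
```

With the second form, the shared vector.toml would never need to know which units are installed on a given server.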

#6170 might help a little with this, but at best it'll be messy and at worst it won't work. For example, I might want nginx_error to go to both elasticsearch_errors and prometheus_error (via the error_count_metric transform), but only want postgresql_error to go to elasticsearch_errors. I don't think I can name these in such a way that wildcard inputs will work.
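The fan-out scenario above might look like this under the proposed `outputs` key. This is only a sketch: `outputs` is hypothetical, and the `postgresql_errors` source name and its log path are made up for illustration.

```toml
# nginx unit: errors go to Elasticsearch and to the metric transform
[sources.nginx_errors]
  type = "file"
  include = ["/data/log/nginx.error"]
  outputs = ["elasticsearch_errors", "error_count_metric"]   # hypothetical key

# postgresql unit: errors go to Elasticsearch only
[sources.postgresql_errors]                                  # hypothetical source name
  type = "file"
  include = ["/data/log/postgresql.error"]                   # assumed path, not from the original
  outputs = ["elasticsearch_errors"]                         # hypothetical key
```

Each unit states its own destinations, so no naming convention has to encode which sources feed which shared components.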

jszwedko commented 3 years ago

Thanks for this write-up, @karlseguin. We've been thinking a little about automatic discovery features for Vector and this is definitely a good example.

Another way I could see solving this particular problem would be to attach labels to components and use those labels as selectors for the inputs of other components.

Something like:

[sources.nginx_errors]
  type = "file"
  labels = ["error"]
  include = ["/data/log/nginx.error"]

[transforms.error_count_metric]
  type = "log_to_metric"
[[transforms.error_count_metric.metrics]]
  type = "counter"
  field = "app"
  name = "errors"
  tags.app = "{{app}}"
  tags.env = "{{env}}"

[sinks.elasticsearch_errors]
  type = "elasticsearch"
  inputs = ["labels.error"]
  index = "error-%Y-%m-%d"
  ...

[sinks.prometheus]
  type = "prometheus_exporter"
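For comparison, the mixed fan-out case from the original post could perhaps be expressed with two labels. This extends the sketch above and is equally hypothetical: neither `labels` nor the `labels.<name>` selector exists in Vector, and the `error_metric` label, the `postgresql_errors` source name, and its log path are invented for illustration.

```toml
# nginx unit: indexed in Elasticsearch and counted as a metric
[sources.nginx_errors]
  type = "file"
  labels = ["error", "error_metric"]        # hypothetical key and label names
  include = ["/data/log/nginx.error"]

# postgresql unit: indexed in Elasticsearch only
[sources.postgresql_errors]                 # hypothetical source name
  type = "file"
  labels = ["error"]
  include = ["/data/log/postgresql.error"]  # assumed path, not from the original

# shared vector.toml: the transform selects only sources labelled for metrics
[transforms.error_count_metric]
  type = "log_to_metric"
  inputs = ["labels.error_metric"]          # hypothetical selector syntax
```

The `elasticsearch_errors` sink would keep `inputs = ["labels.error"]` as in the example above, so both sources reach it while only nginx_errors reaches the transform.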