vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Add `instruct` subcommand #2003

Closed ghost closed 4 years ago

ghost commented 4 years ago

When many instances of Vector are deployed, quickly updating the configuration on all of them becomes increasingly difficult.

Presently, there are three main solutions to this:

  1. Automatically update the configs on all of the machines using a DevOps tool. However, with some of these tools, such as Ansible, the update might take a long time if there are many hosts.
  2. Redeploy Vector together with its config as a Docker container, which requires rebuilding the container and then restarting all Vector containers to update the configuration.
  3. Mount the config on an NFS volume (or a similar solution provided by the cloud platform, such as EFS in AWS). This plays well with the automatic config reloading feature.

However, each of them is either not real-time, so that configuration changes propagate slowly (solutions 1 and 2), or somewhat hard to set up (solution 3).

Proposal

So I propose adding two new modes to Vector, instructor and pupil, which would make it possible to configure many Vector instances using a single config file with instant propagation of changes. These modes could interact in the following manner:

image

If live reloading of the config file is enabled for the instructor and the config changes, or if the instructor is restarted, the config is transmitted to the pupils, so that they are always in sync with the single config file provided to the instructor. Vector in instructor mode doesn't process data itself.
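
A minimal sketch of the propagation described above, written as an in-memory Python simulation rather than anything Vector actually implements. The names `Instructor`, `Pupil`, `register`, and `on_config_change` are hypothetical, and a real implementation would send the config over the network instead of calling a method directly:

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Pupil:
    """Hypothetical pupil: holds whatever config the instructor last sent."""
    name: str
    config: str = ""
    reloads: int = 0

    def apply(self, config: str) -> None:
        # Stand-in for a live reload of the running pipeline.
        self.config = config
        self.reloads += 1


@dataclass
class Instructor:
    """Hypothetical instructor: distributes one config, processes no data."""
    pupils: list = field(default_factory=list)
    _config: str = field(default="", init=False)
    _digest: str = field(default="", init=False)

    def register(self, pupil: Pupil) -> None:
        # A newly connected pupil immediately receives the current config.
        self.pupils.append(pupil)
        if self._config:
            pupil.apply(self._config)

    def on_config_change(self, config: str) -> bool:
        # Push only when the content actually changed, so that an
        # unchanged file does not trigger needless reloads on the pupils.
        digest = hashlib.sha256(config.encode()).hexdigest()
        if digest == self._digest:
            return False
        self._config, self._digest = config, digest
        for pupil in self.pupils:
            pupil.apply(config)
        return True
```

Calling `on_config_change` with the new file contents whenever the instructor's config is reloaded keeps every registered pupil on the same config. Comparing digests is one possible design choice for skipping no-op updates, not something the proposal specifies.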

Possible user interface

binarylogic commented 4 years ago

This is interesting, this certainly is not my area of expertise, but isn't this what systems like Confd are designed to do? Another example is the Kubernetes configuration management system. I could make the case that services like AWS' SSM Parameter Store might be better for some users as a central config store, but I wouldn't want to build direct support for SSM into Vector, because the number of possible integrations there is endless.

ghost commented 4 years ago

> This is interesting, this certainly is not my area of expertise, but isn't this what systems like Confd are designed to do?

I see. These systems make it possible to distribute the configs automatically, and with Vector's live reloading feature this doesn't even require restarting Vector on config updates.

Actually, I came up with this idea as a first step towards tackling the problem of scaling Vector horizontally when it acts as the central service. Currently, if a single Vector instance doesn't have enough throughput to serve as the central service, there are three options:

  1. Use the stream-based architecture with Kafka between the agents and the Vector consumers. This works well, but if the throughput is high, the Kafka cluster might require a lot of resources (compared to Vector), and the retention/log replay features might not always be needed.
  2. Put multiple Vector instances behind a load balancer. This has the downside of increased complexity and resource utilization (or just cost, if a hosted load balancer such as AWS ELB is used).
  3. Use a service discovery tool, such as linkerd or AWS AppMesh. This would do the job, but adds complexity if the tool is not already used for other purposes.

So after adding the instructor/pupil mode, it would be possible to build support for service discovery right into the Vector source/sink protocol. It could work, for example, like this:

As I understand it, this can be done with an external tool like linkerd. But, as with built-in centralized config management, I wonder whether it makes sense for Vector to gradually embrace this kind of common functionality in order to make fewer assumptions about the other components of the infrastructure (so that if one learns how to scale Vector on Kubernetes, the same knowledge can be applied without much modification on AWS, and vice versa).
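
To illustrate what discovery built into the source/sink protocol might look like on the agent side, here is a minimal sketch under the assumption that an agent receives the current list of central instances from some coordinator (for example, the instructor) and spreads its batches across them round-robin. The class `DiscoveredPool`, its methods, and the addresses are hypothetical and do not correspond to any existing Vector API:

```python
from itertools import count


class DiscoveredPool:
    """Hypothetical client-side view of the central instances."""

    def __init__(self):
        self._addresses = []
        self._counter = count()

    def update(self, addresses):
        # Called whenever the coordinator announces a new set of instances.
        # Sorting and de-duplicating gives every agent the same ordering.
        self._addresses = sorted(set(addresses))

    def pick(self):
        # Round-robin choice of the instance that receives the next batch.
        if not self._addresses:
            raise LookupError("no central instances known")
        index = next(self._counter) % len(self._addresses)
        return self._addresses[index]


pool = DiscoveredPool()
pool.update(["10.0.0.2:9000", "10.0.0.1:9000"])
targets = [pool.pick() for _ in range(4)]
```

Here `targets` alternates between the two addresses, and a later `update` call with a different list changes where subsequent batches go without any external load balancer. Health checking and retries are left out of this sketch.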

Hoverbear commented 4 years ago

I worry this unnecessarily expands Vector into a distributed system. Notably it subverts the role of some nodes to give them a control plane role.

Perhaps this is a task a different project should accomplish?

Hoverbear commented 4 years ago

Just noting that this kind of change probably requires an RFC, as it's a pretty significant feature.

binarylogic commented 4 years ago

Thanks @a-rodin, this was a great write-up. It's likely we'll solve this in a different way as we expand other parts of Vector. We'll keep this here for reference though.