Add `instruct` subcommand

ghost commented 4 years ago

When many instances of Vector are deployed, it becomes increasingly difficult to quickly update the configuration for all of them.

Presently, there are three main solutions to this:

1. Automatically update the configs on all of the machines using a DevOps tool. However, with some of these tools, Ansible for example, the update can take a long time when there are many hosts.
2. Redeploy Vector together with its config as a Docker container, which requires rebuilding the container and then restarting all Vector containers to pick up the updated configuration.
3. Mount the config on an NFS volume (or a similar solution provided by the cloud platform, such as EFS in AWS). This plays well with the automatic config reloading feature.

However, all of them are either not real-time, so that configuration changes propagate slowly (solutions 1 and 2), or somewhat hard to set up (solution 3).

Proposal

So I propose adding two new modes to Vector, instructor and pupil, which would make it possible to configure many Vector instances from a single config file with instant propagation of changes. These modes could interact in the following manner:

Instructor takes a configuration file, validates it, and waits for the pupils to connect on a TCP socket.
Pupils connect to the instructor and receive the configuration specified in the config file provided to the instructor.

If live reloading of the config file is enabled for the instructor and the config changes, or if the instructor is restarted, the config is transmitted to the pupils, so that they are always in sync with the single config file provided to the instructor. Vector in instructor mode doesn't process the data itself.
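To make the interaction above concrete, here is a minimal Python sketch of the push model. It is only an illustration and not part of Vector: the wire format (a 4-byte big-endian length followed by the UTF-8 config text), the `Instructor` class, and the `send_config`/`recv_config` helpers are all assumptions made for this example, and config validation is omitted.

```python
import socket
import struct
import threading


def send_config(conn: socket.socket, config: str) -> None:
    """Send one config as a 4-byte big-endian length plus UTF-8 bytes (assumed framing)."""
    payload = config.encode("utf-8")
    conn.sendall(struct.pack(">I", len(payload)) + payload)


def _recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes the connection first."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("instructor closed the connection")
        buf += chunk
    return buf


def recv_config(conn: socket.socket) -> str:
    """Pupil side: block until the next full config frame arrives."""
    (length,) = struct.unpack(">I", _recv_exact(conn, 4))
    return _recv_exact(conn, length).decode("utf-8")


class Instructor:
    """Hypothetical instructor: holds the current config and pushes it to pupils."""

    def __init__(self, config: str, host: str = "127.0.0.1", port: int = 0):
        self._config = config
        self._pupils = []
        self._lock = threading.Lock()
        self._server = socket.create_server((host, port))
        self.address = self._server.getsockname()
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def _accept_loop(self) -> None:
        while True:
            try:
                conn, _ = self._server.accept()
            except OSError:
                return  # the listening socket was closed
            with self._lock:
                # A new pupil immediately receives the current config.
                send_config(conn, self._config)
                self._pupils.append(conn)

    def publish(self, config: str) -> None:
        """Called when the config file changes: push the new config to every pupil."""
        with self._lock:
            self._config = config
            alive = []
            for conn in self._pupils:
                try:
                    send_config(conn, config)
                    alive.append(conn)
                except OSError:
                    conn.close()  # drop pupils that went away
            self._pupils = alive

    def close(self) -> None:
        self._server.close()
        with self._lock:
            for conn in self._pupils:
                conn.close()
            self._pupils = []
```

A pupil in this sketch would connect with `socket.create_connection(instructor.address)` and call `recv_config` in a loop, reloading its topology each time a new frame arrives.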

Possible user interface

The pupils could be started as normal Vector instances, using the `vector --config ...` command. However, the argument provided to the `--config` option would be a URI of the form `vector://instructor-host`.
A pupil could accept multiple instructor hosts, so that if the first instructor is not available, it would ask the second, and so on. Multiple instructors might be specified by providing a URI such as `vector://instructor-1:1234,instructor-2:3456,instructor-3:7890`.
The instructor mode could be initiated using the `instruct` subcommand together with the usual options for specifying config files, for example: `vector --config vector.toml instruct`.
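As a rough illustration of how a pupil might interpret such a URI, the Python sketch below parses the comma-separated host list and tries the instructors in order. Nothing here is an existing Vector API: `parse_instructor_uri`, `first_available`, and the default port of 9000 are hypothetical, and IPv6 literals are not handled.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
SCHEME = "vector://"


def parse_instructor_uri(uri: str, default_port: int = 9000) -> List[Tuple[str, int]]:
    """Split 'vector://h1:p1,h2:p2' into [(h1, p1), (h2, p2)], keeping the order.

    default_port is an assumed value used when an entry has no explicit port.
    """
    if not uri.startswith(SCHEME):
        raise ValueError(f"expected a {SCHEME} URI, got {uri!r}")
    hosts: List[Tuple[str, int]] = []
    for entry in uri[len(SCHEME):].split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep:
            hosts.append((host, int(port)))
        else:
            hosts.append((entry, default_port))
    if not hosts:
        raise ValueError("no instructor hosts given")
    return hosts


def first_available(hosts: List[Tuple[str, int]], connect: Callable[[str, int], T]) -> T:
    """Try each instructor in order and return the first successful connection."""
    last_error = None
    for host, port in hosts:
        try:
            return connect(host, port)
        except OSError as err:
            last_error = err  # remember the failure and fall through to the next host
    raise ConnectionError("no instructor reachable") from last_error
```

In practice `connect` could be something like `lambda host, port: socket.create_connection((host, port), timeout=5)`; it is passed in as a parameter here so the fallback logic can be exercised without a network.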

binarylogic commented 4 years ago

This is interesting. This certainly is not my area of expertise, but isn't this what systems like Confd are designed to do? Another example is the Kubernetes configuration management system. I could make the case that services like AWS' SSM Parameter Store might be better for some users as a central config store, but I wouldn't want to build direct support for SSM into Vector, because the number of possible integrations there is endless.

ghost commented 4 years ago

> This is interesting, this certainly is not my area of expertise, but isn't this what systems like Confd are designed to do?

I see, these systems make it possible to distribute the configs automatically, and with Vector's live reloading feature this doesn't even require restarting Vector on config updates.

Actually, I came up with this idea as a first step to tackle the problem of scaling Vector horizontally when it acts in the role of the central service. Currently, if a single Vector instance doesn't have enough throughput to be in the role of the central service, there are three options:

Use the stream-based architecture with Kafka between the agents and the Vector consumers. This works well, but if the throughput is high, the Kafka cluster might require a lot of resources (compared to Vector), and the retention/log replay features might not always be needed.
Put multiple Vector instances behind a load balancer. This has the downside of increased complexity and resource utilization (or simply increased cost if a hosted load balancer, such as AWS ELB, is used).
Use a service discovery tool, such as linkerd or AWS AppMesh. This would do the job, but it adds complexity if the tool is not already used for other purposes.

So after adding the instructor/pupil mode, it would be possible to build support for service discovery right into the Vector source/sink protocol. It could work, for example, like this:

Vector in the agent mode, which sits near the services it collects the observability data from, is configured to send the collected data to a centralized Vector sink.
The centralized Vector sink is not a standalone Vector instance, but instead a Vector instance in the instructor mode.
When the instructor receives connections from the agents, it responds with the IP address of one of its pupils (the least busy one, as the instructor can gather statistics from the pupils to determine how busy each of them is), and then the agent sends the data to this pupil.
If the pupil goes offline, the agent can ask the instructor to provide an address of another pupil.
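The selection step in this flow could be sketched in Python as follows. This is only an illustration under assumptions: the `PupilRegistry` class is hypothetical, and "busy" is approximated by a single events-per-second figure that each pupil is assumed to report to the instructor.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class PupilStats:
    address: str
    events_per_second: float  # assumed load metric reported by the pupil
    online: bool = True


class PupilRegistry:
    """Hypothetical instructor-side view of the pupils and their load."""

    def __init__(self) -> None:
        self._pupils: Dict[str, PupilStats] = {}

    def report(self, address: str, events_per_second: float) -> None:
        """Record the latest statistics from a pupil and mark it online."""
        self._pupils[address] = PupilStats(address, events_per_second)

    def mark_offline(self, address: str) -> None:
        """Called when a pupil stops responding."""
        if address in self._pupils:
            self._pupils[address].online = False

    def assign(self, exclude: Iterable[str] = ()) -> Optional[str]:
        """Return the address of the least busy online pupil, or None if there is none.

        An agent whose pupil went offline can pass that address in `exclude`
        to ask for a different one.
        """
        excluded = set(exclude)
        candidates = [
            p for p in self._pupils.values()
            if p.online and p.address not in excluded
        ]
        if not candidates:
            return None
        return min(candidates, key=lambda p: p.events_per_second).address
```

An agent would call the equivalent of `assign()` when it first connects, and `assign(exclude=[failed_address])` after losing its pupil.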

As I understand it, this can be done with an external tool like linkerd. But, as in the case of built-in centralized config management, I wonder whether it makes sense for Vector to gradually embrace this kind of common functionality in order to make fewer assumptions about the other components of the infrastructure (so if one learns how to scale Vector on Kubernetes, the same knowledge can be applied without much modification on AWS and vice versa).

Hoverbear commented 4 years ago

I worry this unnecessarily expands Vector into a distributed system. Notably it subverts the role of some nodes to give them a control plane role.

Perhaps this is a task a different project should accomplish?

Hoverbear commented 4 years ago

Just noting this kind of change probably requires an RFC as it's a pretty significant feature.

binarylogic commented 4 years ago

Thanks @a-rodin, this was a great write-up. It's likely we'll solve this in a different way as we expand other parts of Vector. We'll keep this here for reference though.

vectordotdev / vector

Add `instruct` subcommand #2003
