Closed by ghost 4 years ago
This is interesting. This certainly is not my area of expertise, but isn't this what systems like confd are designed to do? Another example is Kubernetes' configuration management system. I could make the case that services like AWS' SSM Parameter Store might be better for some users as a central config store, but I wouldn't want to build direct support for SSM into Vector, because the number of possible integrations there is endless.
I see, these systems make it possible to distribute the configs automatically, and with Vector's live reloading feature, config updates don't even require restarting Vector.
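As a rough sketch of how an external distribution tool and live reloading fit together: the tool only has to replace the config file on disk, and a Vector instance running with live reloading enabled picks up the change. The helper below is hypothetical (it is not part of confd or Vector). It assumes the new config text has already been fetched from some central store, and it writes the file atomically so that a half-written config is never read.

```python
import os
import tempfile


def write_config_atomically(path: str, text: str) -> bool:
    """Replace the file at `path` with `text`; return False if nothing changed."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == text:
                return False  # unchanged, so do not trigger a needless reload
    except FileNotFoundError:
        pass  # first rollout: there is no previous config yet

    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic swap of old and new config
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

Skipping the write when the content is unchanged keeps the watcher from reloading on every polling cycle.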
Actually, I came up with this idea as a first step to tackle the problem of scaling Vector horizontally when it acts in the role of the central service. Currently, if a single Vector instance doesn't have enough throughput to be in the role of the central service, there are three options:
So after adding the instructor/pupil mode, it would be possible to build support for service discovery right into Vector's source/sink protocol. It could work, for example, like this:
As I understand, this can be done with an external tool like linkerd. But, as with built-in centralized config management, I wonder whether it makes sense for Vector to gradually embrace this kind of common functionality so that it makes fewer assumptions about the other components of the infrastructure (so if one learns how to scale Vector on Kubernetes, the same knowledge can be applied without much modification on AWS and vice versa).
I worry this unnecessarily expands Vector into a distributed system. Notably it subverts the role of some nodes to give them a control plane role.
Perhaps this is a task a different project should accomplish?
Just noting this kind of change probably requires an RFC, as it's a pretty significant feature.
Thanks @a-rodin, this was a great write-up. It's likely we'll solve this in a different way as we expand other parts of Vector. We'll keep this here for reference though.
In cases when there are many instances of Vector deployed, it might become more and more difficult to quickly update the configuration for all of these instances.
Presently, there are three main solutions to this:
However, all of them are either not real-time, so that configuration changes propagate slowly (solutions 1 and 2), or somewhat hard to set up (solution 3).
Proposal
So I propose to add two new modes to Vector, instructor and pupil, which would make it possible to configure many Vector instances using a single config file with instant propagation of the changes. These modes could interact in the following manner:
If live reloading of the config file is enabled for the instructor and the config changes, or if the instructor is restarted, the config is propagated to the pupils, so that they are always in sync with the single config file provided to the instructor. Vector in instructor mode doesn't process the data itself.
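The propagation behaviour described above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `Instructor` and `Pupil` classes, their method names, and the use of a SHA-256 digest to detect changes are all hypothetical and are not part of Vector or of the proposal itself.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pupil:
    """Hypothetical pupil: remembers the last config it received."""
    name: str
    config: Optional[str] = None
    reloads: int = 0

    def apply(self, config: str) -> None:
        self.config = config
        self.reloads += 1


@dataclass
class Instructor:
    """Hypothetical instructor: distributes config and processes no data itself."""
    config: str
    pupils: List[Pupil] = field(default_factory=list)
    _digest: str = field(default="", init=False)

    def __post_init__(self) -> None:
        self._digest = self._hash(self.config)

    @staticmethod
    def _hash(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def register(self, pupil: Pupil) -> None:
        # A newly connected pupil immediately receives the current config.
        self.pupils.append(pupil)
        pupil.apply(self.config)

    def reload(self, new_config: str) -> bool:
        """Called when the instructor's config file changes; True if pushed."""
        digest = self._hash(new_config)
        if digest == self._digest:
            return False  # identical content: pupils are already in sync
        self.config, self._digest = new_config, digest
        for pupil in self.pupils:
            pupil.apply(new_config)
        return True
```

A real implementation would also have to handle pupils that are unreachable during a push, which this sketch ignores.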
Possible user interface
Pupils would be started with the usual vector --config ... command. However, the argument provided to the --config option would be a URI of the form vector://instructor-host. Several instructors could be listed in a single URI: vector://instructor-1:1234,instructor-2:3456,instructor-3:7890.
The instructor would be started with the instructor subcommand together with the usual options for specifying config files, for example: vector --config vector.toml instruct.
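To make the proposed URI form concrete, here is a minimal parsing sketch. The function name `parse_instructor_uri` and the `DEFAULT_PORT` value are assumptions made for illustration, since the proposal does not specify a default port, and IPv6 literals are not handled.

```python
from typing import List, Tuple

SCHEME = "vector://"
DEFAULT_PORT = 9000  # assumption: the proposal does not name a default port


def parse_instructor_uri(uri: str) -> List[Tuple[str, int]]:
    """Split a vector:// URI into a list of (host, port) instructor endpoints."""
    if not uri.startswith(SCHEME):
        raise ValueError(f"not an instructor URI: {uri!r}")
    endpoints: List[Tuple[str, int]] = []
    for part in uri[len(SCHEME):].split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty instructor address in {uri!r}")
        host, sep, port = part.rpartition(":")
        if not sep:
            # No explicit port given, e.g. vector://instructor-host
            host, port_num = part, DEFAULT_PORT
        else:
            port_num = int(port)  # raises ValueError on a non-numeric port
        if not host:
            raise ValueError(f"missing host in {part!r}")
        endpoints.append((host, port_num))
    return endpoints
```

A pupil could tell the two kinds of `--config` argument apart simply by checking for the `vector://` prefix before falling back to treating the value as a file path.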