vectordotdev / vector

A high-performance observability data pipeline.

https://vector.dev

Mozilla Public License 2.0

Refactor `socket` polymorphic configuration for mode #19012

Open jszwedko opened 1 year ago

jszwedko commented 1 year ago

A note for the community

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

Configuring the socket source is a bit error-prone, since some options only apply in certain modes. For example:

address: 0.0.0.0:514
mode: unix

is invalid, but this is only discovered at runtime. Instead, we could take a page from Rust and make "impossible states impossible" by collapsing the options into a single address option.

Attempted Solutions

No response

Proposal

Rather than having separate mode, address, and path configuration options that can only be applied conditionally (e.g. path can only be set if mode: unix), the socket source would have just the address field, which would accept a scheme. That is:

mode: udp
address: 0.0.0.0:514

would go to

address: udp://0.0.0.0:514

mode: tcp
address: 0.0.0.0:514

would go to

address: tcp://0.0.0.0:514

mode: unix
path: "/var/run/some.sock"

would go to

address: unix:///var/run/some.sock
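
To make the mapping above concrete, here is a minimal Rust sketch of how a scheme-prefixed address could be parsed into a typed value. The `ListenAddr` enum and `parse_listen_addr` function are hypothetical names used for illustration only, not Vector's actual types, and requiring an absolute path for `unix://` is an assumption of this sketch.

```rust
use std::net::SocketAddr;
use std::path::PathBuf;

/// Hypothetical parsed form of the proposed single `address` option.
#[derive(Debug, PartialEq)]
enum ListenAddr {
    Tcp(SocketAddr),
    Udp(SocketAddr),
    Unix(PathBuf),
}

/// Parse `scheme://rest` into a `ListenAddr`, rejecting forms that do not
/// match the scheme (an IP:port for tcp/udp, an absolute path for unix).
fn parse_listen_addr(input: &str) -> Result<ListenAddr, String> {
    let (scheme, rest) = input
        .split_once("://")
        .ok_or_else(|| format!("missing scheme in {:?}", input))?;
    match scheme {
        "tcp" | "udp" => {
            let addr: SocketAddr = rest
                .parse()
                .map_err(|e| format!("invalid socket address {:?}: {}", rest, e))?;
            if scheme == "tcp" {
                Ok(ListenAddr::Tcp(addr))
            } else {
                Ok(ListenAddr::Udp(addr))
            }
        }
        // Assumption: only absolute paths are accepted for unix sockets.
        "unix" if rest.starts_with('/') => Ok(ListenAddr::Unix(PathBuf::from(rest))),
        "unix" => Err(format!("unix address must be an absolute path, got {:?}", rest)),
        other => Err(format!("unsupported scheme {:?}", other)),
    }
}

fn main() {
    for input in ["udp://0.0.0.0:514", "tcp://0.0.0.0:514", "unix:///var/run/some.sock"] {
        println!("{} -> {:?}", input, parse_listen_addr(input));
    }
}
```

Once parsed, the value can only be one of the three valid shapes. The string itself can still be malformed, though, so mismatches such as a path after `tcp://` are caught when the config is parsed rather than by the config schema.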

References

https://github.com/vectordotdev/vector/issues/17050

Version

vector v0.33.1

hhromic commented 1 year ago

Not sure about this feature :)

Consider the following aspects:

Having a protocol://address syntax does not really make "impossible states impossible". For instance: tcp:///path/to/socket or unix://0.0.0.0:8080 are two invalid examples that still will only be discovered at runtime.
As you mention, there are options (today) that depend on mode besides path. For example: connection_limit (TCP), keepalive.* (TCP), max_connection_duration_secs (TCP), max_length (UDP), socket_file_mode (unix), etc.
- Without a mode option anymore, these would be instead turned into parameters of the URI option?
- E.g: protocol://address?param1=val1&param2=val2?
- If not, without a mode option these mode-dependent options would become confusing imho.
- Not to mention that parsing and validating will become more complex too.
- The complete URI syntax shown above would be easier to parse (standard URI parsing).
- However, the options would stop being first-class configurations in the code.

All in all, I think turning address into a URI would bring little value imho. At least for us, misconfiguration discovered at runtime has never been such a problem, and as mentioned a URI would not change that anyway.

We have never been confused by the way it is today, which is pretty concise and clear already tbh. Those are my 50c :)

jszwedko commented 1 year ago

Good thoughts! We could encode these as URL parameters, but an alternative would be to go the "polymorphic" route described by https://github.com/vectordotdev/vector/blob/master/docs/specs/configuration.md#polymorphism. For example, users would instead have configs like:

mode: unix
unix.path: "/var/run/foo.sock"
unix.file_mode: "0777"

mode: tcp
tcp.connection_limit: 100
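
As a rough illustration of the namespaced layout above, the following Rust sketch builds a typed value from flat `key: value` pairs and rejects any option outside the selected mode's namespace. `SocketMode` and `from_flat` are hypothetical names, only the `tcp` and `unix` options shown above are modeled, and the flat string map stands in for whatever config deserialization Vector actually uses.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Hypothetical typed form of the namespaced config: each variant carries
/// only the options that apply to its mode.
#[derive(Debug, PartialEq)]
enum SocketMode {
    Tcp { connection_limit: Option<u32> },
    Unix { path: PathBuf, file_mode: Option<String> },
}

/// Build a `SocketMode` from flat pairs, rejecting any key that is not
/// `mode` itself and does not start with `<mode>.`.
fn from_flat(cfg: &BTreeMap<&str, &str>) -> Result<SocketMode, String> {
    let mode = *cfg.get("mode").ok_or("missing `mode`")?;
    let prefix = format!("{}.", mode);
    if let Some(stray) = cfg.keys().find(|k| **k != "mode" && !k.starts_with(&prefix)) {
        return Err(format!("option `{}` does not apply to mode `{}`", stray, mode));
    }
    match mode {
        "tcp" => {
            let connection_limit = cfg
                .get("tcp.connection_limit")
                .map(|v| v.parse::<u32>().map_err(|e| format!("tcp.connection_limit: {}", e)))
                .transpose()?;
            Ok(SocketMode::Tcp { connection_limit })
        }
        "unix" => {
            let path = cfg.get("unix.path").ok_or("missing `unix.path`")?;
            Ok(SocketMode::Unix {
                path: PathBuf::from(*path),
                file_mode: cfg.get("unix.file_mode").map(|m| m.to_string()),
            })
        }
        other => Err(format!("unsupported mode `{}`", other)),
    }
}

fn main() {
    let unix = BTreeMap::from([
        ("mode", "unix"),
        ("unix.path", "/var/run/foo.sock"),
        ("unix.file_mode", "0777"),
    ]);
    println!("{:?}", from_flat(&unix));

    // A TCP-only option combined with `mode: unix` is rejected.
    let mixed = BTreeMap::from([
        ("mode", "unix"),
        ("unix.path", "/var/run/foo.sock"),
        ("tcp.connection_limit", "100"),
    ]);
    println!("{:?}", from_flat(&mixed));
}
```

Because every mode-specific option carries its mode as a prefix, a misplaced option can be reported by name when the config is loaded, and the resulting value cannot hold, say, a `connection_limit` alongside a unix path.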

hhromic commented 1 year ago

Ah! That does look much nicer indeed, namespaced by the mode basically. Yes, I personally like that very much, as with the buffer.*, batch.*, etc. options. It would also be nice when translating env variables, which we do a lot with templating.

jonaslb commented 5 months ago

I just realised this is specifically for the socket source, but the referred issue (#17050) asks for generic support for this feature. My question is if this solution for the socket source is intended as a "pilot" for something generic, or if it wouldn't be worth it to try and aim for a generic solution immediately and then apply it here?

My interest that led me here is in using a unix socket with the vector source, to enable implementing custom sources as sidecars (similar to exec, but with orchestration handled from the outside and support for possibly more things - hoping I find acknowledgements in there). I didn't find a way to do it, if it is possible at all right now. Of course, a workaround for me is to bind to loopback. Edit: This is still early Vector for me, and I'm not currently sure there is an advantage in using the vector source/sink api over the plain socket. It is not the point of this comment to start a discussion around that :)

jszwedko commented 5 months ago

Thanks for the thoughts!

I think both this issue and https://github.com/vectordotdev/vector/issues/17050 are valid. This issue is about refactoring the config for sockets which bind to addresses (i.e. act as servers) to be polymorphic rather than a sort of "tagged union". I think it could extend to the vector source to allow binding to a unix socket.

https://github.com/vectordotdev/vector/issues/17050 is about supporting unix:// as a scheme for components that act as clients such that it would enable any client that has a configurable "endpoint" to connect over a unix socket (including the vector sink).

For now, I think connecting over loopback is probably the best option for vector source/sink communication on the same host.