vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

No sink appears to allow binding to a specific IP, e.g. for a multihomed host / host with multiple IPs #20190

Open james-stevens opened 7 months ago

james-stevens commented 7 months ago

Use Cases

Where a host running Vector has multiple IP addresses (e.g. it is multihomed), it would be useful to be able to bind a sink to a specific IP address, for any sink that makes an outbound client connection. This matters, for example, when setting up a firewall at the remote end.

This is such a common feature in client IP software that I feel I must have missed something somewhere and Vector can already do this, but I've looked quite hard.

Attempted Solutions

I don't see any way to solve this in Vector without a new option, assuming I didn't miss something!

Usually, by careful configuration at the host level, you can predict the IP the operating system will assign for the outbound connection, but this is less than ideal. The vast majority of other software that makes outbound connections supports binding to a specific IP, e.g. curl --interface zz.zz.zz.zz http://example.com/

You could get Vector to connect to a locally running transparent proxy that does have IP binding as an option, but this is shockingly messy.
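To illustrate what binding a client connection to a specific IP means at the socket level, here is a minimal Python sketch. It is not Vector code; the helper name connect_from is hypothetical, and it only shows the general technique of pinning the local address before connecting, using the standard library's socket.create_connection with its source_address parameter.

```python
import socket


def connect_from(source_ip: str, host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection to (host, port) with the local end bound to source_ip.

    Hypothetical helper for illustration only; not part of Vector.
    """
    # Port 0 lets the OS pick an ephemeral source port; only the source IP is pinned.
    return socket.create_connection(
        (host, port),
        timeout=timeout,
        source_address=(source_ip, 0),
    )
```

Calling connect_from with one of the host's own addresses makes the remote end see that address as the peer, which is what a restrictive firewall rule at the remote end would match on. If source_ip is not configured on the host, the bind step fails with an OSError.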

Proposal

All sinks would need a new option that lets the user specify which IP to bind to, e.g. interface: or bind_address:.

In my case I'm using the vector sink, but it really applies to all sinks that make outbound (client) connections.

Version

0.36.0

jszwedko commented 7 months ago

This is interesting, thanks @james-stevens! I was unaware that other software supports selecting the interface to use (like curl --interface); I was more used to that being decided by the routing table. I think this would be a reasonable feature to add to Vector, though I'm not sure the core team would get to it for a while, so we'd be happy to see a PR from someone adding support. It does look like the HTTP client crate we use, hyper, has support for it: https://github.com/hyperium/hyper/issues/602

james-stevens commented 7 months ago

I think, to be fair, it's possibly a feature that is more useful when running s/w directly in the O/S, and obv everybody (except us) uses containers these days!!

But yeah - it's pretty common, like dig -b <addr> and ping -I <addr>, the iptunnel option "ip-tunnel-source-address", ISC BIND's "transfer-source" option etc - plus, of course, server stuff like haproxy, apache and nginx obv let you bind the server IP. It's all useful stuff when setting a more restrictive firewall policy, if you can force the IP at both ends.

> I was more used to that being decided by the routing table.

Yes, but we have multiple public IPs on the public interface. Unless the socket is locked to an address using bind(), Linux should by default choose the first one. It would be better to lock it to a specific IP, and not being able to do so means we can't tell Vector to use a different IP than the default.
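For anyone wanting to check which source address the OS picks by default for a given destination, a small Python sketch follows. The helper name default_source_ip is hypothetical; it relies on the fact that connect() on a UDP socket sends no packets and only makes the kernel select a route and a local address.

```python
import socket


def default_source_ip(dest_ip: str, port: int = 9) -> str:
    """Return the local IPv4 address the OS would choose for traffic to dest_ip.

    Hypothetical helper for illustration only; not part of Vector.
    """
    # connect() on a UDP socket transmits nothing; it only asks the kernel to
    # resolve a route and pick a source address for that destination.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        probe.connect((dest_ip, port))
        return probe.getsockname()[0]
```

Running this against the remote end's address on a host with several IPs on one interface shows the address that an unbound client socket would use. The call raises an OSError if the host has no route to the destination.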

I used to be a pure dev, but for some yrs I've been hybrid dev & syseng - so maybe I've just had to use it more often?

jszwedko commented 7 months ago

Makes sense! And agreed, it does seem like it'd be a more common need when deploying software directly on hosts (as opposed to containers).

johnhtodd commented 7 months ago

FYI: The new-ish code for TCP-based DNSTAP source has the ability to tie to a particular IP address.