sleinen / samplicator

Send copies of (UDP) datagrams to multiple receivers, with optional sampling and spoofing
GNU General Public License v2.0

Allow receiver hostname definition #60

Open quielb opened 5 years ago

quielb commented 5 years ago

Please add the ability to define the receiver by hostname and not just by IP address. This would allow forwarding to a cluster hostname using round-robin DNS or a service discovery hostname, which would increase the availability of the replicated packets.

Using just a VIP would require something like a hardware load balancer, or managing an IP address with something like corosync. Being able to use modern techniques like service discovery allows for active-active load balancing, which adds a scale-out feature. (A hardware LB, e.g. an F5, would do this too, but I don't want to pay for one of those.)

sleinen commented 5 years ago

Thanks for your suggestion! I see how this makes sense. The problem is that we'd need to think about how often to resolve the hostname to an IP address. Doing this for every packet would be prohibitive, because many people use the samplicator at high packet rates, and they don't want to flood their nameservers (or even name caches) at these high rates.

DNS has TTLs, which could be used to set the frequency of name resolutions to something more manageable—although the case of TTLs being zero or close to zero would have to be addressed.
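One way to handle the zero or near-zero TTL case is to clamp the TTL before using it as a refresh interval. The following is only a sketch of that idea: the function name and both bounds are hypothetical and are not existing samplicator code or options.

```c
#include <stdint.h>

/* Hypothetical bounds, chosen for illustration only. */
#define MIN_REFRESH_SECS 5u     /* floor for TTLs that are zero or close to zero */
#define MAX_REFRESH_SECS 3600u  /* ceiling so very long TTLs still get re-checked */

/* Map a DNS TTL in seconds to the interval between name re-resolutions. */
static uint32_t refresh_interval(uint32_t ttl_secs)
{
    if (ttl_secs < MIN_REFRESH_SECS)
        return MIN_REFRESH_SECS;
    if (ttl_secs > MAX_REFRESH_SECS)
        return MAX_REFRESH_SECS;
    return ttl_secs;
}
```

With these assumed bounds, a record with TTL 0 would be looked up at most once every 5 seconds, however high the packet rate.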

While this is all worthwhile, it is a bit hard to implement correctly: Firstly, you cannot just use getaddrinfo() (or gethostbyname()), because you need the TTL information. Secondly, you need to implement some sort of "scheduling" of the name (re-) lookups, and the current event loop is very dumb and would have to be refined (or replaced).
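As a rough illustration of the "scheduling" part, here is a sketch that caches the resolved address per receiver and re-resolves only when a deadline has passed. It deliberately sidesteps the TTL problem by using a fixed interval, since getaddrinfo() does not expose TTLs. The struct, the function name, and the 30-second interval are all assumptions for illustration, not samplicator code.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical fixed refresh interval, standing in for a real DNS TTL. */
#define REFRESH_SECS 30

/* Hypothetical per-receiver state; zero-initialize everything after port. */
struct receiver {
    const char *host;              /* hostname or literal IP address */
    const char *port;              /* service name or decimal port */
    struct sockaddr_storage addr;  /* last successfully resolved address */
    socklen_t addrlen;             /* 0 until the first successful lookup */
    time_t next_refresh;           /* time of the next lookup attempt */
};

/*
 * Re-resolve r->host if the deadline has passed. Returns 0 if r->addr is
 * usable (possibly stale after a failed refresh), -1 if no address has
 * ever been resolved.
 */
static int receiver_refresh(struct receiver *r, time_t now)
{
    struct addrinfo hints, *res;

    if (r->addrlen != 0 && now < r->next_refresh)
        return 0;  /* cached address is still fresh: no lookup */

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_DGRAM;  /* UDP receivers */

    /* Set the deadline first so a failing lookup is not retried per packet. */
    r->next_refresh = now + REFRESH_SECS;

    if (getaddrinfo(r->host, r->port, &hints, &res) != 0)
        return r->addrlen != 0 ? 0 : -1;  /* keep the old address if any */

    /* Use the first result; the resolver decides the ordering. */
    memcpy(&r->addr, res->ai_addr, res->ai_addrlen);
    r->addrlen = res->ai_addrlen;
    freeaddrinfo(res);
    return 0;
}
```

The forwarding path would call receiver_refresh(&r, time(NULL)) before sendto() and use r.addr and r.addrlen as the destination. Note that getaddrinfo() blocks, so a slow nameserver would stall forwarding during a refresh; avoiding that is part of why the event loop would need rework.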

quielb commented 5 years ago

Agreed that DNS lookups would add some overhead on a lot of systems. But most Linux distributions have nscd (Name Service Cache Daemon) configured by default, or at least available. It provides a local in-memory cache on the box, which mitigates some of the issues you point out. I don't see any mention of using the DNS TTL in its config. It looks like it has its own configurable TTL for each service type it caches, so it is controllable by the admin. And as far as I know, the gethostbyname() library function uses this cache automatically. With nscd and gethostbyname(), I don't think you would have to solve the DNS lookup and cache timing issues. You could simply let the system handle it. There is also a direct interface to nscd via shared memory (according to the section 5 man page) that may make things fast enough for single-samplicator deployments.

This is probably unique to my use case, but we are using Consul for service discovery. There is a Consul agent running on every box, and that agent provides DNS lookups of services. Since the lookup of the receiver address is confined to the samplicator host, there is no load on the enterprise DNS servers. I just don't know what Consul's maximum lookup rate is.

Another thing to consider is scale. In my case I am looking to resolve a receiver by name to leverage service discovery. I also plan to scale samplicator the same way. On my device I would set the "syslog" host to a service discovery hostname, allowing me to scale samplicator horizontally and find an active samplicator node. Doing this would most likely mitigate the increased load of the DNS lookups on an individual host. I can simply add more samplicator hosts to handle higher packet rates.

Personally, I would be willing to scale samplicator horizontally, along with other necessary systems (DNS), to overcome any additional load created. For me, availability is probably more important than the cost in resources.