Closed toreanderson closed 3 years ago
Thank you for the detailed analysis. It does look like we should use getifaddrs(3) -- as we did in the SunOS port a while back. But what would you say is the best way to learn the ifIndex numbers? Perhaps we need to combine the two approaches. My inclination would be to run getifaddrs(3) afterwards, limited only to discovering additional IP addresses for existing interfaces, so they can be candidates for agentIP. That way the list of interfaces we consider for counter-polling etc. is still anchored to what appears in /proc/net/dev. Thoughts?
Allowing an arbitrary IP address to be typed into /etc/hsflowd.conf is not tempting. In a large network renumber of 1000s of switches it's easy to imagine a scenario where two switches end up sending with each other's IP (easy to imagine because it happened).
The struct ifaddrs
list returned by getifaddrs(3)
already contains the interface name in ->ifa_name
, so you can simply use if_nametoindex(3)
to get the ifIndex.
I wouldn't mind hard-coding the agent IP directly in hsflowd.conf
. I am doing that right now with agent.cidr = <ipv6-addr>/128
, and it works well. In any case, my hsflowd.conf
template just references the same variables that are used to configure the loopback addresses and BGP/OSPF router IDs, so conflicts shouldn't be an issue.
+1 on this, it is extremely common to use a loopback interface IP as the agent ip. Also I Dont think it is hsflowd's job to dictate what the agent IP should be, especially since the sflow RFC specifically states:
"The IP address associated with this agent. In the case of a
multi-homed agent, this should be the loopback address of the
agent. The sFlowAgent address must provide SNMP connectivity
to the agent. The address should be an invariant that does not
change as interfaces are reconfigured, enabled, disabled,
added or removed. A manager should be able to use the
sFlowAgentAddress as a unique key that will identify this
agent over extended periods of time so that a history can
be maintained."
These characteristics are in-fact , wholly met by using the loopback interface IP. We should allow users to hard code an agent IP and put the burden of making sure its the right one on the user imo.
I intend to add the getifaddrs(3) step, but in the mean time I confirmed that you can override using agentIP in /etc/hsflowd.conf. Like this:
agentIP = 192.0.2.10
This works for IPv6 addresses too.
So the order of precedence looks like this:
I would caution against using this, even with Puppet, because it's so hard to troubleshoot if it goes wrong. But maybe that's just the tortured madness of a traumatized mind.
Interesting! I was not aware of this possibility, as the agentIP
setting isn't mentioned in the default config file nor at http://sflow.net/host-sflow-linux-config.php. Might be worth adding?
(I updated the issue title, as the existence of agentIP
makes the original title incorrect.)
Thank you again for raising this issue. I believe it has been fixed with release v2.0.32. When automatic agent-address selection is used, it now considers all the addresses.
hsflowd
appears to use the ancient netdevice ioctls in order to obtain the list of candidate IPv4 agent IPs on the system:https://github.com/sflow/host-sflow/blob/0637908114af9c5189091154b983df969edec7dc/src/Linux/readInterfaces.c#L774-L775
This ioctl assumes that a single network device only has a single IPv4 address. That assumption has has been false for decades (since Linux v2.2 IIRC).
As a practical consequence this means that
hsflowd
is only able to select the first IPv4 address added to an interface as its agent IP.For example, consider the following network setup, extremely common on routers, where there is a second globally scoped IP address (
192.0.2.10
) added to the loopback interface. This address serves as the primary address of the router.Given this configuration it appears impossible
hsflowd
use192.0.2.10
as its agent IP.Setting
agent = lo
yields an agent IP of127.0.0.1
(not entirely unexpected, although one could reasonably have thought it would have given the globally scoped address a higher precedence than a host-scoped one by default).Setting
agent.cidr = 192.0.2.10/32
(oragent.cidr = 192.0.2.0/24
) unexpectedly yields an agent IP of10.20.211.88
. Presumably this happens because the CIDR filtering logic never gets to evaulate the192.0.2.10
address, therefore reverting to the default behaviour (i.e., behaving as ifagent.cidr
was not specified).To fix this I believe it is necessary to rewrite
readInterfaces()
to use a more current API to enumerate local IP addresses, such asgetifaddrs(3)
.Another approach would be to make it possible for the user to hard-code the agent IP in
hsflowd.conf
directly, thus disabling the auto-detection logic completely. As far as I can tell, the agent IP is just an identifier; it should not have to correspond to an address configured on a local network device.