sflow / host-sflow

host-sflow agent
http://sflow.net
Other
146 stars 55 forks source link

[linux] agent.cidr ignores secondary IPv4 addresses #40

Closed toreanderson closed 3 years ago

toreanderson commented 3 years ago

hsflowd appears to use the ancient netdevice ioctls in order to obtain the list of candidate IPv4 agent IPs on the system:

https://github.com/sflow/host-sflow/blob/0637908114af9c5189091154b983df969edec7dc/src/Linux/readInterfaces.c#L774-L775

This ioctl assumes that a single network device only has a single IPv4 address. That assumption has has been false for decades (since Linux v2.2 IIRC).

As a practical consequence this means that hsflowd is only able to select the first IPv4 address added to an interface as its agent IP.

For example, consider the following network setup, extremely common on routers, where there is a second globally scoped IP address (192.0.2.10) added to the loopback interface. This address serves as the primary address of the router.

rlnc-user@hsflowd:~/host-sflow$ ip -4 address list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet 192.0.2.10/32 scope global lo
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc fq_codel state UP group default qlen 1000
    inet 10.20.211.88/26 brd 10.20.211.127 scope global dynamic ens3
       valid_lft 84048sec preferred_lft 84048sec

Given this configuration it appears impossible hsflowd use 192.0.2.10 as its agent IP.

Setting agent = lo yields an agent IP of 127.0.0.1 (not entirely unexpected, although one could reasonably have thought it would have given the globally scoped address a higher precedence than a host-scoped one by default).

Setting agent.cidr = 192.0.2.10/32 (or agent.cidr = 192.0.2.0/24) unexpectedly yields an agent IP of 10.20.211.88. Presumably this happens because the CIDR filtering logic never gets to evaulate the 192.0.2.10 address, therefore reverting to the default behaviour (i.e., behaving as if agent.cidr was not specified).

To fix this I believe it is necessary to rewrite readInterfaces() to use a more current API to enumerate local IP addresses, such as getifaddrs(3).

Another approach would be to make it possible for the user to hard-code the agent IP in hsflowd.conf directly, thus disabling the auto-detection logic completely. As far as I can tell, the agent IP is just an identifier; it should not have to correspond to an address configured on a local network device.

sflow commented 3 years ago

Thank you for the detailed analysis. It does look like we should use getifaddrs(3) -- as we did in the SunOS port a while back. But what would you say is the best way to learn the ifIndex numbers? Perhaps we need to combine the two approaches. My inclination would be to run getifaddrs(3) afterwards, limited only to discovering additional IP addresses for existing interfaces, so they can be candidates for agentIP. That way the list of interfaces we consider for counter-polling etc. is still anchored to what appears in /proc/net/dev. Thoughts?

Allowing an arbitrary IP address to be typed into /etc/hsflowd.conf is not tempting. In a large network renumber of 1000s of switches it's easy to imagine a scenario where two switches end up sending with each other's IP (easy to imagine because it happened).

toreanderson commented 3 years ago

The struct ifaddrs list returned by getifaddrs(3) already contains the interface name in ->ifa_name, so you can simply use if_nametoindex(3) to get the ifIndex.

I wouldn't mind hard-coding the agent IP directly in hsflowd.conf. I am doing that right now with agent.cidr = <ipv6-addr>/128, and it works well. In any case, my hsflowd.conf template just references the same variables that are used to configure the loopback addresses and BGP/OSPF router IDs, so conflicts shouldn't be an issue.

mayuresh82 commented 3 years ago

+1 on this, it is extremely common to use a loopback interface IP as the agent ip. Also I Dont think it is hsflowd's job to dictate what the agent IP should be, especially since the sflow RFC specifically states:

"The IP address associated with this agent.  In the case of a
        multi-homed agent, this should be the loopback address of the
        agent.  The sFlowAgent address must provide SNMP connectivity
        to the agent.  The address should be an invariant that does not
        change as interfaces are reconfigured, enabled, disabled,
        added or removed.  A manager should be able to use the
        sFlowAgentAddress as a unique key that will identify this
        agent over extended periods of time so that a history can
        be maintained."

These characteristics are in-fact , wholly met by using the loopback interface IP. We should allow users to hard code an agent IP and put the burden of making sure its the right one on the user imo.

sflow commented 3 years ago

I intend to add the getifaddrs(3) step, but in the mean time I confirmed that you can override using agentIP in /etc/hsflowd.conf. Like this:

agentIP = 192.0.2.10

This works for IPv6 addresses too.

So the order of precedence looks like this:

  1. agentIP=address => use address
  2. agent=interface => use first IP from interface
  3. agent.cidr=CIDR => use discovered address found in CIDR
  4. default => choose from discovered addresses according to priority defined by EnumIPSelectionPriority (here: https://github.com/sflow/host-sflow/blob/v2.0.30-1/src/Linux/hsflowconfig.c#L977-L1052)

I would caution against using this, even with Puppet, because it's so hard to troubleshoot if it goes wrong. But maybe that's just the tortured madness of a traumatized mind.

toreanderson commented 3 years ago

Interesting! I was not aware of this possibility, as the agentIP setting isn't mentioned in the default config file nor at http://sflow.net/host-sflow-linux-config.php. Might be worth adding?

https://github.com/sflow/host-sflow/blob/0637908114af9c5189091154b983df969edec7dc/src/Linux/scripts/hsflowd.conf#L5-L11

(I updated the issue title, as the existence of agentIP makes the original title incorrect.)

sflow commented 3 years ago

Thank you again for raising this issue. I believe it has been fixed with release v2.0.32. When automatic agent-address selection is used, it now considers all the addresses.