sflow / host-sflow

host-sflow agent
http://sflow.net

Fix to resolve AgentID reconfiguration issue. #49

Closed: vidhya-rajan closed this 1 year ago

vidhya-rajan commented 1 year ago

Description

'config sflow agent-id add': upon configuring a valid agent ID, the sample packets contain the IP address of the corresponding interface. Once the agent ID is removed using 'config sflow agent-id del', the agent ID should fall back to the default IP (the multicast IP).

Issue observed: the old agent ID persists in the sample packets, because the previous agent ID is never unconfigured.

Steps to reproduce the issue:

  1. Load the 202205 community SONiC release build.
  2. Enable the sflow feature using 'config feature state sflow enabled'.
  3. Ensure the sflow docker is instantiated successfully before executing sflow-related config commands.
  4. Enable sflow globally using 'config sflow enable'.
  5. Configure a valid agent ID using 'config sflow agent-id add'.
  6. Remove the configured agent ID using 'config sflow agent-id del'.

Results observed: the sample packets contain the previously configured agent ID even though that agent ID was deleted.

03:03:55.524563 IP 6.1.1.2.55682 > 6.1.1.1.6343: sFlowv5, IPv4 agent 3.3.3.1, agent-id 100000, length 1288
03:03:55.526175 IP 6.1.1.2.55682 > 6.1.1.1.6343: sFlowv5, IPv4 agent 3.3.3.1, agent-id 100000, length 448

Expected results: the sample packets should contain the multicast IP as the agent ID, since the previously configured agent ID was deleted.

03:04:01.535556 IP 6.1.1.2.56719 > 6.1.1.1.6343: sFlowv5, IPv4 agent 240.127.1.1, agent-id 100000, length 868

Root cause

  1. The hsflowd daemon sends event notifications for every configuration change, but the agent ID notification is sent only when an agent ID is added, not when it is removed.

Solution

hsflowd must also send agent ID config notifications when the agent ID is removed.
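A minimal sketch of the fix proposed above, assuming config is modeled as a flat key/value dict and changes are delivered to consumers as events. hsflowd itself is written in C; `ConfigEvent`, `diff_config`, the `agent_id` key, and the value `Ethernet0` are hypothetical names chosen only to illustrate the idea. The point is that a removed key must produce an event too, otherwise the consumer never learns that the override is gone.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ConfigEvent:
    """Hypothetical change notification; value is None when the key was removed."""
    key: str
    value: Optional[str]


def diff_config(old: Dict[str, str], new: Dict[str, str]) -> List[ConfigEvent]:
    """Emit one event per added, changed, or removed key."""
    events = []
    for key, value in new.items():
        if old.get(key) != value:
            events.append(ConfigEvent(key, value))  # added or changed
    for key in old:
        if key not in new:
            events.append(ConfigEvent(key, None))  # removed: the case the bug skipped
    return events


old = {"agent_id": "Ethernet0", "polling": "20"}
new = {"polling": "20"}
print(diff_config(old, new))
# [ConfigEvent(key='agent_id', value=None)]
```

A consumer receiving `value=None` would then clear its override and fall back to the default agent address.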

sflow commented 1 year ago

Thanks for this. However, I think this behavior will be fixed when we improve how new config is handled elsewhere. Leaving the agent device override out should make it revert to the default, and currently it does not. I am treating this as a priority and hope to check in a fix sometime over the next few days. Let me know if there is a deadline I should know about.

vidhya-rajan commented 1 year ago

Thanks. When the appropriate events are sent out upon reconfiguration, the agent device override is cleared and the agent switches back to the default, as you described. Do you think the current change is not sufficient to provide this functionality? Please let us know.

sflow commented 1 year ago

I checked in a change and tagged release v2.0.41-1. Now when mod_sonic leaves out the agent-id override it reverts to the default correctly. This change is not the same as this pull request but it has the same effect (and it will work for other modules that can make dynamic config changes too).
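The maintainer's fix re-evaluates config as a whole rather than relying on per-key events. A minimal sketch of that idea, assuming each module hands over a complete config snapshot on every change; `AgentConfig`, `apply_snapshot`, and the `agent_ip` key are hypothetical, this is not the code from the v2.0.41-1 commit, and the default value is just the one seen in the expected-results capture. Because the effective value is re-derived from each full snapshot, an override that is simply left out reverts to the default without any removal event.

```python
from typing import Dict


class AgentConfig:
    """Hypothetical holder for the effective sFlow agent address."""

    def __init__(self, default_agent: str) -> None:
        self.default_agent = default_agent
        self.agent = default_agent

    def apply_snapshot(self, snapshot: Dict[str, str]) -> None:
        # Re-derive the effective value from the full snapshot every time.
        # A key missing from the snapshot falls back to the default, so a
        # deleted override cannot linger in the running state.
        self.agent = snapshot.get("agent_ip", self.default_agent)


cfg = AgentConfig(default_agent="240.127.1.1")
cfg.apply_snapshot({"agent_ip": "3.3.3.1"})  # override present
print(cfg.agent)  # 3.3.3.1
cfg.apply_snapshot({})                       # override left out
print(cfg.agent)  # 240.127.1.1
```

This design also covers any other module that makes dynamic config changes, since no module has to remember to emit a "removed" notification.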

Note that if you upgrade to v2.0.41-1 you will notice that the agent now tries much harder to avoid picking a multicast address as the agent address. It will even prefer an IPv6 address such as FD::1, but most likely it will pick the address of eth0. That is usually the right choice, so in most cases users will not have to configure the agent-id at all.
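A rough sketch of the kind of filtering described above, assuming candidate addresses arrive as plain strings; `agent_candidates` is a hypothetical helper, not hsflowd code, and hsflowd's actual exclusion rules may differ. Note that 240.127.1.1, the default seen in the expected-results capture, lies in the reserved 240.0.0.0/4 block rather than the 224.0.0.0/4 multicast block, so the sketch rejects reserved addresses as well as multicast ones.

```python
import ipaddress
from typing import Iterable, List


def agent_candidates(addresses: Iterable[str]) -> List[str]:
    """Drop addresses that make poor sFlow agent addresses (hypothetical rules).

    Multicast, reserved (e.g. 240.0.0.0/4), loopback, link-local, and
    unspecified addresses are unlikely to identify the switch uniquely
    and reachably, so they are excluded.
    """
    kept = []
    for text in addresses:
        addr = ipaddress.ip_address(text)
        if (addr.is_multicast or addr.is_reserved or addr.is_loopback
                or addr.is_link_local or addr.is_unspecified):
            continue
        kept.append(text)
    return kept


print(agent_candidates(["240.127.1.1", "127.0.0.1", "fe80::1", "3.3.3.1", "fd00::1"]))
# ['3.3.3.1', 'fd00::1']
```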

vidhya-rajan commented 1 year ago

Thanks. Do we need to upgrade to v2.0.41-1 to get this fix? Is it possible to share the fix as a standalone patch, since we are using the older hsflowd version 2.0.34?
We are approaching the end of our integration, and an upgrade will not be feasible at this point. Can you please share the patch so that we can apply it?

sflow commented 1 year ago

If you browse the commit on GitHub you'll see that it involved changes in multiple places. I have no experience or expertise in the tradeoffs that you are looking at, but I would point out that at least two of the other changes since 2.0.34 were bugfixes for SONiC that were probably more important than this one. So if you are so close to your deadline that you can't try v2.0.41-1, then maybe you shouldn't change anything at all? Or you could use the simpler patch that you submitted for this pull request?

vidhya-rajan commented 1 year ago

Thank you for your input. I will try upgrading to v2.0.41-1 and check whether it is feasible; otherwise I will use the simpler patch shared here.

vidhya-rajan commented 1 year ago

We tested with the latest version, v2.0.41-1. The current behavior when the agent IP is removed is:

  1. The agent IP is not set to the multicast IP.
  2. The agent IP is now set to the IP address of the interface through which the collector IP is reachable.

sflow commented 1 year ago

That sounds correct to me.

FYI, version 2.0.41-1 compiled with FEATURES=SONIC will also compile the hsflowd dropmon module if the Linux kernel is 5.4 or later, but will not enable it automatically. That way it is available to be enabled via /etc/hsflowd.conf if it is required and supported by the ASIC, but otherwise it will have no effect. If you see any compilation problems, please let me know and include the output of "uname -a" on the build system.

vidhya-rajan commented 1 year ago

Thanks. You are right: we are facing compilation issues in SONiC due to the kernel version. The build system's kernel version is later than 5.4, but the sflow docker is based on buster.

SONiC already has a patch to address this (https://github.com/sonic-net/sonic-buildimage/issues/13252), so we are going ahead with the same patch: 0001-dropmon-workaround-created-local-copy-of-linux-net_dropmon.patch

The patch 0001-sflow-enabled-drop-monitor-support-for-SONiC.patch was also modified to match the latest version.

sflow commented 1 year ago

We checked in changes to address both problems. Now mod_sonic can supply its own ifIndex numbers to be used as tiebreakers in the automatic agent-address selection, and mod_dropmon can be compiled successfully on older Linux distros. See version 2.0.42-1. Please confirm that this works for you.

vidhya-rajan commented 1 year ago

Thanks. 2.0.42-1 includes both fixes.

vidhya-rajan commented 1 year ago

@sflow Please see the behavior below with respect to the agent IP for IPv4 and IPv6 sampling. Can you please confirm whether this is in line with expectations?

IPv4 behavior (agent IP configuration set to default)

  1. If an IPv4 address is configured only on the collector interface, the agent IP is the IP address through which the collector is reachable.
  2. When multiple interfaces have IP addresses configured, the agent ID changes to the most recent IP address configured on any interface.

IPv6 behavior

  1. No IPv4 address is configured on any interface, whereas IPv6 addresses are configured on all interfaces. With the default setting, the agent ID is the IP address of the management interface.

sflow commented 1 year ago

If no agent address or interface is specified in the config, then hsflowd will choose one. The chosen IP address should preferably be unique, persistent, and reachable, so candidates are scored based on how likely they are to have those properties. The "election" is stable in the sense that if hsflowd is restarted with the same settings and the same candidate addresses, it should always pick the same agent address. If two or more leading candidates are the same in every respect, the algorithm sometimes has to fall back on ifIndex numbers as the final tiebreaker. In the most recent change we tried to stabilize that further by considering the SONiC ifIndex numbers ahead of the underlying Linux ifIndex numbers (since the latter can change during a warm boot). The election does not take the previous selection into account, however, so if a new address has been added or the SONiC ifIndex numbers have changed, the outcome may be different. If you have a case where the election does not seem to be stable, please send the debug output you see from "hsflowd -ddd" in both cases. It should include logging related to the election.
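A minimal sketch of a stable election of the kind described above, under stated assumptions: each candidate already carries a numeric score, and ties are broken by the lowest SONiC ifIndex, then the lowest Linux ifIndex. `Candidate`, `elect`, and the "lowest index wins" rule are hypothetical illustrations, not hsflowd's actual scoring code. Because the ranking key depends only on the current candidates, the result is independent of list order and of any previous election, which is why adding an address can change the outcome.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    address: str
    score: int                    # higher = more likely unique, persistent, reachable
    sonic_ifindex: Optional[int]  # supplied by mod_sonic when known
    linux_ifindex: int            # may change across a warm boot


def elect(candidates: List[Candidate]) -> Candidate:
    """Deterministically pick an agent address from the current candidates."""

    def rank(c: Candidate):
        # Best score first; then lowest SONiC ifIndex (unknown sorts last);
        # then lowest Linux ifIndex; finally the address text itself.
        sonic = c.sonic_ifindex if c.sonic_ifindex is not None else float("inf")
        return (-c.score, sonic, c.linux_ifindex, c.address)

    return min(candidates, key=rank)


a = Candidate("10.0.0.2", score=5, sonic_ifindex=2, linux_ifindex=40)
b = Candidate("10.0.0.1", score=5, sonic_ifindex=1, linux_ifindex=99)
print(elect([a, b]).address)  # 10.0.0.1
print(elect([b, a]).address)  # 10.0.0.1 (input order does not matter)
```

Here `b` wins on its lower SONiC ifIndex even though its Linux ifIndex is higher, mirroring the idea of consulting SONiC numbering before the Linux numbering.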

vidhya-rajan commented 1 year ago

@sflow Thanks. We tested the deconfiguration scenarios in which automatic selection (the election process) occurs. The behavior is consistent across reboots: the elected interface continues to be the agent IP. The most recently configured IP interface appears to be elected in all cases when the agent ID is deconfigured.

sflow commented 1 year ago

It sounds like you are saying the current behavior is acceptable? I don't want to close this issue if there is still a problem, so please let me know either way.

vidhya-rajan commented 1 year ago

@sflow We are not clear on the criteria by which an interface is elected as the agent. Once the agent is chosen, the same agent IP persists across reboots, and this is working fine. The grey area is how an interface is elected as the agent: the most recently configured IP interface is not always elected.

sflow commented 1 year ago

The SONiC CLI allows the interface to be chosen explicitly using "config sflow agent-id ...", and this is how it is done in almost 100% of deployment scenarios, because there is usually a particular IP that represents the switch from the monitoring/management point of view (e.g. the address on eth0). So the purpose of the election is only to choose sensibly in the event that the operator missed that step. The sFlow agent address is the unique identifier that the monitoring system will use to refer to that device, so as long as the algorithm chooses an address that is unique to the switch, and chooses the same one after reboot, its job is done. If the operator doesn't approve, the solution is to go back and use "config sflow agent-id ...".

If the election seems over-elaborate then remember that hsflowd config is also for servers, and the interfaces on servers running different Linux distros can vary widely. It helps when there is just one hsflowd config for the whole system. That way it can be served from DNS-SD, or typed once into Puppet or Kubernetes. If you look at the source code you will see that the election allows for other ways to put a thumb on the scale (e.g. by upvoting or downvoting a CIDR) but exposing those in the SONiC CLI is not necessary. For a SONiC switch, "config sflow agent-id ..." is enough.
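A minimal sketch of how an explicit setting and a CIDR "thumb on the scale" could combine with a scored election, assuming base scores are already computed per address. `adjusted_score`, `choose_agent`, and the vote values are hypothetical; the source only says the election can upvote or downvote a CIDR, and hsflowd's real weighting and config syntax are not reproduced here.

```python
import ipaddress
from typing import Dict, Optional


def adjusted_score(address: str, base: int, cidr_votes: Dict[str, int]) -> int:
    """Add the (possibly negative) vote of every CIDR containing the address."""
    addr = ipaddress.ip_address(address)
    total = base
    for cidr, vote in cidr_votes.items():
        if addr in ipaddress.ip_network(cidr):  # False for mixed IPv4/IPv6
            total += vote
    return total


def choose_agent(explicit_agent: Optional[str],
                 base_scores: Dict[str, int],
                 cidr_votes: Dict[str, int]) -> str:
    # An explicitly configured agent (for SONiC, one set with
    # "config sflow agent-id ...") bypasses the election entirely.
    if explicit_agent:
        return explicit_agent
    # Otherwise take the highest adjusted score; iterating in sorted order
    # makes max() break ties the same way every time.
    return max(sorted(base_scores),
               key=lambda a: adjusted_score(a, base_scores[a], cidr_votes))


scores = {"10.1.1.5": 3, "192.168.0.9": 3}
print(choose_agent(None, scores, {"192.168.0.0/16": 2}))   # 192.168.0.9
print(choose_agent(None, scores, {"192.168.0.0/16": -2}))  # 10.1.1.5
print(choose_agent("3.3.3.1", scores, {}))                 # 3.3.3.1
```

The explicit override short-circuits everything else, which matches the advice that "config sflow agent-id ..." is all a SONiC operator needs.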

So thank you for your help in stabilizing the tie-breaker on SONiC. I'll close this issue now but please reopen or submit another if I missed something.