inex / IXP-Manager

Full stack web application powering peering at over 200 Internet Exchange Points (IXPs) globally.
https://www.ixpmanager.org/
GNU General Public License v2.0

Sflow support for mixed e.g. Brocade and Force10 flows #94

Closed (bcix closed 10 years ago)

bcix commented 11 years ago

Hey there,

After digging further into the flow data collected by IXP Manager, I noticed that many P2P graphs have either no IN or no OUT data. This happens when Peer A is connected to a Brocade switch and Peer B to a Force10 switch (or the other way round). The RRD graphs look fine for Peer C and Peer D when both are connected to Brocade or both are connected to Force10. Background: at BCIX we run Brocade and Force10 switches.

First I looked at the samples received by sflowtool, which is wrapped by the sflow-to-rrd-handler script in the IXP Manager distribution.

As sflow-to-rrd-handler receives the sflow data via /usr/local/bin/sflowtool -4 -p 6343 -l, I compared the flows from both platforms.

The flows from Brocade look pretty much the same as those from Force10. Here is an example of a Brocade flow and a Force10 flow, followed on the third line by the variables from the wrapper script:

FLOW,<brocade-ip>, 452,258,              5c5eab32aaaa,6c9ced70d5aa,0x0800,100,    100, 173.194.xxx.xxx,130.149.xxx.xxx,6, 0x00,   61,   80   ,   64388,   0x18,       833,        815,       8192
FLOW,<force10-ip>, 44893186,34669570,    d867d95b58aa,0015c72273aa,0x0800,100,    0,    95.91.xxx.xxx,   23.63.xxx.xxx ,6,0x00,   60,   53095,   80,      0x10,       74,         52,        8192
(undef, $agent,   $srcswport, $dstswport, $srcmac,     $dstmac, $ethertype, $vlan, undef, $srcip,        $dstip, $protocol, $tos, $ttl,$srcport, $dstport, $tcpflags, $pktsize, $payloadsize, $samplerate)

What you can see is that $srcswport and $dstswport are quite high numbers on Force10; the rest looks pretty much the same.
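
For readers following along, here is a minimal Python sketch of how the two sample lines above map onto the field names from the wrapper script. It is not the sflow-to-rrd-handler code itself (that is a Perl script); `FIELDS` and `parse_flow` are hypothetical names, and the column padding of the sample lines has been collapsed.

```python
# Field order copied from the variable list above; None marks the two
# positions the wrapper script discards (undef).
FIELDS = [
    None, "agent", "srcswport", "dstswport", "srcmac", "dstmac",
    "ethertype", "vlan", None, "srcip", "dstip", "protocol", "tos",
    "ttl", "srcport", "dstport", "tcpflags", "pktsize", "payloadsize",
    "samplerate",
]


def parse_flow(line):
    """Split one FLOW line into a dict of named fields, or return None."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != len(FIELDS) or parts[0] != "FLOW":
        return None
    return {name: value for name, value in zip(FIELDS, parts) if name}


brocade_line = ("FLOW,<brocade-ip>, 452,258, 5c5eab32aaaa,6c9ced70d5aa,"
                "0x0800,100, 100, 173.194.xxx.xxx,130.149.xxx.xxx,6, 0x00,"
                " 61, 80, 64388, 0x18, 833, 815, 8192")
force10_line = ("FLOW,<force10-ip>, 44893186,34669570, d867d95b58aa,"
                "0015c72273aa,0x0800,100, 0, 95.91.xxx.xxx, 23.63.xxx.xxx,"
                "6,0x00, 60, 53095, 80, 0x10, 74, 52, 8192")

for line in (brocade_line, force10_line):
    flow = parse_flow(line)
    print(flow["agent"], flow["srcswport"], flow["dstswport"],
          flow["srcmac"], flow["dstmac"], flow["vlan"])
```

Both lines split into the same 20 columns; only the switch port numbers differ noticeably between the two platforms.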

Is the FLOW data shown here good enough to produce correct RRD files, or should I look somewhere else for the potential bug?

Thanks, Thorleif

nickhilliard commented 10 years ago

@bcix $srcswport and $dstswport aren't used in the code. I left those variables in there just for documentation purposes.

The only variables used for tracking switch egress/ingress ports are $srcmac, $dstmac and $vlan. In the example above, these are:

5c5eab32aaaa,6c9ced70d5aa,100
d867d95b58aa,0015c72273aa,100
$srcmac, $dstmac, $vlan

So if the collector is correctly registering traffic going between the same type of switch, but ignoring traffic going between different switch vendors, then that's really weird.

Can you run the sflow controller as sflow-to-rrd-handler --debug? This will print out a pile of garbage to start with and every time it flushes its data, but other than that, it will only print out unknown sflowtool output entries (prefixed with DEBUG: rejected: and followed by the sflow line). All these entries are ignored. It's normal to see a small number of these. If you're seeing lots, then it means there's a consistency problem between the sflow collector and the back-end l2 database table, which is probably the root cause of this problem.
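
A rough Python sketch of the lookup described above, assuming the collector resolves each sample through a table keyed by VLAN and MAC address. `mac_table`, `classify` and the port names are hypothetical stand-ins for the l2 database table, not the real schema or code.

```python
def classify(flow, mac_table):
    """Return (src_port, dst_port), or None if either MAC is unknown.

    mac_table is a hypothetical stand-in for the l2 database table,
    mapping (vlan, mac) to a port identifier.
    """
    src = mac_table.get((flow["vlan"], flow["srcmac"]))
    dst = mac_table.get((flow["vlan"], flow["dstmac"]))
    if src is None or dst is None:
        return None
    return src, dst


# Illustrative table using the MAC addresses quoted in this thread;
# the port names are invented.
mac_table = {
    ("100", "5c5eab32aaaa"): "port-1",
    ("100", "6c9ced70d5aa"): "port-2",
}

known = {"vlan": "100", "srcmac": "5c5eab32aaaa", "dstmac": "6c9ced70d5aa"}
unknown = {"vlan": "200", "srcmac": "5c5eab32aaaa", "dstmac": "6c9ced70d5aa"}

for flow in (known, unknown):
    ports = classify(flow, mac_table)
    if ports is None:
        # Under this assumption, such a sample would be ignored.
        print("DEBUG: rejected:", flow)
    else:
        print("accepted:", ports)
```

A sample is only accounted when both MAC addresses resolve on the sample's VLAN, which is why a stale or inconsistent table would show up as many rejected lines.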

bcix commented 10 years ago

OK, I checked debug mode: not a single flow on the BCIX peering VLAN 100 was dropped, only a few flows on some private VLANs.

bcix commented 10 years ago

To correct my issue a bit:

Traffic is displayed in the mixed Force10/Brocade setting like this:

Brocade to Force10: data in the P2P graphs seems to be OK.

Force10 to Brocade: data in the P2P graphs is 0.00 (Max, Avg and Cur are all 0.00).

nickhilliard commented 10 years ago

If there's nothing dropped from peering vlan 100, then that means sflow-to-rrd-handler is happy. Can you confirm that you're seeing the sflow data? Find out the srcmac and dstmac addresses used and then run sflowtool -l | egrep '(srcmac|dstmac)' and see if it's actually exporting the data (obviously replace srcmac|dstmac with the correct mac addresses).

bcix commented 10 years ago

This might be interesting: I found one graph in the mixed Force10/Brocade setting where the Brocade to Force10 data in the P2P graph is much higher than the whole port traffic in the MRTG graph.

nickhilliard commented 10 years ago

Definitely interesting. That indicates that the mac->port mapping is messed up. Can you email me offline the output of update-l2database.pl --debug, the URL of the image which includes too much traffic, a URL of an image which contains no traffic, and also the output of sflow-to-rrd-handler --debug? This is a lot of data, so maybe you could zip it up first.

bcix commented 10 years ago

OK, I tested sflowtool -l | egrep '(srcmac|dstmac)' for some peers, and it shows that all sflow data is being sent to the sflowtool instance behind the IXP Manager sflow-to-rrd-handler.

bcix commented 10 years ago

I have sent all the requested data by email.

nickhilliard commented 10 years ago

@bcix and I have spent some time looking at this problem. Turns out that the Force 10 boxes are reversing the sflow direction. I.e. they've mixed up src and dst mac addresses. This means that traffic between F10 and brocade ports is zero in one direction and the aggregate of in + out in the other. It also means that F10 to F10 traffic is reversed.
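
The effect can be illustrated with a toy model in Python. It assumes that each frame is reported once, by the switch its sender is attached to, and that a switch listed in `swaps` reports source and destination the wrong way round. The peers, byte counts and the `account` function are invented for illustration and are not taken from the collector.

```python
from collections import Counter


def account(frames, switch_of, swaps):
    """Sum bytes per (src, dst) pair as a collector would see them.

    frames:    list of (src, dst, nbytes) giving the true direction
    switch_of: peer -> switch the peer is attached to
    swaps:     set of switches that report src/dst reversed
    """
    totals = Counter()
    for src, dst, nbytes in frames:
        if switch_of[src] in swaps:
            src, dst = dst, src  # direction reversed in the sample
        totals[(src, dst)] += nbytes
    return totals


switch_of = {"A": "brocade", "B": "f10", "C": "f10", "D": "f10"}
frames = [
    ("A", "B", 100),  # Brocade peer to F10 peer
    ("B", "A", 40),   # F10 peer to Brocade peer
    ("C", "D", 10),   # F10 to F10
    ("D", "C", 30),
]

totals = account(frames, switch_of, swaps={"f10"})
for pair in [("A", "B"), ("B", "A"), ("C", "D"), ("D", "C")]:
    print(pair, totals[pair])
```

In this model the A to B counter ends up holding the sum of both directions while B to A stays at zero, and the two F10 peers simply have their directions exchanged.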

This is obviously an F10 bug. @bcix is going to revert to them to see if they can fix it.

bcix commented 10 years ago

It's Dell Force10 case 00984693 now...

barryo commented 10 years ago

Closing as this is not an IXP Manager bug and it's been idle for 3 months. @bcix - any update from F10? If they won't do anything about it, please reopen as we may have to work around it.

nickhilliard commented 10 years ago

Just as a last comment on this: IXP Manager could work around the situation where there are either all ingress sflow devices on the network, or else all egress sflow devices. It's not possible to deal with a mix of the two because this doesn't define a functional accounting perimeter.
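
A simplified way to see the perimeter argument, sketched in Python: a frame enters the exchange on the sender's port and leaves on the receiver's port, so an ingress-sampling switch can see it at entry and an egress-sampling switch can see it at exit. The function and peer names are hypothetical, and sampling rates are ignored.

```python
def observations(src, dst, mode_of):
    """Count how many times one frame from src to dst can be sampled.

    mode_of maps each peer to the sampling mode ("ingress" or "egress")
    of the switch that peer's port is on.
    """
    seen = 0
    if mode_of[src] == "ingress":
        seen += 1  # sampled where the frame enters the exchange
    if mode_of[dst] == "egress":
        seen += 1  # sampled where the frame leaves the exchange
    return seen


all_ingress = {"A": "ingress", "B": "ingress"}
all_egress = {"A": "egress", "B": "egress"}
mixed = {"A": "ingress", "B": "egress"}

for name, mode_of in [("all ingress", all_ingress),
                      ("all egress", all_egress),
                      ("mixed", mixed)]:
    print(name,
          "A->B:", observations("A", "B", mode_of),
          "B->A:", observations("B", "A", mode_of))
```

With a uniform sampling direction every frame crosses the perimeter exactly once. In the mixed case one direction is seen twice and the other not at all, so no per-device correction can recover the missing direction.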

barryo commented 10 years ago

last comment? I admire your optimism :smiley:

bcix commented 10 years ago

Last comment: the Force10 case is still open. Dell Force10 has the RFE to implement ingress traffic sflow samples in addition to egress traffic sflow samples on their roadmap for one of the next FTOS versions for the S4810 switch, but there is no release date yet :-(

bcix commented 10 years ago

Update: ingress sflow is scheduled for Dell FTOS 9.7.0.0, to be released around Q1/2015...