raman325 / ostinato

Automatically exported from code.google.com/p/ostinato
GNU General Public License v3.0

not all ports show statistics #80

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I have a box with 5 Ethernet ports (1 on a single-port PCI card, 2 on the mobo, 2 
on a PCIe card).

After bringing all interfaces up (ifconfig ethX up), the status of eth4 is shown 
as unknown (screenshot - 1st window).

Furthermore, when I create and enable VLAN interfaces, some of them show the 
same behavior.

In the attached example, I added VLAN interfaces 100, 200 and 300 on port eth1 
(vconfig add eth1 <VLAN>, ifconfig eth1.<VLAN> up), and eth1.200 is shown with 
status unknown (screenshot - 2nd window). 
If I then delete all the VLAN interfaces and add them in the order 100, 300, 200, 
eth1.300 shows this behavior instead, while eth1.200 behaves as expected 
(screenshot - 3rd window). It looks like there is a problem with certain port indexes.

Ports with state "unknown" can still send out streams, but don't show any 
statistics (neither for rx nor for tx traffic).

Ostinato Version is 0.5.1 Revision: 74c9dcf830e3@
Kubuntu Linux 12.04 x86-64 bit

Original issue reported on code.google.com by loox...@googlemail.com on 1 Nov 2012 at 10:19


GoogleCodeExporter commented 9 years ago
Run drone directly and attach the console log here. Also provide output of the 
following commands -

ifconfig ethX (for all 5 ports)
cat /proc/net/dev

Original comment by pstav...@gmail.com on 2 Nov 2012 at 3:45

GoogleCodeExporter commented 9 years ago
Sorry, I was busy... Attached you will find the requested output for case 1 
(eth4 affected).

Original comment by loox...@googlemail.com on 26 Nov 2012 at 1:01


GoogleCodeExporter commented 9 years ago
OK... I just did some quick "debugging" without really understanding the 
mechanics, but I think I know where the problem is:

LinuxPort::StatsMonitor::netlinkStats()

The buffer size for the parsed netlink messages is determined by peeking into 
the first message. On my system, that message contains information for the 
interfaces lo, eth0 and eth3, resulting in a buffer size of 3020 bytes 
(996 + 2*1012). Inside the _retry loop, the message for these interfaces is 
received and parsed, and the while (NLMSG_OK(nlm, (uint)len)) loop finishes 
because len reaches 0. The next call to recvmsg returns information for 
interfaces eth1, eth2 and eth4. However, this message is BIGGER than the first 
one (3036 = 3*1012), so the buffer is not big enough and len is too small: after 
parsing eth1 and eth2, the last record (eth4) has a size of 1012 bytes, but len 
is only 996 bytes (3020 - 2*1012). The NLMSG_OK condition in the while loop is 
therefore false, and eth4 is not processed.

So it looks like the problem occurs because the netlink messages for the two 
groups of ports differ in size, and the buffer is sized from the first (smaller) one.
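The arithmetic above can be replayed with a small C++ model. This is a sketch, not Ostinato's code: `Record`, `Datagram`, `nlmsgOk()` and `parseReply()` are hypothetical names, `nlmsgOk()` only mirrors the length test done by the `NLMSG_OK()` macro, and the record sizes are assumed from the figures in this comment (assuming the 996-byte record belongs to lo and each ethX record is 1012 bytes).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a multipart link dump: each Record stands for one
// netlink message (interface name + nlmsg_len), each Datagram for what a
// single recvmsg() call returns.
struct Record {
    std::string ifname;
    std::size_t len;
};
using Datagram = std::vector<Record>;

constexpr std::size_t kNlmsgHdrLen = 16; // sizeof(struct nlmsghdr) on Linux

// Same length test as NLMSG_OK(): the remaining bytes must hold a complete
// header and the complete record that header describes.
bool nlmsgOk(std::size_t recLen, std::size_t remaining) {
    return remaining >= kNlmsgHdrLen && recLen >= kNlmsgHdrLen && recLen <= remaining;
}

// Receives every datagram into a buffer of bufSize bytes and returns the
// interfaces that were parsed. Bytes of a datagram that do not fit in the
// buffer are lost, as with recvmsg() on a datagram socket.
std::vector<std::string> parseReply(const std::vector<Datagram>& reply, std::size_t bufSize) {
    assert(bufSize > 0);
    std::vector<std::string> parsed;
    for (const Datagram& dg : reply) {
        std::size_t total = 0;
        for (const Record& r : dg) total += r.len;
        std::size_t len = std::min(total, bufSize); // bytes actually received
        for (const Record& r : dg) {
            if (!nlmsgOk(r.len, len)) break; // leaves the loop like while (NLMSG_OK(...))
            parsed.push_back(r.ifname);
            len -= r.len; // NLMSG_NEXT() likewise shrinks len by the record size
        }
    }
    return parsed;
}
```

With `bufSize` set to 3020 (the size of the first datagram), `parseReply()` returns lo, eth0, eth3, eth1 and eth2 but not eth4; with a larger buffer such as 8192 bytes, all six interfaces are parsed.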

I hope it's at least a little bit clear what I mean. Forgive me for daring to 
debug your code without actually understanding it (: I just want to help (:

Original comment by loox...@googlemail.com on 26 Nov 2012 at 5:19

GoogleCodeExporter commented 9 years ago
a solution might be to move:

    count = 0;
_retry:

right above:
// Find required size of buffer and resize accordingly
    while (1)

so that the buffer size gets adjusted for every new netlink message according to 
the peek into that message, but that's just an idea.
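The effect of moving the label can be sketched with a simulated socket. `FakeNetlinkSocket` and `countParsedRecords()` are hypothetical stand-ins, not Ostinato or kernel APIs: `peekLen()` plays the role of the size-finding peek, and `recv()` plays the role of `recvmsg()`, dropping whatever does not fit in the buffer. The record sizes are the ones reported in the earlier comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a netlink socket: each queued datagram is the list
// of nlmsg_len values of the records it carries.
class FakeNetlinkSocket {
public:
    explicit FakeNetlinkSocket(std::deque<std::vector<std::size_t>> q) : q_(std::move(q)) {}

    bool empty() const { return q_.empty(); }

    // Peek: report the size of the next datagram without consuming it.
    std::size_t peekLen() const {
        assert(!q_.empty());
        return total(q_.front());
    }

    // Receive: consume the next datagram; bytes beyond bufSize are discarded
    // and len is set to the number of bytes actually delivered.
    std::vector<std::size_t> recv(std::size_t bufSize, std::size_t& len) {
        assert(!q_.empty());
        std::vector<std::size_t> dg = std::move(q_.front());
        q_.pop_front();
        len = std::min(bufSize, total(dg));
        return dg;
    }

private:
    static std::size_t total(const std::vector<std::size_t>& dg) {
        return std::accumulate(dg.begin(), dg.end(), std::size_t{0});
    }
    std::deque<std::vector<std::size_t>> q_;
};

// Suggested ordering: the peek/resize step runs before *every* receive, so
// each datagram gets a buffer sized for that datagram.
std::size_t countParsedRecords(FakeNetlinkSocket& sock) {
    std::size_t parsed = 0;
    while (!sock.empty()) {
        std::size_t bufSize = sock.peekLen(); // re-sized per datagram
        std::size_t len = 0;
        for (std::size_t recLen : sock.recv(bufSize, len)) {
            if (recLen > len) break; // simplified NLMSG_OK() length check
            ++parsed;
            len -= recLen;
        }
    }
    return parsed;
}
```

Fed the two datagrams {996, 1012, 1012} and {1012, 1012, 1012}, `countParsedRecords()` reaches all six records, because the second datagram gets its own 3036-byte buffer instead of reusing the 3020-byte one.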

Original comment by loox...@googlemail.com on 27 Nov 2012 at 10:45

GoogleCodeExporter commented 9 years ago
@looxrat: I guess you hit the nail on the head. Thanks for debugging. For a 
multipart netlink message, peeking won't help because you can't peek beyond 
the first message. So I will possibly have to send and receive twice: once to 
get the buffer size and then again to create the port list with the actual 
data.

Till those changes are made, I recommend the following quick hack -

In LinuxPort::StatsMonitor::netlinkStats(), change the following default buffer 
size from 1024 to 8192 or 16384 -

    buf.fill('\0', 1024);

Let me know if that fixes the issue for now.

Original comment by pstav...@gmail.com on 27 Nov 2012 at 3:54

GoogleCodeExporter commented 9 years ago
Yes, increasing the initial buffer size works perfectly, even with 19 interfaces 
(the 5 physical ones, the loopback and a bunch of VLANs).

Regarding the peeking, I have to admit I don't quite understand it yet, forgive 
me (: I take it that the moment you call recvmsg() inside the _retry loop, the 
message is removed. So if the while (1) loop (which is currently outside the 
_retry loop) were moved into the _retry loop and executed first for every 
message, wouldn't it always determine the size of the next message to be received?

just curious! Anyway, good work, really nice program (:

Original comment by loox...@googlemail.com on 28 Nov 2012 at 8:23

GoogleCodeExporter commented 9 years ago
Issue 94 has been merged into this issue.

Original comment by pstav...@gmail.com on 13 Jan 2013 at 5:11

GoogleCodeExporter commented 9 years ago
@looxrat: on revisiting the code, I see that your suggested fix about moving 
the _retry loop will work. Will fix shortly.

Original comment by pstav...@gmail.com on 13 Jan 2013 at 5:13

GoogleCodeExporter commented 9 years ago
revision a95d85838d53 fixes this issue

Original comment by pstav...@gmail.com on 16 Jan 2013 at 4:36

GoogleCodeExporter commented 9 years ago
I had the same issue with version 0.5.1 on Ubuntu: 2 out of 8 ports are shown as 
"unknown" and cannot pass traffic.

The previous discussion here said the quick fix is to increase the buffer size:
In LinuxPort::StatsMonitor::netlinkStats(), change the following default buffer 
size from 1024 to 8192 or 16384 -

    buf.fill('\0', 1024);

Can someone tell me where I can change the buf.fill setting? Which file? Or do I 
have to wait for a newer Ostinato version?

Thanks.

Original comment by weylwa...@gmail.com on 30 Sep 2013 at 6:55

GoogleCodeExporter commented 9 years ago
@weylwang7: The file is server/linuxport.cpp

The buf.fill change was only a workaround. The actual fix is already committed. 
You can get the latest code from the repository and build it. See the Wiki for 
more details on how to do that.

Original comment by pstav...@gmail.com on 2 Oct 2013 at 3:24