richb-hanover / ndt

Automatically exported from code.google.com/p/ndt

Packets out of order percentage incorrect in C client - 3.6.5.2-rc4 #89

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I found this when testing against the Measurement Lab server that Stephen set up; 
thanks for setting this up.

Might just be out by a factor of 100?

Comparing against the Java applet, it seems this line isn't being printed there.

ndt-3.6.5.2/src$ ./web100clt -ll -n ndt.iupui.mlab1.nuq0t.measurement-lab.org

...

    ------  Web100 Detailed Analysis  ------

Web100 reports the Round trip time = 158.70 msec;the Packet size = 1448 Bytes; and
There were 7 packets retransmitted, 517 duplicate acks received, and 519 SACK blocks received
Packets arrived out-of-order 10340.00% of the time.
This connection is receiver limited 39.30% of the time.
  Increasing the current receive buffer (2310.62 KB) will improve performance
This connection is sender limited 4.86% of the time.
  Increasing the current send buffer (1732.01 KB) will improve performance
This connection is network limited 55.84% of the time.

Client output attached

Original issue reported on code.google.com by rsanger...@gmail.com on 27 Aug 2013 at 11:18

GoogleCodeExporter commented 9 years ago
So it seems PktsIn has been deprecated in Web100 and Web10G.

The calculation for this is done on the server side: DupAcksIn/AckPktsIn.

When running Web100 2.5.30 for Linux 2.6.35 (possibly earlier versions too), the 
problem is that AckPktsIn is no longer being returned, so the calculation must 
be using a bogus value.

Looking at the kernel patch:
AckPktsIn == PktsIn - DataPktsIn;

I will work on a patch for this.

Original comment by rsanger...@gmail.com on 28 Aug 2013 at 12:06
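
To make the arithmetic in this comment concrete, here is a minimal Python sketch. It is not NDT code: the function name `out_of_order_pct`, the zero-denominator guard, and the sample counter values are assumptions made for illustration. It expresses DupAcksIn/AckPktsIn as a percentage and derives the ACK count from `PktsIn - DataPktsIn`, the identity quoted from the kernel patch. As an aside, 517 duplicate ACKs combined with the reported 10340.00% is what this formula yields for a denominator of 5; that is an inference from the client output in the original report, not something confirmed in the thread.

```python
def out_of_order_pct(dup_acks_in, pkts_in, data_pkts_in):
    """Return duplicate ACKs as a percentage of all ACK packets received.

    The ACK count is derived as PktsIn - DataPktsIn instead of being read
    from the deprecated AckPktsIn counter.
    """
    ack_pkts_in = pkts_in - data_pkts_in
    if ack_pkts_in <= 0:
        # Assumption: report 0% when there is no usable denominator.
        return 0.0
    return 100.0 * dup_acks_in / ack_pkts_in


# Hypothetical counter values, chosen only to illustrate the arithmetic.
print(f"{out_of_order_pct(517, 10400, 60):.2f}%")  # denominator 10340
print(f"{out_of_order_pct(517, 5, 0):.2f}%")       # denominator 5
```

With a denominator in the thousands the result lands in the single digits, while a tiny denominator pushes the same duplicate-ACK count far past 100%.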

GoogleCodeExporter commented 9 years ago
For context, M-Lab runs a patched 2.6.32 kernel with web100 2.5.27 (for kernel 
2.6.32).

The patch, exactly as applied, is at the link below:

http://git.planet-lab.org/?p=linux-2.6.git;a=blob;f=linux-2.6-690-web100.patch;hb=refs/heads/rhel6-mlab

Original comment by stephen....@gmail.com on 28 Aug 2013 at 12:14

GoogleCodeExporter commented 9 years ago
Thanks for the patch you're using; it appears to have the same problem.

Correction to my previous comment: AckPktsIn has been deprecated, not PktsIn.

Original comment by rsanger...@gmail.com on 28 Aug 2013 at 12:42

GoogleCodeExporter commented 9 years ago
Does this patch to web100-util.c work?

Original comment by AaronMat...@gmail.com on 28 Aug 2013 at 12:52

GoogleCodeExporter commented 9 years ago
I've tested the patch locally and it appears to be working; commit at will.

Testing on my local machine, I noticed that my earlier statement (that Web100 
won't return AckPktsIn) wasn't correct. It seems it will, but a warning about 
accessing a deprecated variable is printed every time.
That said, this might not hold true for all versions of Web100.
So I'm guessing that the M-Lab setup might not have AckPktsIn listed in its 
web100_variables file (-f option). Can you confirm this, Stephen?

Either way we shouldn't be relying on deprecated variables.

Original comment by rsanger...@gmail.com on 28 Aug 2013 at 10:02

GoogleCodeExporter commented 9 years ago
I'm not sure I understand the question.

The file attached is the content of /proc/web100/header.  It includes AckPktsIn 
prefixed with '_'.  (Is that significant?)

And, ndtd is started with only these arguments:
   --log_dir $SLICERSYNCDIR/ --snaplog --tcpdump --cputime --multiple --max_clients=40

"-f" (  -f, --file variable_FN - specify alternate 'web100_variables' file) is 
not used.

Is there a problem? Or, a preferable set of options?

Original comment by stephen....@gmail.com on 31 Aug 2013 at 4:23

GoogleCodeExporter commented 9 years ago

Looks like the '_' prefix is what Web100 is adding to deprecated variables.

When '-f' isn't specified, a default path is used: 
/usr/local/ndt/web100_variables. Failing that, run the server with -d added 
and it should print out the location:
Variables file = <location>

This file is a list of the Web100 variables which the server collects and sends 
back to the client. Please check whether AckPktsIn is in that list.

Original comment by rsanger...@gmail.com on 1 Sep 2013 at 2:08

GoogleCodeExporter commented 9 years ago
Aha!  

Yes, AckPktsIn & AckPktsOut are in that file. The file used by m-lab is the 
default included in the NDT source package (also attached).  These two also 
appear in /proc/web100/header with the '_' prefix.  Of the other variables 
prefixed with '_' in /proc/web100/header, none are in the web100_variables file.

Should those variable names be removed?  What are the consequences of altering 
it?  i.e. would any clients break?

Original comment by stephen....@gmail.com on 1 Sep 2013 at 10:42
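
The cross-check described in this comment can be sketched in Python as follows. This is illustrative only and not part of NDT or Web100. It assumes that each line of /proc/web100/header begins with the variable name and that web100_variables lists one name per line; neither format is spelled out in the thread. The function name `deprecated_collected`, the sample text, and the name `_SomeOtherVar` are hypothetical.

```python
def deprecated_collected(header_text, variables_text):
    """Return collected variable names that the header marks with a leading '_'."""
    deprecated = set()
    for line in header_text.splitlines():
        fields = line.split()
        # Assumption: the variable name is the first field on each line.
        if fields and fields[0].startswith("_"):
            deprecated.add(fields[0][1:])
    # Assumption: the variables file holds one variable name per line.
    collected = {ln.strip() for ln in variables_text.splitlines() if ln.strip()}
    return sorted(deprecated & collected)


# Made-up sample text; the numeric fields are placeholders, not real offsets.
header = "_AckPktsIn 0 0 0\n_AckPktsOut 0 0 0\nPktsIn 0 0 0\n_SomeOtherVar 0 0 0\n"
variables = "AckPktsIn\nAckPktsOut\nPktsIn\nDupAcksIn\n"
print(deprecated_collected(header, variables))
```

On a live server the two strings would come from reading /proc/web100/header and the web100_variables file that ndtd reports.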

GoogleCodeExporter commented 9 years ago

You should keep AckPktsIn & AckPktsOut on that list; although they are deprecated 
in Web100, NDT still expects them to be there.

The reason I asked was that when I ran web100clt against your server the list 
of Web100 variables returned was missing AckPktsIn, so I wondered if this was 
missing from your web100_variables file. The interesting thing here is that 
AckPktsOut (which is also deprecated and seems identical to AckPktsIn) is being 
returned although AckPktsIn is not.

When I try the same thing locally I see AckPktsIn returned.

I don't know the reason for this difference in behavior.

Do you mind applying Aaron's fix_dup_ack_calculation.patch so we can see if 
this fixes the calculation? I wouldn't expect it to make the server return 
AckPktsIn, but it should fix the calculation.

Thanks for your help,

Original comment by rsanger...@gmail.com on 1 Sep 2013 at 11:51

GoogleCodeExporter commented 9 years ago
I was curious whether you'd been able to test out the patch.

Original comment by AaronMat...@gmail.com on 11 Sep 2013 at 7:58

GoogleCodeExporter commented 9 years ago
Is the patch for the server?  Would it be helpful to apply this patch to the 
server on mlab for testing?

Original comment by solt...@opentechinstitute.org on 11 Sep 2013 at 8:12

GoogleCodeExporter commented 9 years ago
Yep, it's a patch for the web100srv. It should hopefully fix the out-of-order 
packet percentage issue we were seeing on mlab.

Original comment by AaronMat...@gmail.com on 11 Sep 2013 at 8:32

GoogleCodeExporter commented 9 years ago
Applied:

http://ndt.iupui.mlab1.nuq0t.measurement-lab.org:7123/

Original comment by solt...@opentechinstitute.org on 11 Sep 2013 at 9:14

GoogleCodeExporter commented 9 years ago
I'm no longer seeing absurdly high out-of-order numbers, mainly because I'm not 
seeing any :) Ryan, could you test and see whether you get any? If it works for 
you, we're probably good for an -rc release.

Original comment by AaronMat...@gmail.com on 12 Sep 2013 at 12:30

GoogleCodeExporter commented 9 years ago
I have retested and it appears to be working as expected. I'm seeing 
out-of-order packets anywhere from 0% to 10% of the time over a couple of tests 
which looks correct. This matches with the Web100 variables being returned.

Looks like the patch worked.

Original comment by rsanger...@gmail.com on 12 Sep 2013 at 10:10

GoogleCodeExporter commented 9 years ago
Excellent, closing.

Original comment by AaronMat...@gmail.com on 13 Sep 2013 at 6:40