richb-hanover / ndt

Automatically exported from code.google.com/p/ndt

NDT reports erroneous results with 10G interface #83

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This may already be fixed in trunk. Basically boils down to an issue in 
test_s2c_clt.c, line 32, where a uint32_t should be a uint64_t. See below for 
full details:

To: Nat Stoddard <ntstoddard@lbl.gov>
CC: ndt-users@internet2.edu
Subject: Re: Using NDT with 10 gigabit interfaces

Nat;

This is good.  As I thought, the CPU isn't the bottleneck.

The connection is sender limited, meaning it's not network or receiver 
limited.  The send/receive buffers don't limit the throughput.

The pkt-pair timing says OC-48 to 10 GE, so that's good.

So let's look at the raw numbers.
    CurMSS = 1448 (normal Ethernet frame)
    Data Pkts Out = 7873701
    Data Bytes Out = 2,147,483,647
    Timesec = 10.00 sec

This is suspicious.
Bytes sent should be: 7,873,701 x 1448 = 11,401,119,048
Speed calculates to 9.1 Gbps, so this is what the server achieved, but 
not what is logged by the client.

But the server reports 2.2 GB, which converts to 1.7 Gbps; that is what 
the spd: variable says.

I wonder if this is just a variable overflow error.  The DataBytesOut 
variable is a 32-bit counter.  There is an HCDataOctetsOut variable (a 
64-bit counter).

So, this is a problem that may impact some of the analysis messages. 
Line 1082 in web100srv.c sets the variable spd based on the DataBytesOut 
value.  Either the code should grab the HCDataOctetsOut value, or a test 
should be made: if s2cspd is greater than something like 5 Gbps, use 
the HC value, otherwise use the existing value.

I'm looking at the code, and it shows the server is able to write log 
messages that might help here.  Restart the server with 1 or 2 debug 
flags (-d or -dd) and post the output here.

Got it!  While the DataBytesOut vs HCDataOctetsOut issue noted above 
is a potential bug, it is not causing this bug.

The file test_s2c_clt.c is the culprit.  The data read subroutine 
declares the byte counter as a uint32_t.  This is too small for the 
amount of data we are getting at 10 Gbps, and it is overflowing. 
Changing that to a uint64_t variable should fix this problem.  The 
server uses a double, so it doesn't overflow.

So change line 32 of test_s2c_clt.c (s/uint32_t/uint64_t/) 
and recompile.  See if that fixes this bug.

Same with the Java client: line 1253 of the Tcpbw100.java file declares 
bytes as an int; this should be a double to prevent counter overflows at 
10 Gbps.

Rich

On 05/16/2013 01:30 PM, Nat Stoddard wrote:
> Hi Rich,
> Thank you for taking notice of this.
> 1.  I notice this behavior with both the web and the command line
> clients.  I only have the command line client to refer to when I am
> testing between my two NDT servers.  I get numbers very close to the
> same in both directions:
> server1 to server2 outbound 8517.57 Mb/s, inbound 2164.96 Mb/s
> server2 to server1 outbound 9101.33 Mb/s, inbound 2245.74 Mb/s
>
> 2. Top reports the CPU load on the server as high as 49%.  The client
> goes up to 45%.
>
> 3.  I have pasted the web100clt -ll output below:
>
> $ web100clt -n lblnet-test.lbl.gov -ll
> Testing network path for configuration and performance problems  --
> Using IPv4 address
> Checking for Middleboxes . . . . . . . . . . . . . . . . . .  Done
> checking for firewalls . . . . . . . . . . . . . . . . . . .  Done
> running 10s outbound test (client to server) . . . . .  9101.33 Mb/s
> running 10s inbound test (server to client) . . . . . . 2245.74 Mb/s
> The slowest link in the end-to-end path is a 2.4 Gbps OC-48 subnet
> Information [S2C]: Packet queuing detected: 75.37% (remote buffers)
> Server 'lblnet-test.lbl.gov' is not behind a firewall. [Connection to
> the ephemeral port was successful]
> Client is not behind a firewall. [Connection to the ephemeral port was
> successful]
>
>      ------  Web100 Detailed Analysis  ------
>
> Web100 reports the Round trip time = 10.54 msec;the Packet size = 1448
> Bytes; and
> No packet loss was observed.
> This connection is receiver limited 1.67% of the time.
> This connection is sender limited 98.30% of the time.
>
>      Web100 reports TCP negotiated the optional Performance Settings to:
> RFC 2018 Selective Acknowledgment: ON
> RFC 896 Nagle Algorithm: ON
> RFC 3168 Explicit Congestion Notification: OFF
> RFC 1323 Time Stamping: ON
> RFC 1323 Window Scaling: ON; Scaling Factors - Server=10, Client=10
> The theoretical network limit is 104855.13 Mbps
> The NDT server has a 16384 KByte buffer which limits the throughput to
> 12148.82 Mbps
> Your PC/Workstation has a 12269 KByte buffer which limits the throughput
> to 9097.53 Mbps
> The network based flow control limits the throughput to 13645.63 Mbps
>
> Client Data reports link is '  8', Client Acks report link is '  9'
> Server Data reports link is '  9', Server Acks report link is '  9'
> Packet size is preserved End-to-End
> Server IP addresses are preserved End-to-End
> Client IP addresses are preserved End-to-End
> CurMSS: 1448
> X_Rcvbuf: 87380
> X_Sndbuf: 16777216
> AckPktsIn: 267378
> AckPktsOut: 0
> BytesRetrans: 0
> CongAvoid: 0
> CongestionOverCount: 0
> CongestionSignals: 0
> CountRTT: 267379
> CurCwnd: 18844272
> CurRTO: 210
> CurRwinRcvd: 12521472
> CurRwinSent: 6144
> CurSsthresh: 2147483647
> DSACKDups: 0
> DataBytesIn: 0
> DataBytesOut: 2147483647
> DataPktsIn: 0
> DataPktsOut: 7873701
> DupAcksIn: 0
> ECNEnabled: 0
> FastRetran: 0
> MaxCwnd: 18844272
> MaxMSS: 1448
> MaxRTO: 212
> MaxRTT: 12
> MaxRwinRcvd: 12563456
> MaxRwinSent: 6144
> MaxSsthresh: 0
> MinMSS: 1448
> MinRTO: 201
> MinRTT: 0
> MinRwinRcvd: 6144
> MinRwinSent: 5792
> NagleEnabled: 1
> OtherReductions: 0
> PktsIn: 267378
> PktsOut: 7873701
> PktsRetrans: 0
> RcvWinScale: 10
> SACKEnabled: 3
> SACKsRcvd: 0
> SendStall: 0
> SlowStart: 13012
> SampleRTT: 10
> SmoothedRTT: 10
> SndWinScale: 10
> SndLimTimeRwin: 167989
> SndLimTimeCwnd: 2531
> SndLimTimeSender: 9878037
> SndLimTransRwin: 4723
> SndLimTransCwnd: 42
> SndLimTransSender: 4766
> SndLimBytesRwin: 352300576
> SndLimBytesCwnd: 1710336
> SndLimBytesSender: 2147483647
> SubsequentTimeouts: 0
> SumRTT: 2817061
> Timeouts: 0
> TimestampsEnabled: 1
> WinScaleRcvd: 10
> WinScaleSent: 10
> DupAcksOut: 0
> StartTimeUsec: 6541
> Duration: 10050267
> c2sData: 8
> c2sAck: 9
> s2cData: 9
> s2cAck: 9
> half_duplex: 0
> link: 100
> congestion: 0
> bad_cable: 0
> mismatch: 0
> spd: 1709.69
> bw: 104855.13
> loss: 0.000000000
> avgrtt: 10.54
> waitsec: 0.00
> timesec: 10.00
> order: 0.0000
> rwintime: 0.0167
> sendtime: 0.9830
> cwndtime: 0.0003
> rwin: 95.8516
> swin: 128.0000
> cwin: 143.7704
> rttsec: 0.010536
> Sndbuf: 16777216
> aspd: 0.00000
> CWND-Limited: 96729.00
> minCWNDpeak: -1
> maxCWNDpeak: -1
> CWNDpeaks: -1
>

Original issue reported on code.google.com by AndrewRL...@gmail.com on 17 Jun 2013 at 5:51

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
line 1296 of the Tcpbw100.java file declares bytes as a long; this should be a 
double to prevent counter overflows at 10 Gbps.

Original comment by ntstodd...@lbl.gov on 17 Jun 2013 at 7:35

GoogleCodeExporter commented 9 years ago
test_s2c_clt.c is already fixed in trunk (on line 32, uint32_t was changed to uint64_t).

The suggestion from comment #3 is addressed in issue #84:
https://code.google.com/p/ndt/issues/detail?id=84

Original comment by smale...@soldevelo.com on 17 Feb 2014 at 1:25