GoogleCodeExporter opened this issue 9 years ago
fixed. closing
Original comment by bltier...@es.net
on 26 Nov 2013 at 1:53
Did we fix this? I don't think so. We do have some receiver-side performance
improvements, but not involving splice().
Original comment by jef.posk...@gmail.com
on 26 Nov 2013 at 2:00
I assumed that was the fix you committed today, since the CPU usage went down so
much.
(OK, I should have actually looked at the change log!)
So I guess not. I'll reopen it.
Original comment by bltier...@es.net
on 26 Nov 2013 at 2:04
Closed in error. Reopening.
Original comment by bltier...@es.net
on 26 Nov 2013 at 2:08
By the way, splice() is not a guaranteed win, because it takes two splice()
calls to replace the previous single read() call. No bytes get moved into
userspace, but it is still an extra syscall. It would have to be benchmarked
carefully.
Original comment by jef.posk...@gmail.com
on 26 Nov 2013 at 2:28
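[Editor's note: to make the two-splice() pattern concrete, here is a minimal sketch, not iperf3 code, of discarding received data on Linux by splicing socket -> pipe -> /dev/null. The function name, arguments, and chunk size are made up for illustration; the point is that every read() is replaced by two syscalls, which is why the commenter wants careful benchmarking.]

```c
/* Hypothetical sketch: discard received TCP data with splice() instead of
 * read().  Two syscalls per chunk: socket -> pipe, then pipe -> /dev/null.
 * No bytes cross into userspace, but there is one extra syscall per chunk. */
#define _GNU_SOURCE
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

ssize_t discard_with_splice(int sock_fd, int devnull_fd, int pipe_fds[2],
                            size_t chunk)
{
    /* 1st splice: pull bytes from the socket into the pipe (stays in kernel). */
    ssize_t n = splice(sock_fd, NULL, pipe_fds[1], NULL, chunk, SPLICE_F_MOVE);
    if (n <= 0)
        return n;

    /* 2nd splice: drain the pipe into /dev/null, discarding the payload. */
    ssize_t drained = 0;
    while (drained < n) {
        ssize_t m = splice(pipe_fds[0], NULL, devnull_fd, NULL,
                           (size_t)(n - drained), SPLICE_F_MOVE);
        if (m <= 0)
            return -1;
        drained += m;
    }
    return n;
}
```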
Is it not possible to increase server receive performance by:
- only reading the first few bytes from the socket (i.e. the bytes containing
the per-packet iperf protocol info) and 'skipping' the rest by advancing the fd
read pointer; this reduces CPU and memory use, especially for large packets;
- repeatedly reading the socket to clear any backlog of received packets before
running the timer/scheduler routine; this helps especially for small-packet
test cases;
- providing some instructions on how the user can increase the amount of
buffering available to the kernel/NIC driver (so the CPU can be away from iperf
receive for longer before drops occur due to lack of buffering)?
(A rough sketch of the first and third ideas follows this comment.)
Original comment by CharlesA...@gmail.com
on 27 Nov 2013 at 6:57
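[Editor's note: a rough sketch of the first and third suggestions, assuming Linux and a UDP test. Sockets are not seekable, so instead of "advancing the fd read pointer" this uses MSG_TRUNC, which makes recv() report the full datagram length while copying only a header-sized buffer; MSG_DONTWAIT drains any backlog without blocking; SO_RCVBUF (effectively capped by the net.core.rmem_max sysctl) enlarges kernel-side buffering. The helper name and HDR_LEN are hypothetical, not part of iperf3.]

```c
/* Hypothetical UDP receive-drain helper, not iperf3 code. */
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

#define HDR_LEN 16  /* assumed size of the per-packet iperf protocol header */

long drain_udp_socket(int sock_fd, uint64_t *bytes_received)
{
    char hdr[HDR_LEN];
    long packets = 0;

    /* Ask the kernel for a larger receive buffer so the process can be away
     * from the socket longer before drops; normally done once at setup. */
    int rcvbuf = 4 * 1024 * 1024;
    setsockopt(sock_fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    for (;;) {
        /* Copy only the header; MSG_TRUNC returns the real datagram size. */
        ssize_t n = recv(sock_fd, hdr, sizeof(hdr), MSG_TRUNC | MSG_DONTWAIT);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;          /* backlog drained */
            return -1;
        }
        *bytes_received += (uint64_t)n;  /* n is the full packet length */
        packets++;
    }
    return packets;
}
```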
NOTE: profiling the server code with 'perf' indicates ~25-30% of the cycles
are spent in memcpy; the next highest item is at ~5% (a piece of kernel code,
not iperf-specific code).
Original comment by CharlesA...@gmail.com
on 28 Nov 2013 at 12:34
Grab this.
Original comment by bmah@es.net
on 8 Jan 2014 at 10:12
Original issue reported on code.google.com by
bltier...@es.net
on 10 Nov 2013 at 5:32