simplegy / httplib2

Automatically exported from code.google.com/p/httplib2

Slow performance due to Nagle algorithm (and solution) #28

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

First of all, thanks for a great python module.

While testing, I measured some very long delays in replies to HTTP POST
requests made with httplib2.
It turns out the delays are caused by Nagle's algorithm, and that performance
can be greatly improved by setting TCP_NODELAY, which disables Nagle's
algorithm.
See
http://developers.slashdot.org/comments.pl?sid=174457&threshold=1&commentsort=0&mode=thread&cid=14515105
for more details.

In a Linux environment, the extra delay is 40ms per request. For Windows
it's 200ms.

This delay can be eliminated with the following patch:

-------------------------------------------------------------
import socket

#
# Socket wrapper to enable socket.TCP_NODELAY
#
realsocket = socket.socket
def socketwrap(family=socket.AF_INET, type=socket.SOCK_STREAM, proto=0):
    sockobj = realsocket(family, type, proto)
    sockobj.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sockobj

socket.socket = socketwrap
-------------------------------------------------------------
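One way to confirm that the option is actually applied is to read it back with getsockopt. This is a minimal sketch, not part of the patch; the helper name nodelay_enabled is made up for illustration. Note that the wrapper above replaces socket.socket for the whole process, so every TCP socket created afterwards is affected.

```python
import socket

def nodelay_enabled(sock):
    """Return True if TCP_NODELAY is set on the given TCP socket.

    Hypothetical helper for illustration; not part of httplib2.
    """
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

# A plain TCP socket leaves Nagle's algorithm enabled by default.
plain = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# The same call the wrapper makes on every new socket.
tuned = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tuned.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

try:
    print("plain socket:", nodelay_enabled(plain))
    print("tuned socket:", nodelay_enabled(tuned))
finally:
    plain.close()
    tuned.close()
```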

Attached is a small benchmark to demonstrate the improvement.
The results are:

httplib2 WITHOUT TCP_NODELAY: 100 runs in 4.085498 secs (24.476821 runs/sec, 40854.978561 microsecs/run)
httplib2 WITH TCP_NODELAY:    100 runs in 0.253862 secs (393.914953 runs/sec, 2538.619041 microsecs/run)

or in other words, the time per request drops by about 94%!
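For reference, the figures can be recomputed from the two timings reported above. This is only arithmetic on the posted numbers; the variable names are mine.

```python
# Timings copied from the benchmark output above (100 runs each).
runs = 100
secs_without_nodelay = 4.085498
secs_with_nodelay = 0.253862

# Fraction of the wall-clock time that disappears with TCP_NODELAY.
time_saved = 1 - secs_with_nodelay / secs_without_nodelay

# Ratio of requests per second, with versus without the option.
speedup = secs_without_nodelay / secs_with_nodelay

# Average extra latency each request pays when Nagle's algorithm is active.
extra_ms_per_run = (secs_without_nodelay - secs_with_nodelay) / runs * 1000

print(f"time saved per run: {time_saved:.1%}")
print(f"throughput ratio:   {speedup:.1f}x")
print(f"extra delay per run without TCP_NODELAY: {extra_ms_per_run:.1f} ms")
```

The extra delay works out to roughly 38 ms per request, which matches the 40ms figure quoted for Linux.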

Note that the performance improvement will be noticed mainly in POST
requests because they are implemented with two separate socket writes.
Improvement for GET requests, with a single write, will be much smaller.

Thanks,
Ron

Original issue reported on code.google.com by ravr...@gmail.com on 10 Jun 2008 at 12:58

GoogleCodeExporter commented 9 years ago
I can't reproduce this. With and without the patch, the tests run in the same
amount of time. Is this a valid issue?

Original comment by joe.gregorio@gmail.com on 16 Jul 2009 at 7:18

GoogleCodeExporter commented 9 years ago
I ran my benchmark again, and the results are still valid.

First, make sure to define a URL on the HTTP server that ignores POST parameters
and returns an empty page with a success status. I used Apache on Linux and just
defined Alias and Directory directives pointing at an empty file.

With a network sniffer you'll see that in the first part of the test the client
sends the HTTP headers in one packet, waits for the TCP ACK, and then sends the
POST params. The problem is that the TCP ACK is sent after 40ms (!).
In the second part of the test the client does not wait for the TCP ACK before
sending the POST params, and therefore the performance is significantly higher
(>90%).
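The write-write-read pattern described above can be reproduced without httplib2 or Apache. The following is a rough sketch using raw sockets on the loopback interface: the client sends a header and a body in two separate writes and then waits for a reply. The names serve and time_requests, the request bytes, and the two-byte reply are all invented for this illustration. Whether the delay shows up depends on the operating system's delayed-ACK behaviour, so the numbers may differ from the ones reported here.

```python
import socket
import threading
import time

HEADER = b"POST /empty HTTP/1.1\r\nContent-Length: 5\r\n\r\n"
BODY = b"a=b&c"
REPLY = b"OK"  # stand-in for a real HTTP response

def serve(listener, request_size):
    """Accept one connection and answer each complete request with REPLY."""
    conn, _ = listener.accept()
    with conn:
        buf = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                return  # client closed the connection
            buf += chunk
            while len(buf) >= request_size:
                buf = buf[request_size:]
                conn.sendall(REPLY)

def time_requests(nodelay, runs=20):
    """Time `runs` two-write requests over one persistent connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    worker = threading.Thread(
        target=serve, args=(listener, len(HEADER) + len(BODY)), daemon=True
    )
    worker.start()

    client = socket.create_connection(listener.getsockname())
    if nodelay:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    start = time.perf_counter()
    for _ in range(runs):
        client.sendall(HEADER)  # first write goes out immediately
        client.sendall(BODY)    # with Nagle, may wait for the ACK of the first
        got = b""
        while len(got) < len(REPLY):
            data = client.recv(len(REPLY) - len(got))
            if not data:
                raise ConnectionError("server closed the connection early")
            got += data
    elapsed = time.perf_counter() - start

    client.close()
    worker.join()
    listener.close()
    return elapsed

if __name__ == "__main__":
    print("without TCP_NODELAY: %.4f secs" % time_requests(False))
    print("with TCP_NODELAY:    %.4f secs" % time_requests(True))
```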

What results do you get with the benchmark?

Ron

Original comment by ravr...@gmail.com on 20 Jul 2009 at 5:57

GoogleCodeExporter commented 9 years ago
We encountered the same problem. Using a persistent connection, we saw a 5x
slowdown compared to non-persistent connections. Looking at a packet capture,
there was a 40ms delay in a Linux environment. This was against a CouchDB
database with 500 POSTs.

I added a similar patch to the HTTPConnectionWithTimeout.connect method, and
persistent connections now run in the same time as non-persistent ones.

I suggest adding a "low_latency" option to the connection constructor.
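To illustrate what such an option might look like, here is a minimal sketch using only the standard socket module. The function open_connection and its low_latency parameter are hypothetical; nothing with these names exists in httplib2.

```python
import socket

def open_connection(host, port, timeout=None, low_latency=False):
    """Open a TCP connection, optionally disabling Nagle's algorithm.

    `low_latency` is a hypothetical opt-in flag modelled on the suggestion
    above; it is not an httplib2 parameter.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    if low_latency:
        # Send small writes immediately instead of waiting for pending ACKs.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Making the flag opt-in keeps the default behaviour unchanged for callers that send large bodies, where coalescing small writes can still be useful.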

The following simple patch is against 0.4:

--- __init__.py.original    2009-08-18 12:00:10.000000000 +0200
+++ __init__.py 2009-08-18 12:09:44.000000000 +0200
@@ -696,6 +696,7 @@
                     self.sock.setproxy(*self.proxy_info.astuple())
                 else:
                     self.sock = socket.socket(af, socktype, proto)
+                    self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
                 # Different from httplib: support timeouts.
                 if self.timeout is not None:
                     self.sock.settimeout(self.timeout)
@@ -732,6 +733,7 @@
             sock.setproxy(*self.proxy_info.astuple())
         else:
             sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
         if self.timeout is not None:
             sock.settimeout(self.timeout)
         sock.connect((self.host, self.port))

Original comment by rsw...@gmail.com on 18 Aug 2009 at 10:15

GoogleCodeExporter commented 9 years ago
I'd really like to have this fixed. It affects the performance of other
libraries using httplib2, namely couchdb-python. According to my experiments,
other methods are affected as well, not just HTTP POST.

Here's an updated patch for the current hg tip. All unit tests pass after
patching.

diff -r cb8ddb07ec19 python2/httplib2/__init__.py
--- a/python2/httplib2/__init__.py      Mon Dec 28 15:51:12 2009 -0500
+++ b/python2/httplib2/__init__.py      Wed Feb 03 10:42:09 2010 +0200
@@ -730,6 +730,7 @@
                     self.sock.setproxy(*self.proxy_info.astuple())
                 else:
                     self.sock = socket.socket(af, socktype, proto)
+                    self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
                 # Different from httplib: support timeouts.
                 if has_timeout(self.timeout):
                     self.sock.settimeout(self.timeout)
@@ -767,6 +768,7 @@
             sock.setproxy(*self.proxy_info.astuple())
         else:
             sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

         if has_timeout(self.timeout):
             sock.settimeout(self.timeout)
diff -r cb8ddb07ec19 python3/httplib2/__init__.py
--- a/python3/httplib2/__init__.py      Mon Dec 28 15:51:12 2009 -0500
+++ b/python3/httplib2/__init__.py      Wed Feb 03 10:42:09 2010 +0200
@@ -718,6 +718,7 @@
                     self.sock.setproxy(*self.proxy_info.astuple())
                 else:
                     self.sock = socket.socket(af, socktype, proto)
+                    self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
                 # Different from httplib: support timeouts.
                 if has_timeout(self.timeout):
                     self.sock.settimeout(self.timeout)
@@ -753,6 +754,7 @@
             sock.setproxy(*self.proxy_info.astuple())
         else:
             sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
         if has_timeout(self.timeout):
             sock.settimeout(self.timeout)
         sock.connect((self.host, self.port))

Original comment by akhern on 3 Feb 2010 at 8:42

GoogleCodeExporter commented 9 years ago
FYI, this is now fixed. See issue 91.

Original comment by akhern on 3 Feb 2010 at 7:52