jech / polipo

The Polipo caching HTTP proxy
http://www.pps.jussieu.fr/~jch/software/polipo/
MIT License
1.81k stars 355 forks source link

Incorrect error handling with SOCKS proxies #14

Closed jech closed 10 years ago

jech commented 10 years ago

Not verified. See

http://mid.gmane.org/CAEh0oq-gde_mEHSyxn=Kti-Zi-ZTE3-D5m-_WY=8B_WHeZSLrQ@mail.gmail.com

jech commented 10 years ago

Probably the same issue as

http://mid.gmane.org/em39b4f275-c11c-42b1-8d94-44fec158dd16@gaming

xnoreq commented 10 years ago

It's me from the email in the previous comment.

I am not using a parent proxy.

edit: The bug does not seem to appear if I disable using polipo as "SSL proxy" in firefox.

jech commented 10 years ago

If I set foxyproxy to use polipo for all URLs, it seems that at least on youtube firefox then tries to talk something other than http(s) to the proxy which triggers the CPU usage bug.

Interesting. Unfortunately, I'm unable to reproduce your issue, even using tor.

If polipo only handles http(s) requests, then shouldn't it simply drop the connection for other protocols or generally malformed requests?

It usually does:

$ (echo "Not HTTP"; echo) | nc localhost 8123
HTTP/1.1 400 Error in request line
Connection: close
Date: Wed, 14 May 2014 17:25:18 GMT
Content-Type: text/html
Content-Length: 417
Expires: 0
Cache-Control: no-cache
Pragma: no-cache

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><head>
<title>Proxy error: 400 Error in request line.</title>
</head><body>
<h1>400 Error in request line</h1>
<p>The following error occurred:<br><br>
<strong>400 Error in request line</strong></p>
<hr>Generated Wed, 14 May 2014 19:25:18 CEST by Polipo on <em>ijon.pps.jussieu.fr:8123</em>.
</body></html>
xnoreq commented 10 years ago

I am sorry, I edited my previous reply because I was eventually able to reproduce the CPU usage bug even with https.

So at the moment only pure http works for me. As soon as I enable polipo for https and play some youtube videos, do some skipping in the videos .. the CPU usage bug will appear again.

xnoreq commented 10 years ago

Also, I cannot reproduce the problem at all with version 1.0.4.

jech commented 10 years ago

Also, I cannot reproduce the problem at all with version 1.0.4.

That's normal. 1.0.4 didn't go through the SOCKS proxy when tunnelling.

Are you using the Cygwin or the native Windows port of Polipo?

xnoreq commented 10 years ago

Initially I was running native (mingw compiled) but switched to cygwin because of the CPU bug ... then I noticed the same problem with cygwin. I didn't switch back, so all my latest testing is on cygwin x86 on a Windows 7 machine.

I also tried polipo-20140107 with no problem, but with polipo-1.1.0 the bug appears as soon as I resume a paused youtube video.

I can provide backtraces or even gcov files that show how often a line was executed. Just let me know what you need.

jech commented 10 years ago

Could you please apply the following patch and describe what happens when you reproduce the issue?

diff --git a/mingw.c b/mingw.c
index c4d5032..dd4f992 100644
--- a/mingw.c
+++ b/mingw.c
@@ -297,6 +297,7 @@ int win32_read_socket(SOCKET fd, void *buf, int n)
     if(rc == SOCKET_ERROR) {
         set_errno(WSAGetLastError());
     }
+    fprintf(stderr, "recv: %d -> %d (%d)\n", n, rc, rc <= 0 ? 0 : errno);
     return rc;
 }
jech commented 10 years ago

1.0.4 didn't go through the SOCKS proxy when tunnelling.

That's not true, of course. Are you positive you don't see the issue with 20140107? I don't see anything changed in the tunnelling code between that version and 1.1.0. (There's been a change after 1.1.0, but it's unlikely to cause your issue).

xnoreq commented 10 years ago

I applied your patch and tried a fresh build with mingw.

Scenario is the same as above: start polipo, resume paused youtube video, wait a few secs .. BUG. Here's the error output, it starts out like this:

Established listening socket on port 8123.
recv: 4096 -> 207 (10047)
recv: 4095 -> 517 (10056)
recv: 4095 -> 150 (10056)
recv: 3579 -> 87 (10056)
recv: 3492 -> 1903 (10056)
recv: 4095 -> 621 (10056)
recv: 4096 -> 243 (10056)
recv: 4095 -> 517 (10056)
recv: 4095 -> 133 (10056)
recv: 3579 -> 47 (10056)
recv: 3532 -> 1031 (10056)
recv: 4095 -> 1022 (10056)
recv: 2501 -> 27 (10056)
recv: 2474 -> 0 (0)

and then it goes crazy like outputting in a loop:

recv: 4095 -> -1 (0)
recv: 4095 -> -1 (0)
recv: 4095 -> -1 (0)

... after about 660 of those lines there is this:

recv: 4096 -> 239 (10054)

and then it goes back to:

recv: 4095 -> -1 (0)

for like 6000 lines, then:

recv: 4095 -> 517 (10054)

and the spam of the "4095 -> -1 (0)" line continues. This produced a 12 MB log file in just seconds after the bug started.

xnoreq commented 10 years ago

Are you positive you don't see the issue with 20140107?

Yes, I do however see these messages e.g. when skipping in a youtube video:

Couldn't read from server: Connection reset by peer
Couldn't read from server: Connection reset by peer
Couldn't read from server: Connection reset by peer

which I don't see with version 1.0.4. Also, whenever those messages appear I do get a short CPU spike, sometimes as high as 50% on a single core, but with 20140107 these don't seem to ever cause an endless loop (100% on a core) like the latest versions do.

jech commented 10 years ago

Does this fix it?

diff --git a/tunnel.c b/tunnel.c
index 60b8418..c7192ba 100644
--- a/tunnel.c
+++ b/tunnel.c
@@ -482,8 +482,8 @@ tunnelRead2Handler(int status,
                    FdEventHandlerPtr event, StreamRequestPtr request)
 {
     TunnelPtr tunnel = request->data;
-    if(status && status != -EPIPE && status != -ECONNRESET) {
-        if(status < 0)
+    if(status) {
+        if(status && status != -EPIPE && status != -ECONNRESET)
             do_log_error(L_ERROR, -status, "Couldn't read from server");
         tunnel->flags |= TUNNEL_EOF2;
         goto done;
xnoreq commented 10 years ago

Does this fix it?

Looks like it. With cygwin the above message seems to be now replaced by these (not a great deal because they don't appear very often):

Couldn't read from server: Unknown error -1

But as far as I can tell no high CPU usage with that patch even with mingw! Thanks.

jech commented 10 years ago

Fixed in c61964babb6ac8fe7c83e930a389b92fb66c2251. Thanks a lot for your help.