tatsuhiro-t / spdylay

The experimental SPDY protocol version 2, 3 and 3.1 implementation in C
http://tatsuhiro-t.github.io/spdylay/
MIT License
603 stars 102 forks source link

Timeout Config for Long-Pulling Sessions #48

Closed n0wa11 closed 11 years ago

n0wa11 commented 11 years ago

First, thanks for writing the C library, which is very efficient compared to libraries in higher-level languages.

I wonder what are your thoughts on the timeout config for the long-pulling websites, when used as a forward proxy.

When browsing G+, I did experience hanging on a certain requests. Then the website won't respond to my subsequent clicks on other links, unless I reload the webpage. Not sure if related.

tatsuhiro-t commented 11 years ago

The timeout values can be changed using --frontend-spdy-read-timeout and --frontend-write-timeout options. But this is the timeout for TCP connection and not SPDY stream. Shrpx does not have SPDY stream timeouts.

If other SPDY stream still works on the same TCP connection, then it is the problem of between browser and shrpx (possibly, DATA or WINDOW UPDATE frame handling) or the communication error between client and G+ site because G+ uses SSL/TLS, the connection is encrypted and shrpx does not know the actual contents it relays.

I remember that old chrome had the bug which does not send WINDOW UPDATE frame properly when SPDY proxy is used and it leads to dead lock. I know it was fixed in stable several months ago.

n0wa11 commented 11 years ago

I was trying to look into this issue again.

How to reproduce: upload any attachment that is greater than 1MB in Gmail.

Under Setup 1, the upload won't start. Subsequently, Chrome won't open gmail.com in a different tab, unless I switch off the proxy.

But, uploading has no problem in Setup 2 and 3.

All other browsing activities with Setup 1 have no problems.

Setup 1: chrome <--> shrpx 443 <--> squid 3128 Setup 2: chrome <--> stunnel 443 <--> squid 3128 Setup 3: chrome <--> squid 3128 (http_port)

Server: Ubuntu 12.04 amd64 Chrome Version 26.0.1410.65 (Mac OS Maintain Lion) shrpx 1.0.0 (latest commit) stunnel 4.56 (openssl 1.0.1 14 Mar 2012) Squid 3.3.4 (but ubuntu 12.04 stock version is the same)

tatsuhiro-t commented 11 years ago

I tried 1MB and 2MB attachment to gmail, but both work in setup1.

While I am not sure, but it may be related to MaxConnectionsPerProxy limit in Chrome. Here is the quote from http://spdylay.sourceforge.net/package_README.html#spdy-client-and-server-programs:

Note: At the time of this writing, Chrome 24 limits the maximum concurrent connections to the proxy to 32. And due to the limitation of socket pool handling in Chrome, it is quickly filled up if SPDY proxy is used and many SPDY sessions are established. If it reaches the limit, the new connections are simply blocked until existing connections are timed out. (See Chrome Issue 92244). The workaround is make the number of maximum connections high, say, 99, which is the highest. To do this, you need to change so called Policy setup. See Policy Templates for details how to change Policy setup on the platform you use. The Policy name we are looking for is MaxConnectionsPerProxy For example, if you are using Linux, follow the instruction described in Linux Quick Start and create /etc/opt/chrome/policies/managed/test_policy.json file with the following content and restart Chrome:

{ "MaxConnectionsPerProxy" :99 }

n0wa11 commented 11 years ago

I am not sure if it is due to the max connections, because I have no problems in browsing other websites (e.g. plus.google.com), except for gmail.com, after the problem is produced.

BTW, does the MaxConnectionsPerProxy option exist on the Mac Chrome.

I just set up a clean server of Ubuntu 12.04 and still reproduced the above-described problem.

Do you want to take a look at my server?

apt-get update && apt-get upgrade
apt-get install squid3

cat > /etc/squid3/squid.conf << EOF
http_access allow all
http_port 0.0.0.0:3128

EOF

service squid3 restart

# install shrpx as in README
# ...
# ...

mkdir /etc/shrpx

cat > /etc/shrpx/shrpx.conf << EOF
frontend=0.0.0.0,443
backend=127.0.0.1,3128
private-key-file=/etc/shrpx/www.my-domain-name.com.key
certificate-file=/etc/shrpx/www.my-domain-name.com.crt
spdy-proxy=yes
accesslog=yes

EOF
shrpx &

Prepare and host the pac file as follows

function FindProxyForURL(url, host)
{
    return 'HTTPS www.my-domain-name.com:443';
}
tatsuhiro-t commented 11 years ago

I recommend you to check that max connection is reached or not. Open the chrome://net-internals and open sockets. You can see the table and "handed out" column. By default the max connection to proxy is 32, so when the issue happens, check that those values are exceeding it.

MaxConnectionsPerProxy option exist on the Mac Chrome and you can see how to change that in http://dev.chromium.org/administrators/policy-templates

n0wa11 commented 11 years ago

I will later try on a Windows computer, as I seems to need a Mac Server to change the policy template.

Below is the display of the connection pools, when the problem is produced, with G+ opened. The connection number actually comes down quickly, once Chrome finishes loading other pages.

When I try to open gmail.com on a separate tab, I see shrpx passing a request to squid and squid getting a return for the request. But shrpx doesn't seem to return the content to Chrome. I wonder how I can further demonstrate the problem.

Name Handed Out Idle Connecting Max Max Per Group Generation transport_socket_pool 14 2 0 256 6 0 ssl_socket_pool 10 4 0 256 6 0 www.my-domain-name.com:443 (http_proxy_socket_pool) 12 0 0 32 6 0 www.my-domain-name.com:443 (http_proxy_socket_pool)->transport_socket_pool 0 0 0 32 6 0 www.my-domain-name.com:443 (http_proxy_socket_pool)->ssl_socket_pool 1 0 0 32 6 0 www.my-domain-name.com:443 (http_proxy_socket_pool)->ssl_socket_pool->transport_socket_pool 1 0 0 32 6 0 www.my-domain-name.com:443 (socks_socket_pool) 0 0 0 32 6 0 www.my-domain-name.com:443 (socks_socket_pool)->transport_socket_pool 0 0 0 32 6 0 www.my-domain-name.com:443 (ssl_socket_pool_for_proxies) 10 2 0 32 6 0

transport_socket_pool Name Pending Top Priority Active Idle Connect Jobs Backup Job Stalled sc.tynt.com:80 0 - 0 1 0 false false ssl/a248.e.akamai.net:443 0 - 4 0 0 false false ssl/chatenabled.mail.google.com:443 0 - 1 0 0 false false ssl/clients2.google.com:443 0 - 1 0 0 false false ssl/csi.gstatic.com:443 0 - 1 0 0 false false ssl/lh3.googleusercontent.com:443 0 - 1 0 0 false false ssl/mail.google.com:443 0 - 1 0 0 false false ssl/plus.google.com:443 0 - 1 0 0 false false ssl/ssl.google-analytics.com:443 0 - 1 0 0 false false ssl/talkgadget.google.com:443 0 - 1 0 0 false false ssl/www.google.com.hk:443 0 - 1 0 0 false false ssl/www.google.com:443 0 - 1 0 0 false false www.google-analytics.com:80 0 - 0 1 0 false false

ssl_socket_pool Name Pending Top Priority Active Idle Connect Jobs Backup Job Stalled ssl/a248.e.akamai.net:443 0 - 0 4 0 false false ssl/chatenabled.mail.google.com:443 0 - 1 0 0 false false ssl/clients2.google.com:443 0 - 1 0 0 false false ssl/csi.gstatic.com:443 0 - 1 0 0 false false ssl/lh3.googleusercontent.com:443 0 - 1 0 0 false false ssl/mail.google.com:443 0 - 1 0 0 false false ssl/plus.google.com:443 0 - 1 0 0 false false ssl/ssl.google-analytics.com:443 0 - 1 0 0 false false ssl/talkgadget.google.com:443 0 - 1 0 0 false false ssl/www.google.com.hk:443 0 - 1 0 0 false false ssl/www.google.com:443 0 - 1 0 0 false false

www.my-domain-name.com:443 (http_proxy_socket_pool) Name Pending Top Priority Active Idle Connect Jobs Backup Job Stalled ssl/a248.e.akamai.net:443 0 - 1 0 0 false false ssl/accounts.google.com.hk:443 0 - 0 0 0 false false ssl/accounts.google.com:443 0 - 0 0 0 false false ssl/accounts.youtube.com:443 0 - 0 0 0 false false ssl/ad.doubleclick.net:443 0 - 0 0 0 false false ssl/apis.google.com:443 0 - 0 0 0 false false ssl/chatenabled.mail.google.com:443 0 - 1 0 0 false false ssl/clients2.google.com:443 0 - 1 0 0 false false ssl/clients6.google.com:443 0 - 0 0 0 false false ssl/collector.githubapp.com:443 0 - 0 0 0 false false ssl/csi.gstatic.com:443 0 - 1 0 0 false false ssl/d2tf6sbdgil6xr.cloudfront.net:443 0 - 0 0 0 false false ssl/github.com:443 0 - 1 0 0 false false ssl/gmail.com:443 0 - 0 0 0 false false ssl/lh3.googleusercontent.com:443 0 - 0 0 0 false false ssl/lh4.googleusercontent.com:443 0 - 0 0 0 false false ssl/lh5.googleusercontent.com:443 0 - 0 0 0 false false ssl/lh6.googleusercontent.com:443 0 - 0 0 0 false false ssl/mail-attachment.googleusercontent.com:443 0 - 1 0 0 false false ssl/mail.google.com:443 0 - 1 0 0 false false ssl/plus.google.com:443 0 - 1 0 0 false false ssl/secure.gaug.es:443 0 - 0 0 0 false false ssl/ssl.google-analytics.com:443 0 - 1 0 0 false false ssl/ssl.gstatic.com:443 0 - 0 0 0 false false ssl/talkgadget.google.com:443 0 - 1 0 0 false false ssl/www.google.com.hk:443 0 - 1 0 0 false false ssl/www.google.com:443 0 - 1 0 0 false false

www.my-domain-name.com:443 (http_proxy_socket_pool)->ssl_socket_pool Name Pending Top Priority Active Idle Connect Jobs Backup Job Stalled ssl/ssl.google-analytics.com:443 0 - 1 0 0 false false

www.my-domain-name.com:443 (http_proxy_socket_pool)->ssl_socket_pool->transport_socket_pool Name Pending Top Priority Active Idle Connect Jobs Backup Job Stalled ssl/ssl.google-analytics.com:443 0 - 1 0 0 false false

www.my-domain-name.com:443 (ssl_socket_pool_for_proxies) Name Pending Top Priority Active Idle Connect Jobs Backup Job Stalled ssl/a248.e.akamai.net:443 0 - 0 1 0 false false ssl/chatenabled.mail.google.com:443 0 - 1 0 0 false false ssl/clients2.google.com:443 0 - 1 0 0 false false ssl/csi.gstatic.com:443 0 - 1 0 0 false false ssl/github.com:443 0 - 0 1 0 false false ssl/mail-attachment.googleusercontent.com:443 0 - 1 0 0 false false ssl/mail.google.com:443 0 - 1 0 0 false false ssl/plus.google.com:443 0 - 1 0 0 false false ssl/ssl.google-analytics.com:443 0 - 1 0 0 false false ssl/talkgadget.google.com:443 0 - 1 0 0 false false ssl/www.google.com.hk:443 0 - 1 0 0 false false ssl/www.google.com:443 0 - 1 0 0 false false

tatsuhiro-t commented 11 years ago

The socket numbers looks normal. I'll wait for the result of Windows PC.

Since the traffic between gmail is encrypted, after HTTP CONNECT, shrpx just sends and receives opaque data and does not know the actual traffic. So it just like traffic the other TLS sites make.

n0wa11 commented 11 years ago

I managed to increase the connection limit on a Windows PC. I repeated the steps as in the previous post. Nothing has changed.

Name Handed Out Idle Connecting Max Max Per Group Generation transport_socket_pool 0 0 0 256 6 0 ssl_socket_pool 0 0 0 256 6 0 www.my-domain-name.com:443 (http_proxy_socket_pool) 27 0 0 99 6 0 www.my-domain-name.com:443 (http_proxy_socket_pool)->transport_socket_pool 0 0 0 99 6 0 www.my-domain-name.com:443 (http_proxy_socket_pool)->ssl_socket_pool 1 0 0 99 6 0 www.my-domain-name.com:443 (http_proxy_socket_pool)->ssl_socket_pool->transport_socket_pool 1 0 0 99 6 0 www.my-domain-name.com:443 (socks_socket_pool) 0 0 0 99 6 0 www.my-domain-name.com:443 (socks_socket_pool)->transport_socket_pool 0 0 0 99 6 0 www.my-domain-name.com:443 (ssl_socket_pool_for_proxies) 19 8 0 99 6 0

tatsuhiro-t commented 11 years ago

so, still having a problem even after connection limit is raised. Today, I experienced similar problem with shrpx + chrome 28. In gmail, upload hangs (that's 2MB file) and a message like "cannot connect gmail" appeared below my icon. Surprisingly, after about 70 seconds, it started to work again. The stopped upload started and finished successfully. From the net-internal log, it seems that chrome was waiting for a response from the previous request, which took quite long time to reach the client. This happened twice, but it is not reproducible easily. It may be a limitation of gmail or chrome, but I'm not yet sure what the cause is. Did your connection to gmail work like my experience, in other words, after 1 minutes or so, did it start to work again?

n0wa11 commented 11 years ago

Once uploading starts, all the connections to gmail.com are blocked. As you observed, after 1~2 minutes, gmail will show the upload failed and the other connections to gmail.com can go through. But in my case, the uploading was never successful.

When the blocking happens, I try to load gmail.com in a different tab in Chrome. The server log of squid shows those requests, which means the requests are successful, because access.log is added only after the request is over.

I suspect shrpx has some domain-specific buffer/queue (whose size may be server-dependent) that were filled. BTW, uploading files to G+ also has the same problem.

tatsuhiro-t commented 11 years ago

shrpx does not have domain-specific buffer/queue. It only has uniform buffer size for all spdy session and flow control between shrpx and chrome, which is 64KB by default. Since the connection between chrome and gmail is tunneled through HTTP connect, the tunneled connection is also SPDY/3 flow controlled. For tunneled connections, like gmail or g+ sites, shrpx does not know what chrome is doing because they are encrypted. All requests from chrome is processed as "upload data" by shrpx. If flow control is broken, just browsing g+ causes problems (it once was, because of chrome bug). So I don't think shrpx's flow control is a cause of bug, but you can disable the flow control and see it remedy the situation. To disable flow control, edit src/shrpx_ssl.cc, line 186. Change "spdy/3" to "spdy/2" like this:

const char *protos[] = { "spdy/2", "spdy/2", "http/1.1" };

spdy/2 does not have flow control, so this effectively disable flow control.

n0wa11 commented 11 years ago

It does seem to have solved the problem.

What is your suggestion for a production deployment?

tatsuhiro-t commented 11 years ago

OK, so the problem is the flow control of shrpx. I'll review the code to find the cause. For production deployment, as a workaround, using spdy/2 is a safe bet.

tatsuhiro-t commented 11 years ago

It seems that this is chrome's bug, https://code.google.com/p/chromium/issues/detail?id=242288, which does not resume uploads indefinitely, once the stream is stalled. I checked the net-internal log when the issue happened and saw that stream was stalled but shrpx sent window_update and window was back to full (64KB) and chrome logged that stream was unstalled, but then chrome did not resume upload.

t=1371649924235 [st=199616]  SPDY_SESSION_STREAM_STALLED_BY_STREAM_SEND_WINDOW
                             --> stream_id = 21
t=1371649924235 [st=199616]  SPDY_SESSION_STREAM_STALLED_BY_STREAM_SEND_WINDOW
                             --> stream_id = 21
t=1371649924236 [st=199617]  SPDY_SESSION_RECEIVED_WINDOW_UPDATE_FRAME
                             --> delta = 65536
                             --> stream_id = 21
t=1371649924236 [st=199617]  SPDY_STREAM_UPDATE_SEND_WINDOW
                             --> delta = 65536
                             --> stream_id = 21
                             --> window_size = 65536
t=1371649924236 [st=199617]  SPDY_STREAM_FLOW_CONTROL_UNSTALLED
                             --> stream_id = 21

(No send data from chrome for stream_id=21)

The bug was fixed in M29, which is dev(unstable) version now.

n0wa11 commented 11 years ago

Thanks for looking into it so quickly.

I wonder what is so special with gmail/g+ uploading, not in other websites.

Also I assume the shrpx client deals with the follow control correctly?

tatsuhiro-t commented 11 years ago

Actually, I'm not entirely sure why other sites are not affected. Maybe they are fast enough not to be bothered by flow control.

If you have time, revert the change in shrpx_ssl.cc like this

const char *protos[] = { "spdy/3", "spdy/2", "http/1.1" };

and use dev channel chrome 29 and see the problem still exists?

I think shrpx client does not have apparent flow control bug at the moment.

n0wa11 commented 11 years ago

You are right. The problem is gone on Chrome 29.0.1541.0 dev, with the original code.