tor2web / Tor2web

Tor2web is HTTP proxy software that enables access to Tor Hidden Services by means of common web browsers
https://www.tor2web.org
GNU Affero General Public License v3.0

Connection Level Optimization #36

Closed fpietrosanti closed 11 years ago

fpietrosanti commented 12 years ago

This ticket is for researching, inspecting, and collecting ideas on connection-level (TCP-level) optimizations that could speed up Tor2web.

Some possible areas of improvement are:

fpietrosanti commented 12 years ago

Currently Tor2web makes a new SOCKS request, a new TCP connection, and a new HTTP request for each object it retrieves, and this has been found to be inefficient.

Below is a chat about possible improvements:

naif
A question from the latency/round-trip optimization point of view
If a web client need to reach a TorHS and download a web pages with 20-30 elements to be downloaded (images, css, js, etc)
Are there any difference if the web-client make a new SOCKS requests for each elements to be downloaded vs using a single SOCKS + single TCP socket to download all the elements?

10:27 armadev
with the single stream, the client can pipeline its requests, and save round-trips
separate streams means each request has to wait for the begin cell and the connected cell before sending its request
so, it depends how the single stream is used.

10:29 naif
armadev: but this will happen also if the connections goes to a single destination TorHS?
armadev: that's the case for Tor2web currently in development
armadev: and now it's stateless, it means 1 request from http client on the internet = 1 socks request = 1 http request
armadev: the main tor2web performance optimizations requirements is to reduce the latency

10:31 naif
armadev: so, i was wondering if making a single SOCKS requests, keeping it alive keep-alive and sending all the requests (multiple connect requests) over this single SOCKS connection, could provide a benefit while downloading many objects from a web page

10:32 naif
armadev: respect to making multiple SOCKS requests (i am not speaking about using a single TCP socket with HTTP pipelining, just from the SOCKS level)
armadev: if tor2web download https://5m4rylprkig4swgg.tor2web.org/reports/2012/T-Mobile_USA_WebGuard.html there are tons of elements to web downloaded and the current code make tons of SOCKS requests
armadev: would we have a significant benefit in round-trip/latency making a single SOCKS connection and tunnelling multiple TCP connections over this single SOCKS channel?

10:40 armadev
yes. but don't tunnel them. pipeline them.
send the http get, and then another, and so on. then get the answers when they arrive.
see http 1.1
(if you really want to tunnel them / bundle them, see how spdy does it.)

10:41 naif
armadev: mmm, you mean using http pipelining over a single http/tcp socket?

10:49 armadev
yes
spdy might even do better though. pipelining doesn't let you fetch the images until you've seen the html. but spdy could know you meant to and just bundle them. but the server side needs to know how to talk spdy too.

10:54 naif
armadev: also i see that "http pipelining" |= "using a single connection with keep-alive to make request-response-request-response loop"
armadev: pipelining = "a single connection to request multiple objects within a single request" = the server must support it
armadev: http simpler keep-alive = "a single connection where we request an object, wait for answer, then request other object, then wait again for answer"

10:55 armadev
keepalive will be better than nothing. but nowhere near as good as pipelining.
another way to improve round-trips is to use the optimistic data feature in tor
basically, if the thing talking to the socks proxy sends its "http get /index.html" before the socks handshake finishes, tor will bundle that piece of the data in the begin cell, in hopes that it succeeds.
(well, not actually in the begin cell. but in a data cell right after it.)
this requires a change to the socks app. but maybe that is tor2web?

naif
armadev: sure
So, to summarize the area of improvements would be reasonably:
a) http keep-alive to use a single socket (in case server does not support http pipelining)
b) http pipelining
c) optimistic data feature at socks level
those should provide at various level improvements from latency/round-trip point of view

armadev
https://trac.torproject.org/projects/tor/ticket/1849
optimistic data is designed as a "client and exit relay" thing. there's no reason it shouldn't work for hidden services too, but this is the first time i've thought about it in the context of hidden services.

11:03 naif
armadev: do you see possible tor's related code improvements to make it working to TorHS? (Considering also Tor2web mode activated) ?

11:03 armadev
i think it should just work. somebody should try it. if it doesn't work, we should fix it
see also https://trac.torproject.org/projects/tor/ticket/3875
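
To make the comparison in the chat concrete, here is a rough back-of-the-envelope model of the round trips each approach might need. It is only a sketch: the function name, the strategy labels, and the cost constants are assumptions made for illustration, not measurements of Tor or Tor2web. It assumes the circuit to the hidden service already exists, that opening a stream costs one round trip (begin cell, then connected cell), that each HTTP exchange costs one round trip, that non-pipelined requests are issued one after another, and that pipelining must fetch the HTML before it can batch the remaining objects.

```python
def estimated_round_trips(n_objects: int, strategy: str) -> int:
    """Rough round-trip estimate for fetching a page made of n_objects.

    All costs are simplifying assumptions, not measured values.
    """
    if n_objects < 1:
        raise ValueError("n_objects must be at least 1")

    setup = 1     # assumed: begin cell -> connected cell for one stream
    exchange = 1  # assumed: one HTTP request/response pair

    # With pipelining the HTML is fetched first; every remaining object
    # is then requested in one batch (assumed to cost one more round trip).
    pipelined = exchange + (exchange if n_objects > 1 else 0)

    estimates = {
        # current behaviour described in this ticket: one stream per object
        "new_stream_per_object": n_objects * (setup + exchange),
        # one stream, request/response/request/response
        "keep_alive": setup + n_objects * exchange,
        # one stream, requests sent back to back
        "pipelining": setup + pipelined,
        # first request sent before the stream is confirmed open
        "pipelining_optimistic_data": pipelined,
    }
    return estimates[strategy]


if __name__ == "__main__":
    for name in ("new_stream_per_object", "keep_alive",
                 "pipelining", "pipelining_optimistic_data"):
        print(name, estimated_round_trips(25, name))
```

Under these assumptions a 25-object page drops from 50 round trips to 26 with keep-alive, which matches the remark that keep-alive is better than nothing but nowhere near as good as pipelining.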

fpietrosanti commented 12 years ago

The Twisted HTTP client already supports persistent (keep-alive) connections through a connection pool:

http://twistedmatrix.com/documents/current/web/howto/client.html#auto6

We could easily integrate HTTP keep-alive client handling over a connection pool into Tor2web.
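
As an illustration of the idea only, here is a minimal keep-alive pool built on the Python standard library. It is not the Twisted code Tor2web would use, and it connects directly over TCP instead of going through Tor's SOCKS port. The class name `KeepAlivePool`, its methods, and the `max_idle_per_host` limit are all hypothetical.

```python
import http.client
from collections import defaultdict


class KeepAlivePool:
    """Hypothetical cache of idle keep-alive connections, keyed by host."""

    def __init__(self, max_idle_per_host=2):
        self.max_idle_per_host = max_idle_per_host
        self._idle = defaultdict(list)
        self.created = 0  # number of new connections opened so far

    def _acquire(self, host, port):
        idle = self._idle[(host, port)]
        if idle:
            return idle.pop()  # reuse an already open connection
        self.created += 1
        return http.client.HTTPConnection(host, port, timeout=30)

    def _release(self, host, port, conn):
        idle = self._idle[(host, port)]
        if len(idle) < self.max_idle_per_host:
            idle.append(conn)  # keep it open for the next request
        else:
            conn.close()

    def get(self, host, port, path):
        conn = self._acquire(host, port)
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            body = resp.read()  # drain the body so the socket can be reused
        except (http.client.HTTPException, OSError):
            conn.close()
            raise
        if resp.will_close:
            conn.close()  # server asked to close; do not cache
        else:
            self._release(host, port, conn)
        return resp.status, body
```

Two consecutive `pool.get(host, port, path)` calls to the same server should share one connection as long as the server answers with HTTP/1.1 keep-alive, so `pool.created` stays at 1.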

Once we also have the "top accessed websites" statistics from #13, we could always keep a connection pool open to those hosts so that their content can be served quickly.
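
A small sketch of how the hosts to pre-warm might be picked, assuming the statistics from #13 can be reduced to a list of requested hostnames. The function name, the `top_n` parameter, and the example hostnames are hypothetical.

```python
from collections import Counter


def hosts_to_prewarm(accessed_hosts, top_n=3):
    """Return the top_n most frequently accessed hosts, most popular first."""
    counts = Counter(accessed_hosts)
    return [host for host, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    # Hypothetical access log; real data would come from the #13 statistics.
    log = ["a.onion", "b.onion", "a.onion", "c.onion", "a.onion", "b.onion"]
    print(hosts_to_prewarm(log, top_n=2))
```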

evilaliv3 commented 11 years ago

Tor2web connections to hidden services are now made in keep-alive mode, and the connections are cached in a pool regardless of whether the client supports keep-alive.