snoyberg / http-enumerator

HTTP client package with enumerator interface and HTTPS support.

Fix handling of chunked data when rawBody == True. #37

Closed erikd closed 13 years ago

erikd commented 13 years ago

As I posted to the Haskell web dev mailing list, requests that used rawBody == True and resulted in data with Transfer-Encoding: chunked would hang (i.e. produce no data) and then time out.

This patch fixes this problem for me.

erikd commented 13 years ago

Hang on, not quite right. Let me get back to you.

snoyberg commented 13 years ago

I'm not sure why this would change the behavior in any meaningful way. Do you have a minimal reproducing test case that demonstrates the problem?

erikd commented 13 years ago

Yes, the code in the patch actually didn't work correctly.

I do however have a test case; the code is below. Compile the program and run it as:

 ./program http://www.google.com/

The program should write the HTTP response headers and the response body to the file. However, since the Google server sends the data with Transfer-Encoding: chunked and HE.rawBody is True, the program just hangs.

I need this for an HTTP proxy, which should just pass through all the data it receives. For chunked data, that means parsing the chunks to find the end of the chunked body, but then passing them on still chunked rather than as a dechunked version. I am currently working on a rechunker Enumeratee.
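To make the pass-through idea concrete, here is a minimal sketch of finding the end of a chunked body while leaving the bytes untouched. It is not the http-enumerator code: splitChunkedBody is a hypothetical helper working on a complete in-memory buffer rather than an Enumeratee, and it assumes there are no trailer headers after the last chunk.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as S8
import           Data.Char (isHexDigit)
import           Numeric (readHex)

-- | Hypothetical helper: given a buffer that starts with a chunked body,
-- return (the body exactly as received, still chunked; the bytes after it).
-- Returns Nothing if the buffer is malformed or ends before the body does.
-- Assumption: no trailer headers follow the zero-length chunk.
splitChunkedBody :: S8.ByteString -> Maybe (S8.ByteString, S8.ByteString)
splitChunkedBody input = go 0
  where
    go :: Int -> Maybe (S8.ByteString, S8.ByteString)
    go off = do
        -- The chunk-size line runs up to the first CRLF.
        let (sizeLine, afterLine) = S8.breakSubstring "\r\n" (S8.drop off input)
        dataStart <- if S8.null afterLine
                        then Nothing                       -- no CRLF yet
                        else Just (off + S8.length sizeLine + 2)
        -- Only the leading hex digits count; chunk extensions are ignored.
        len <- case readHex (S8.unpack (S8.takeWhile isHexDigit sizeLine)) of
                 [(n, "")] -> Just (n :: Int)
                 _         -> Nothing
        let dataEnd = dataStart + len
            crlfOk  = S8.take 2 (S8.drop dataEnd input) == "\r\n"
        case () of
          _ | not crlfOk -> Nothing                        -- truncated or malformed
            | len == 0   -> Just (S8.splitAt (dataEnd + 2) input)  -- last chunk
            | otherwise  -> go (dataEnd + 2)               -- move to next chunk
```

The point is that the chunk sizes are only read to locate the terminating zero-length chunk; the first component of the result is byte-for-byte what the server sent, so a proxy can forward it as is.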

Program in this gist : https://gist.github.com/1297809

snoyberg commented 13 years ago

OK, I've determined the problem: since we're just passing through the stream, nothing in the pipeline checks for the end of the response. In fact, if you wait long enough, the code you sent will work, once the server severs the connection.

I've just pushed some commits that I believe solve the issue. Can you test it out and let me know?

erikd commented 13 years ago

That's a definite improvement! Thanks!

Two small issues. First, you have debug output due to this line:

 liftIO $ print $ S8.pack $ showHex len "\r\n"

and the "import Numeric (showHex)" needs to be moved before the line:

 #if !MIN_VERSION_base(4,3,0)

I sent a pull request containing a fix for these two.
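For context, the expression in that debug line is the chunk-size header: the length in hex followed by CRLF. A minimal sketch of re-chunking a payload the same way follows; encodeChunk and lastChunk are illustrative names, not functions from http-enumerator.

```haskell
import qualified Data.ByteString.Char8 as S8
import           Numeric (showHex)

-- | Illustrative helper: frame one payload as a single HTTP chunk,
-- i.e. hex length, CRLF, payload, CRLF.
-- Callers should skip empty payloads, since a zero-length chunk
-- marks the end of the body.
encodeChunk :: S8.ByteString -> S8.ByteString
encodeChunk payload = S8.concat
    [ S8.pack (showHex (S8.length payload) "\r\n")  -- size line, as in the debug print
    , payload
    , S8.pack "\r\n"
    ]

-- | The terminating zero-length chunk (assuming no trailer headers).
lastChunk :: S8.ByteString
lastChunk = S8.pack "0\r\n\r\n"
```

For example, encodeChunk applied to the four bytes "Wiki" yields "4\r\nWiki\r\n".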

Only remaining question is how I'm supposed to continue wrapping my head around the concept of enumerators if you fix the bugs I find. :-)

erikd commented 13 years ago

Sorry, something still not quite right.

Your fix (with my minor adjustments above) makes the program in the Gist I posted work. However, if I then put http-enumerator into my proxy and use http-enumerator to get the same data via the proxy, the client gets

HttpParserException "Chunk header"

Let me debug this a little further.

snoyberg commented 13 years ago

Where is the code for your proxy?

erikd commented 13 years ago

Sorry, been away for 3 days camping.

I'll work on it today and if I'm still having problems I'll put up a public repo.

erikd commented 13 years ago

The repo with the proxy is here (pretty raw):

https://github.com/erikd/simple-web-proxy.git

The problem with

HttpParserException "Chunk header"

only occurs when accessing data via a squid proxy and the server sends the data as transfer-encoding chunked.

To reproduce the problem grab the repo above, do "make" and run (assumes a squid proxy listening on port 3128 of a machine called "squid"):

./proxy-test http://squid:3128/ http://www.google.com/

I suspect that this is actually because, for Transfer-Encoding: chunked, squid de-chunks the data and closes the connection at the end.
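If that suspicion is right, the client cannot assume a chunked body and has to decide from the response headers how the body ends. A rough sketch of that decision, following the usual HTTP/1.1 rules, is below. BodyFraming and bodyFraming are hypothetical names, headers are simplified to a list of String pairs, and this is not the fix from issue #38.

```haskell
import Data.Char (isSpace, toLower)
import Text.Read (readMaybe)

-- | How the end of a response body is found (illustrative type).
data BodyFraming
    = Chunked          -- ^ parse chunk headers until the zero-length chunk
    | FixedLength Int  -- ^ read exactly this many bytes
    | UntilClose       -- ^ read until the peer closes the connection
    deriving (Eq, Show)

-- | Pick the framing from response headers.
-- Simplification: only a Transfer-Encoding value of exactly "chunked"
-- is recognised; lists such as "gzip, chunked" are not handled.
bodyFraming :: [(String, String)] -> BodyFraming
bodyFraming headers =
    case (lookupCI "transfer-encoding", lookupCI "content-length") of
      (Just te, _) | isChunked te                           -> Chunked
      (_, Just cl) | Just n <- readMaybe (trim cl), n >= 0  -> FixedLength n
      _                                                     -> UntilClose
  where
    -- Header names are case-insensitive.
    lookupCI name = lookup name [(map toLower k, v) | (k, v) <- headers]
    trim          = dropEnd . dropEnd
      where dropEnd = reverse . dropWhile isSpace
    isChunked     = (== "chunked") . map toLower . trim
```

With a de-chunking proxy in the path, the response would carry no Transfer-Encoding: chunked header, so this falls through to FixedLength or UntilClose instead of attempting to parse a chunk header.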

erikd commented 13 years ago

A fix for this is in issue #38.