snoyberg / http-enumerator

HTTP client package with enumerator interface and HTTPS support.

Chunked Encoding #24

Closed: wuzzeb closed this issue 13 years ago

wuzzeb commented 13 years ago

I am using http-enumerator to talk to CouchDB, and occasionally during testing I would get

ParseError {errorContexts = ["HTTP status line"], errorMessage = "Failed reading: takeWith"}

After investigating the TCP dumps with Wireshark, I realized the error occurred whenever the previous response used chunked encoding. (I am reusing the same Manager for multiple queries.) After looking at the code, I think chunkedEnumeratee is not consuming the final newlines.

Here is a Wireshark dump of the end of a chunked response and the start of the next request:

    00003534  34 0d 0a 0d 0a 5d 7d 0d  0a                      4....]}. .
    0000353D  31 0d 0a 0a 0d 0a                                1.....
    00003543  30 0d 0a 0d 0a                                   0....
    000013AC  47 45 54 20 2f 74 65 73  74 2f 5f 64 65 73 69 67 GET /tes t/_desig

We see three chunks: a chunk of length 4 containing the closing ]} of the JSON, a chunk of length 1 containing an LF, and the terminating chunk of length zero.
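
To spell out the framing: each chunk is a hex size line ending in CRLF, the data bytes, then another CRLF, and the body ends with a zero-length chunk. A minimal sketch of one chunk as a plain attoparsec parser (my own illustration, not http-enumerator's actual iterChunkHeader):

{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding (take)
import Data.Attoparsec.ByteString.Char8 (Parser, hexadecimal, parseOnly, string, take)
import qualified Data.ByteString as S

-- one chunk: hex size line, CRLF, the data bytes, CRLF
chunk :: Parser S.ByteString
chunk = do
    n <- hexadecimal          -- hex size: "4", then "1", and finally "0"
    _ <- string "\r\n"
    body <- take n            -- the chunk data itself
    _ <- string "\r\n"        -- trailing CRLF; for the final zero-length
    return body               -- chunk, this CRLF is the byte pair at issue

main :: IO ()
main = print $ parseOnly chunk "4\r\n\r\n]}\r\n"   -- Right "\r\n]}"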

But look at the http-enumerator code:

chunkedEnumeratee k@(Continue _) = do
    len <- catchParser "Chunk header" iterChunkHeader
    if len == 0
        then return k
        else do
            k' <- takeLBS len k
            catchParser "End of chunk newline" iterNewline
            chunkedEnumeratee k'
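
For context, the Step and Stream types from the enumerator package look roughly like this (simplified and quoted from memory, so check the installed version for the exact definitions):

data Stream a = Chunks [a] | EOF

data Step a m b
    = Continue (Stream a -> Iteratee a m b)  -- wants more input
    | Yield b (Stream a)                     -- finished; carries leftover input
    | Error SomeException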

After seeing the final zero-length chunk header, the enumeratee does not consume the 0d 0a at the end of the chunk. So when the same Manager is used for the next query, those two bytes are still sitting in the stream and we get the error about parsing the HTTP status line.
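
As a standalone illustration of the failure mode (plain attoparsec, not http-enumerator's actual status-line parser), two stray bytes in front of the status line are enough to kill the parse:

{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.ByteString.Char8 (parseOnly, string)

main :: IO ()
main = do
    print $ parseOnly (string "HTTP/") "HTTP/1.1 200 OK"
    -- Right "HTTP/"
    print $ parseOnly (string "HTTP/") "\r\nHTTP/1.1 200 OK"
    -- Left <parse error>; attoparsec's string is built on takeWith,
    -- hence the "Failed reading: takeWith" message above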

I am pretty sure the return k needs to change to

catchParser "End of chunks newline" iterNewline >> return k
wuzzeb commented 13 years ago

Hmm, changing return k to catchParser "End" iterNewline >> return k does not fix it.

I changed the top of parseStatus to the following so I could see what the problem was.

parseStatus :: Parser Status
parseStatus = do
    b <- takeWhile $ (/= 72) -- 72 is the ASCII code for 'H'
    if S.null b
        then return ()
        else fail $ "Before status: " ++ concat (intersperse " " (map show $ S.unpack b))
    _ <- string "HTTP/"

And I now get the error (only when the previous query returned a chunked encoding)

ParseError {errorContexts = ["HTTP status line"], errorMessage = "Failed reading: Before status: 48 13 10 13 10"}

48 is the ASCII code for '0' (0x30), so those bytes are "0\r\n\r\n": it looks like chunkedEnumeratee did not consume the entire zero-length chunk.

(By the way, I am now testing with the latest git checkout)

wuzzeb commented 13 years ago

OK, the observation in the last comment that chunkedEnumeratee was not consuming the whole chunk led me to the answer. The iteratee I am using to read the body is just (iterParser Aeson.json) from the aeson package, and when that parser reaches the final '}' it yields without demanding any more input, so chunkedEnumeratee leaves the rest of the input unconsumed.
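
This is easy to see with attoparsec directly (the exact Show output depends on the aeson version):

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (json)
import Data.Attoparsec.ByteString (parse)

main :: IO ()
main = print $ parse json "{\"ok\":true}\r\n0\r\n\r\n"
-- Done "\r\n0\r\n\r\n" (Object ...): the parser yields as soon as the
-- value is complete and never touches the rest of the chunk framing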

I fixed it with the following patch, which has the enumeratee consume the remaining chunks when the attached iteratee yields a value.

I am not sure this is totally correct; perhaps the code should instead be attaching all the skipped data to the Yield.

diff -r 5b5a944fd674 Network/HTTP/Enumerator.hs
--- a/Network/HTTP/Enumerator.hs    Thu Jun 09 16:31:48 2011 +0300
+++ b/Network/HTTP/Enumerator.hs    Thu Jun 09 14:43:56 2011 -0500
@@ -337,11 +337,19 @@
 chunkedEnumeratee k@(Continue _) = do
     len <- catchParser "Chunk header" iterChunkHeader
     if len == 0
-        then return k
+        then do catchParser "End of chunks newline" iterNewline 
+                return k
         else do
             k' <- takeLBS len k
             catchParser "End of chunk newline" iterNewline
             chunkedEnumeratee k'
+chunkedEnumeratee y@(Yield _ _) = loop where
+    loop = do len <- catchParser "Chunk header" iterChunkHeader
+              if len == 0
+                then do catchParser "End of chunks newline" iterNewline
+                        return y
+                else do catchParser "Skipping chunk" (iterSkipChunk len)
+                        loop
 chunkedEnumeratee step = return step

 takeLBS :: MonadIO m => Int -> Enumeratee S.ByteString S.ByteString m a
diff -r 5b5a944fd674 Network/HTTP/Enumerator/HttpParser.hs
--- a/Network/HTTP/Enumerator/HttpParser.hs Thu Jun 09 16:31:48 2011 +0300
+++ b/Network/HTTP/Enumerator/HttpParser.hs Thu Jun 09 14:43:56 2011 -0500
@@ -2,6 +2,7 @@
 module Network.HTTP.Enumerator.HttpParser
     ( iterHeaders
     , iterChunkHeader
+    , iterSkipChunk
     , iterNewline
     ) where

@@ -11,6 +12,7 @@
 import Data.Enumerator (Iteratee)
 import qualified Data.ByteString as S
 import qualified Data.ByteString.Char8 as S8
+import Data.List (intersperse)
 import Control.Applicative
 import Data.Word (Word8)

@@ -60,6 +62,10 @@

 parseStatus :: Parser Status
 parseStatus = do
+    b <- takeWhile $ (/=72)
+    if S.null b
+        then return ()
+        else fail $ "Before status: " ++ concat (intersperse " " (map show $ S.unpack b))
     _ <- string "HTTP/"
     ver <- takeWhile1 $ not . isSpace
     _ <- word8 32 -- space
@@ -86,6 +92,10 @@
 iterChunkHeader =
     iterParser (parseChunkHeader <?> "Chunked transfer encoding header")

+iterSkipChunk :: Monad m => Int -> Iteratee S.ByteString m ()
+iterSkipChunk i =
+    iterParser ((take i >> newline) <?> "Skipped chunk")
+
 iterNewline :: Monad m => Iteratee S.ByteString m ()
 iterNewline = iterParser newline
wuzzeb commented 13 years ago

I don't know how you want to fix this, but here is another option; perhaps this is a better patch. Now it makes no difference whether the server returns chunked encoding or not: in both cases the extra input is attached to the Yield.
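
To make "stuck onto the Yield" concrete, here is the comb logic from the patch below in a self-contained form (Stream mirrors the enumerator type; plain strings stand in for ByteString chunks):

data Stream a = Chunks [a] | EOF deriving Show

comb :: Stream a -> [a] -> Stream a
comb EOF         [] = EOF                 -- nothing skipped, still EOF
comb EOF         ys = Chunks ys           -- skipped bytes become the leftover
comb (Chunks xs) ys = Chunks (xs ++ ys)   -- append to the existing leftover

main :: IO ()
main = print $ comb (Chunks ["]}"]) ["\n"]   -- Chunks ["]}","\n"]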

diff -r 5b5a944fd674 Network/HTTP/Enumerator.hs
--- a/Network/HTTP/Enumerator.hs    Thu Jun 09 16:31:48 2011 +0300
+++ b/Network/HTTP/Enumerator.hs    Thu Jun 09 15:41:13 2011 -0500
@@ -90,7 +90,7 @@
 import Data.Enumerator
     ( Iteratee (..), Stream (..), catchError, throwError
     , yield, Step (..), Enumeratee, ($$), joinI, Enumerator, run_
-    , returnI, (>==>)
+    , returnI, (>==>), (=$)
     )
 import qualified Data.Enumerator.List as EL
 import Network.HTTP.Enumerator.HttpParser
@@ -337,11 +337,26 @@
 chunkedEnumeratee k@(Continue _) = do
     len <- catchParser "Chunk header" iterChunkHeader
     if len == 0
-        then return k
+        then do catchParser "End of chunks newline" iterNewline 
+                return k
         else do
             k' <- takeLBS len k
             catchParser "End of chunk newline" iterNewline
             chunkedEnumeratee k'
+
+chunkedEnumeratee (Yield a s) = loop s where
+    loop s' = do len <- catchParser "Chunk header" iterChunkHeader
+                 if len == 0
+                   then do catchParser "End of chunks newline" iterNewline
+                           return $ Yield a s'
+                   else do extra <- takeLBS len =$ EL.consume
+                           catchParser "Skipping chunk" iterNewline
+                           loop $ comb s' extra
+
+    comb EOF         [] = EOF
+    comb EOF         ys = Chunks ys
+    comb (Chunks xs) ys = Chunks $ xs ++ ys
+
 chunkedEnumeratee step = return step

 takeLBS :: MonadIO m => Int -> Enumeratee S.ByteString S.ByteString m a
snoyberg commented 13 years ago

Thank you for the detailed information on this bug. It made me finally start the http-enumerator test suite, which I've been meaning to do for a while now :). I just added a new commit which should hopefully solve this bug; at the very least, my test passes. Can you test it on your use case and let me know?

wuzzeb commented 13 years ago

Yes, your fix works! All my QuickCheck tests are passing. Hopefully in the next few days I can upload my package to Hackage, although I will need to wait until you post a new http-enumerator so I can depend on the version with the fix.

snoyberg commented 13 years ago

Cool, new version released on Hackage.