Closed wuzzeb closed 13 years ago
Hmm, changing return k to catchParser "End" iterNewline >> return k does not fix it.
I changed the top of parseStatus to the following so I could see what the problem is.
parseStatus :: Parser Status
parseStatus = do
    b <- takeWhile (/= 72)  -- 72 is 'H', the first byte of "HTTP/"
    if S.null b
        then return ()
        else fail $ "Before status: " ++ concat (intersperse " " (map show $ S.unpack b))
    _ <- string "HTTP/"
And I now get the error (only when the previous query returned a chunked encoding)
ParseError {errorContexts = ["HTTP status line"], errorMessage = "Failed reading: Before status: 48 13 10 13 10"}
48 is decimal for 0x30 (the character '0'), so it looks like chunkedEnumeratee did not consume the entire zero-length chunk.
(By the way, I am now testing with the latest git checkout)
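Decoding those leftover bytes confirms exactly what was left sitting on the connection; a quick sketch using only base (the name leftover is illustrative):

```haskell
import Data.Char (chr)

-- The bytes reported in the ParseError, decoded to characters.
leftover :: String
leftover = map chr [48, 13, 10, 13, 10]
-- "0\r\n\r\n": the zero-length chunk header plus the final CRLF,
-- i.e. exactly the chunked-transfer terminator.
```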
Ok, the observation in the last comment that chunkedEnumeratee was not consuming the whole chunk led me to the answer. The iteratee I am using to read the body is just iterParser Aeson.json from the aeson package. When that parser reaches the final '}' it yields without demanding any more input, so chunkedEnumeratee was leaving input behind.
I fixed it with the following patch, having the enumeratee consume all input when the attached iteratee yields a value.
I am not sure this is totally correct; perhaps the code should instead add all of the skipped data to the Yield.
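The yield-without-consuming behavior is easy to reproduce with any parser combinator library; here is a minimal sketch using ReadP from base as a stand-in for attoparsec/aeson (the names closing and results are illustrative, not from http-enumerator):

```haskell
import Text.ParserCombinators.ReadP (ReadP, readP_to_S, string)

-- A parser that, like Aeson.json at its final '}', succeeds the moment
-- its grammar is complete and never demands the bytes that follow.
closing :: ReadP String
closing = string "{}"

-- readP_to_S pairs each parse result with the unconsumed remainder:
-- the trailing "\r\n" stays in the stream, just as the chunk terminator did.
results :: [(String, String)]
results = readP_to_S closing "{}\r\n"
```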
diff -r 5b5a944fd674 Network/HTTP/Enumerator.hs
--- a/Network/HTTP/Enumerator.hs Thu Jun 09 16:31:48 2011 +0300
+++ b/Network/HTTP/Enumerator.hs Thu Jun 09 14:43:56 2011 -0500
@@ -337,11 +337,19 @@
 chunkedEnumeratee k@(Continue _) = do
     len <- catchParser "Chunk header" iterChunkHeader
     if len == 0
-        then return k
+        then do catchParser "End of chunks newline" iterNewline
+                return k
         else do
             k' <- takeLBS len k
             catchParser "End of chunk newline" iterNewline
             chunkedEnumeratee k'
+chunkedEnumeratee y@(Yield _ _) = loop where
+    loop = do len <- catchParser "Chunk header" iterChunkHeader
+              if len == 0
+                  then do catchParser "End of chunks newline" iterNewline
+                          return y
+                  else do catchParser "Skipping chunk" (iterSkipChunk len)
+                          loop
 chunkedEnumeratee step = return step
 takeLBS :: MonadIO m => Int -> Enumeratee S.ByteString S.ByteString m a
diff -r 5b5a944fd674 Network/HTTP/Enumerator/HttpParser.hs
--- a/Network/HTTP/Enumerator/HttpParser.hs Thu Jun 09 16:31:48 2011 +0300
+++ b/Network/HTTP/Enumerator/HttpParser.hs Thu Jun 09 14:43:56 2011 -0500
@@ -2,6 +2,7 @@
 module Network.HTTP.Enumerator.HttpParser
     ( iterHeaders
     , iterChunkHeader
+    , iterSkipChunk
     , iterNewline
     ) where
@@ -11,6 +12,7 @@
 import Data.Enumerator (Iteratee)
 import qualified Data.ByteString as S
 import qualified Data.ByteString.Char8 as S8
+import Data.List (intersperse)
 import Control.Applicative
 import Data.Word (Word8)
@@ -60,6 +62,10 @@
 parseStatus :: Parser Status
 parseStatus = do
+    b <- takeWhile $ (/=72)
+    if S.null b
+        then return ()
+        else fail $ "Before status: " ++ concat (intersperse " " (map show $ S.unpack b))
     _ <- string "HTTP/"
     ver <- takeWhile1 $ not . isSpace
     _ <- word8 32 -- space
@@ -86,6 +92,10 @@
 iterChunkHeader =
     iterParser (parseChunkHeader <?> "Chunked transfer encoding header")
+iterSkipChunk :: Monad m => Int -> Iteratee S.ByteString m ()
+iterSkipChunk i =
+    iterParser ((take i >> newline) <?> "Skipped chunk")
+
 iterNewline :: Monad m => Iteratee S.ByteString m ()
 iterNewline = iterParser newline
I don't know how you want to fix this, but here is another option. Perhaps this is a better patch: now it makes no difference whether or not the server returns chunked encoding; in both cases the extra data is attached to the Yield.
diff -r 5b5a944fd674 Network/HTTP/Enumerator.hs
--- a/Network/HTTP/Enumerator.hs Thu Jun 09 16:31:48 2011 +0300
+++ b/Network/HTTP/Enumerator.hs Thu Jun 09 15:41:13 2011 -0500
@@ -90,7 +90,7 @@
 import Data.Enumerator
     ( Iteratee (..), Stream (..), catchError, throwError
     , yield, Step (..), Enumeratee, ($$), joinI, Enumerator, run_
-    , returnI, (>==>)
+    , returnI, (>==>), (=$)
     )
import qualified Data.Enumerator.List as EL
import Network.HTTP.Enumerator.HttpParser
@@ -337,11 +337,26 @@
 chunkedEnumeratee k@(Continue _) = do
     len <- catchParser "Chunk header" iterChunkHeader
     if len == 0
-        then return k
+        then do catchParser "End of chunks newline" iterNewline
+                return k
         else do
             k' <- takeLBS len k
             catchParser "End of chunk newline" iterNewline
             chunkedEnumeratee k'
+
+chunkedEnumeratee (Yield a s) = loop s where
+    loop s' = do len <- catchParser "Chunk header" iterChunkHeader
+                 if len == 0
+                     then do catchParser "End of chunks newline" iterNewline
+                             return $ Yield a s'
+                     else do extra <- takeLBS len =$ EL.consume
+                             catchParser "Skipping chunk" iterNewline
+                             loop $ comb s' extra
+
+    comb EOF [] = EOF
+    comb EOF ys = Chunks ys
+    comb (Chunks xs) ys = Chunks $ xs ++ ys
+
 chunkedEnumeratee step = return step
 takeLBS :: MonadIO m => Int -> Enumeratee S.ByteString S.ByteString m a
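The Stream-combining logic in comb above can be checked in isolation; here is a pure standalone model (the Stream type mirrors Data.Enumerator's constructors but is redefined here, so this compiles without the enumerator package):

```haskell
-- Standalone model of enumerator's Stream type and the comb helper above.
data Stream a = Chunks [a] | EOF deriving (Eq, Show)

-- Append the skipped chunks to whatever residual stream the iteratee yielded.
comb :: Stream a -> [a] -> Stream a
comb EOF []         = EOF
comb EOF ys         = Chunks ys
comb (Chunks xs) ys = Chunks (xs ++ ys)
```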
Thank you for the detailed information on this bug. It made me finally start the http-enumerator test suite, which I've been meaning to do for a while now :). I just added a new commit which should hopefully solve this bug; at the very least, my test passes. Can you test it on your use case and let me know?
Yes, your fix works! All my quickcheck tests are passing. Hopefully in the next few days I can upload the package to hackage, although I will need to wait until you post a new http-enumerator so I can depend on the version with the fix.
Cool, new version released on Hackage.
I am using http-enumerator to talk to CouchDB, and occasionally during testing I would get
After investigating the TCP dumps with Wireshark, I realized the error occurred whenever the previous response used chunked encoding. (I am reusing the same Manager for multiple queries.) After looking at the code, I think that chunkedEnumeratee is not consuming the final newlines.
Here is a Wireshark dump of the end of a chunked response and the start of the next one
We see three chunks: a chunk of length four containing the closing ]} of the JSON, a chunk of length one containing an LF, and the terminating chunk of length zero.
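For reference, the chunked framing (RFC 2616 section 3.6.1) is "<size in hex>CRLF<data>CRLF", terminated by a zero-length chunk. A small sketch of the framing (illustrative helper names, not the actual capture):

```haskell
import Numeric (showHex)

-- Frame one chunk: "<size in hex>\r\n<data>\r\n".
chunk :: String -> String
chunk d = showHex (length d) "" ++ "\r\n" ++ d ++ "\r\n"

-- The terminating zero-length chunk is "0\r\n" plus one more bare "\r\n";
-- those final two bytes (0d 0a) are what the enumeratee leaves behind.
terminator :: String
terminator = chunk ""
```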
But look at the http-enumerator code
After seeing the final zero-length chunk header, the enumeratee does not consume the 0d 0a that ends the body. Thus when the same Manager is used for the next query, those two bytes are still sitting on the connection and we get the error about parsing the HTTP status line.
I am pretty sure the return k needs to change to