Performance of linesUnbounded not linear wrt. line length

Oblosys commented 7 years ago

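Because Data.Conduit.Combinators.splitOnUnboundedEC mappends each incoming chunk to the data buffered so far and then breaks on the whole result, every chunk rescans the entire partial line. linesUnbounded therefore becomes inefficient once a line is longer than the chunk size: the cost grows quadratically rather than linearly with line length. For strict byte strings or lists, the repeated mappend also has unnecessary memory costs.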

The code below starts printing out numbers quickly, and gradually slows down until it reaches 300, after which it speeds up again.

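-- Imports inferred from the qualified names used below.
import           Control.Monad                (replicateM_, void)
import           Control.Monad.IO.Class       (liftIO)
import           Control.Monad.Trans.Resource (ResourceT, runResourceT)
import qualified Data.ByteString.Char8        as BS
import           Data.Conduit
import qualified Data.Conduit.Combinators     as CC
import qualified Data.Conduit.List            as CL
import           Data.IORef

slow :: IO ()
slow = do
  counterVar <- newIORef (0 :: Int)
  void . runResourceT $
    runConduit $     longLineSrc
                 -- Print a running count of the chunks that reach the splitter.
                 =$= CL.mapM (\bs -> liftIO $ do
                                modifyIORef counterVar (+1)
                                i <- readIORef counterVar
                                print i
                                return bs)
                 =$= CC.linesUnboundedAscii
                 =$= CL.consume
    where -- Endless source: 300 chunks of 100 KiB without a newline, then "\n".
          longLineSrc :: ConduitM () BS.ByteString (ResourceT IO) ()
          longLineSrc = loop
             where loop = do
                     replicateM_ 300 $ yield chunk
                     yield $ BS.pack "\n"
                     loop
                   chunk = BS.replicate (100*1024) 'x'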
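
To make the cost difference concrete, here is a minimal sketch that models the splitter as a pure function over a list of chunks, using only base and bytestring. It is not the library's code: linesNaive and linesLinear are hypothetical names, the list-of-chunks model is an assumption standing in for the conduit stream, and the linear variant is just one possible way to avoid the rescan.

```haskell
import qualified Data.ByteString.Char8 as BS

-- | Naive splitting, modelled on the behaviour described above: every new
-- chunk is mappended to the buffered remainder and the whole buffer is
-- scanned again for a newline.
linesNaive :: [BS.ByteString] -> [BS.ByteString]
linesNaive []       = []
linesNaive (c : cs) = go c cs
  where
    go buf rest =
      case BS.break (== '\n') buf of
        (line, after)
          | not (BS.null after) -> line : go (BS.drop 1 after) rest
          | otherwise ->
              case rest of
                []       -> [buf | not (BS.null buf)]
                (r : rs) -> go (buf `mappend` r) rs

-- | Linear variant (an illustrative alternative, not necessarily the merged
-- patch): only the newest chunk is scanned. Chunks already known to contain
-- no newline are kept in a reversed list and concatenated once per line.
linesLinear :: [BS.ByteString] -> [BS.ByteString]
linesLinear []       = []
linesLinear (c : cs) = go [] c cs
  where
    go acc chunk rest =
      case BS.break (== '\n') chunk of
        (piece, after)
          | not (BS.null after) ->
              BS.concat (reverse (piece : acc)) : go [] (BS.drop 1 after) rest
          | otherwise ->
              case rest of
                []       -> let final = BS.concat (reverse (chunk : acc))
                            in  [final | not (BS.null final)]
                (r : rs) -> go (chunk : acc) r rs

main :: IO ()
main = do
  -- One long line spread over 300 chunks, followed by a short trailing line.
  let chunks = replicate 300 (BS.replicate 1024 'x')
                 ++ [BS.pack "\n", BS.pack "tail"]
  print (map BS.length (linesLinear chunks))
  print (linesNaive chunks == linesLinear chunks)
```

Both functions return the same lines. The difference is that linesNaive scans and copies a buffer that grows with every chunk, while linesLinear touches each byte once when scanning and once when concatenating.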

snoyberg commented 7 years ago

I'm going to be mostly unavailable for the next week, but if you'd like to send a PR I'll be able to review more quickly.

(Sent by email in reply to Martijn Schrage's report of Fri, Sep 30, 2016, 3:09 PM.)

Oblosys commented 7 years ago

There you go.

snoyberg commented 7 years ago

Very quick, and nice patch. Thank you!

Oblosys commented 7 years ago

That was a very fast merge, thanks!

snoyberg / mono-traversable

Performance of linesUnbounded not linear wrt. line length #110