haskell / attoparsec

A fast Haskell library for parsing ByteStrings
http://hackage.haskell.org/package/attoparsec
Other
514 stars 93 forks source link

Potential space leak #186

Closed twhitehead closed 3 years ago

twhitehead commented 3 years ago

EDIT: You probably just want to skip straight to the second comment (this was suggesting make the datatype more complex when simple inlining should work instead)...

While writing a parser, I got looking through the attoparsec code. I believe operators like sepBy likely space leak.

sepBy p s = liftA2 (:) p ((s *> sepBy1 p s) <|> pure []) <|> pure []

sepBy1 p s = scan
  where scan = liftA2 (:) p ((s *> scan) <|> pure [])

The issue being that the recursion occurs on the LHS of <|>, which means backtrack information is created and held onto each time you recurse despite the fact all but the last will never be used.

You can see this allocation explicitly in the (<|>) = plus implementation

plus f g = Parser $ \t pos more lose succ ->
  let lose' t' _pos' more' _ctx _msg = runParser g t' pos more' lose succ
  in runParser f t pos more lose' succ

Each <|> allocates another lose' closure that ties this backtrack to the previous backtrace, creating the string of backtrack closures. As a general rule, I guess one wants to do recursion on the RHS of <|>, as, once you reach the RHS, the LHS is fully discarded so you don't get any space leaks. This isn't always possible though.

What I think could be done, is the top level Parser i a type could be split into two cases

newtype Parser i a
  = ParserBacktrack {
       runParser :: forall r. State i -> Pos -> More
                                      -> Failure i (State i)   r
                                      -> Success i (State i) a r
                                      -> IResult i r }
  | ParserSuccess {
       runParser :: forall r. State i -> Pos -> More
                                      -> Success i (State i) a r
                                      -> IResult i r }

in order that the closures of parsers that always succeed don't hold onto the backtrace trail inside themselves. It also means this information is exposed via inlining.

That is, this would change the definition of return = pure to

pure v = ParserSuccess $ \t !pos more succ -> succ t pos more v
{-# INLINE pure #-}

and (<|>) = plus to

plus :: Parser i a -> Parser i a -> Parser i a
plus (ParserSuccess   f) _                   = ParserSuccess f
plus (ParserBacktrack f) (ParserSuccess   g) = ParserSuccess $ \t pos more succ ->
  let lose' t' _pos' more' _ctx _msg = g t' pos more' succ
  in f t pos more lose' succ
plus (ParserBacktrack f) (ParserBacktrack g) = ParserBacktrack $ \t pos more lose succ ->
  let lose' t' _pos' more' _ctx _msg = g t' pos more' lose succ
  in f t pos more succ lose' succ

This way, when ... <|> return [] would appear in sepBy with this version, the chain of backtracking would be broken as a closure would not be being created that would hold onto it needlessly. I would also hope the inlining would automatically eat most of the case handling.

Anyway, that is my thoughts from reading the code. I haven't actually tried it out, but I figured it would be worth mentioning to see what people think.

twhitehead commented 3 years ago

Actually, now that I'm thinking about this again, maybe it could work fine the way it is if <|> was set to inline in addition to return [] = pure []

pure [] = Parser $ \t !pos more _lose succ -> succ t pos more []

This way the closure in <|>

let lose' t' _pos' more' _ctx _msg = runParser g t' pos more' lose succ

would get reduced to

let lose' t' _pos' more' _ctx _msg = succ t' pos more []

which doesn't hold onto the prior backtrack chain lose.

NOTE: Edited for

bgamari commented 3 years ago

@twhitehead, thanks for you thoughtful analysis. Have you had a chance to construct a program exhibiting this leak yet? It would be a good to have a concrete example so we can easily characterise possible solutions.

On the whole I greatly prefer the latter solution than the former. I think you will find that changing the representation of Parser as you suggest will significantly regress things without considerable work.

twhitehead commented 3 years ago

Test case

I've come up with the following skipping version of sepBy1

skipSepBy1 p s = scan 
    where scan = p *> ((s *> scan) <|> pure ())

Desired inlined behaviour

If I manually inline <|> and pure I get

skipSepBy1Inlined p s = scan
    where scan = p *> ( Parser $ \t pos more lose succ ->
                          let lose' t' _pos' more' _ctx _msg = runParser (Parser $ \t !pos more _lose succ -> succ t pos more ()) t' pos more' lose succ
                          in runParser (s *> scan) t pos more lose' succ )

I would hope that GHC would collapse runParser (Parser ...) and the following lambda binding as follows to get

runParser (Parser $ \t !pos more _lose succ -> succ t pos more ()) t' pos more' lose succ
= (\t !pos more _lose succ -> succ t pos more ()) t' pos more' lose succ
= succ t' pos more' ()

which gives you

skipSepBy1InlinedCollapsed p s = scan
    where scan = p *> ( Parser $ \t pos more lose succ ->
                          let lose' t' _pos' more' _ctx _msg = succ t' pos more' []
                          in runParser (s *> scan) t pos more lose' succ )

If I run this last version for a bit using Text.Lazy input with memory profiling on (compile with -prof -fprof-auto and run with +RTS -h -L500)

main = do
  let parser = skipSepBy1InlineCollapsed (skipWhile (/= ',')) (char ',')
      input = fromChunks let chunk = "foo," in Chunk chunk input
  print $ parse parser input

I see, in the .hp profile output file that the only steadily increasing memory consumption is coming from endlessly expanding the underlying text buffer (all other allocation sizes are quite small and stable)

BEGIN_SAMPLE 10.012260
...
(424)pappend/prompt/main.parser/parse/main/Main.CAF 201326600
...
END_SAMPLE 10.012260

So that works as you would hope.

Current behaviour

Switching to the non-hand-inlined version

main = do
  let parser = skipSepBy1 (skipWhile (/= ',')) (char ',')
      input = fromChunks let chunk = "foo," in Chunk chunk input
  print $ parse parser input

reveals additional growing allocations, the largest of which is from the skipSepBy1 routine

BEGIN_SAMPLE 10.092646
...
(422)pappend/prompt/main.parser/parse/main/Main.CAF 201326600
(420)prompt/main.parser/parse/main/Main.CAF 274713680
(410)skipSepBy1.scan/skipSepBy1/main.parser/parse/main/Main.CAF 686784272
...
END_SAMPLE 10.092646

Actual inlined behaviour

Forcing <|> to inline doesn't, unfrotunately, give the expected gains.

Looking at the -ddump-simpl seems to reveal that GHC is not collapsing runParser (Parser ...) ... and the following lambda. This can be seen too by running against just the hand inlined, but not collapsed, version

main = do
  let parser = skipSepBy1Inline (skipWhile (/= ',')) (char ',')
      input = fromChunks let chunk = "foo," in Chunk chunk input
  print $ parse parser input

The profiling reveals we are still suffering from a large growing allocation in our skipSepBy1Inlined routine

BEGIN_SAMPLE 10.052888
...
(424)pappend/prompt/main.parser/parse/main/Main.CAF 201326600
(422)prompt/main.parser/parse/main/Main.CAF 264655360
(415)skipSepBy1Inlined.scan.\/skipSepBy1Inlined.scan/skipSepBy1Inlined/main.parser/parse/main/Main.CAF  529326368
...
END_SAMPLE 10.052888

Further thoughts

Looking at the -ddump-simpl reveals that the issue seems to be collapsing runParser (Parser ...) .... If I do this by hand, it will then collapse the lambda and realize that loose is never used

skipSepBy1InlinedCollapsed' p s = scan
    where scan = p *> ( Parser $ \t pos more lose succ ->
                          let lose' t' _pos' more' _ctx _msg = (\t !pos more _lose succ -> succ t pos more ()) t' pos more' lose succ
                          in runParser (s *> scan) t pos more lose' succ )

as can also seen by the profiling results

BEGIN_SAMPLE 10.025809
...
(424)pappend/prompt/main.parser/parse/main/Main.CAF 201326600
END_SAMPLE 10.025809

Perhaps this means the desired result requires both the inlining and a rewrite rule to change runParser (Parser parser) into parser.

twhitehead commented 3 years ago

I tried with

{-# RULES "runParser/Parser" forall (p::forall i r. State i -> Pos -> More -> Failure i (State i) r -> Success i (State i) a r -> IResult i r). runParser (Parser p) = p #-}

but the compiler isn't happy

      Illegal expression: runParser
      in left-hand side: runParser (Parser p)
    LHS must be of form (f e1 .. en) where f is not forall'd

I also tried a creating the wrapping function fooParser = runParser set with NOINLINE and put the rule on that and used it in the sepBy1Inlined. The compiler accepts this (i.e., the above error goes away), but it doesn't fire.

The -ddump-simpl output contains a bunch of cast stuff that I don't understand (almost looks like a cast on id $) that appears to have replaced the inner Parser call.

                      (\ (t'_a1NG :: State i_a3o9)
                         _ [Occ=Dead]
                         (more'_a1NI :: More)
                         _ [Occ=Dead]
                         _ [Occ=Dead] ->
                         fooParser
                           @ i_a3o9
                           @ ()
                           ($ @ 'GHC.Types.LiftedRep
                              @ (forall r1.
                                 State i_a3o9
                                 -> Pos
                                 -> More
                                 -> Failure i_a3o9 (State i_a3o9) r1
                                 -> Success i_a3o9 (State i_a3o9) () r1
                                 -> IResult i_a3o9 r1)
                              @ (Parser i_a3o9 ())
                              ((\ (v_B1
                                     :: forall r1.
                                        State i_a3o9
                                        -> Pos
                                        -> More
                                        -> Failure i_a3o9 (State i_a3o9) r1
                                        -> Success i_a3o9 (State i_a3o9) () r1
                                        -> IResult i_a3o9 r1) ->
                                  v_B1)
                               `cast` (<forall r1.
                                        State i_a3o9
                                        -> Pos
                                        -> More
                                        -> Failure i_a3o9 (State i_a3o9) r1
                                        -> Success i_a3o9 (State i_a3o9) () r1
                                        -> IResult i_a3o9 r1>_R
                                       ->_R Sym (Data.Attoparsec.Internal.Types.N:Parser[0]
                                                     <i_a3o9>_N <()>_R)
                                       :: ((forall r1.
                                            State i_a3o9
                                            -> Pos
                                            -> More
                                            -> Failure i_a3o9 (State i_a3o9) r1
                                            -> Success i_a3o9 (State i_a3o9) () r1
                                            -> IResult i_a3o9 r1)
                                           -> forall r.
                                              State i_a3o9
                                              -> Pos
                                              -> More
                                              -> Failure i_a3o9 (State i_a3o9) r
                                              -> Success i_a3o9 (State i_a3o9) () r
                                              -> IResult i_a3o9 r)
                                          ~R# ((forall r1.
                                                State i_a3o9
                                                -> Pos
                                                -> More
                                                -> Failure i_a3o9 (State i_a3o9) r1
                                                -> Success i_a3o9 (State i_a3o9) () r1
                                                -> IResult i_a3o9 r1)
                                               -> Parser i_a3o9 ())))
                              (\ (@ r1_a3db)
                                 (t1_a1NL :: State i_a3o9)
                                 (pos1_a1NM :: Pos)
                                 (more1_a1NN :: More)
                                 _ [Occ=Dead]
                                 (succ1_a1NP :: Success i_a3o9 (State i_a3o9) () r1_a3db) ->
                                 case pos1_a1NM
                                      `cast` (Data.Attoparsec.Internal.Types.N:Pos[0]
                                              :: Pos ~R# Int)
                                 of nt_s44k
                                 { GHC.Types.I# ipv_s44m ->
                                 succ1_a1NP
                                   t1_a1NL
                                   (nt_s44k
                                    `cast` (Sym (Data.Attoparsec.Internal.Types.N:Pos[0])
                                            :: Int ~R# Pos))
                                   more1_a1NN
                                   GHC.Tuple.()
                                 }))
                           @ r_a3cM
                           t'_a1NG
                           pos_a1NB
                           more'_a1NI
                           lose_a1ND
                           succ_a1NE)

Possibly this is why the rule isn't firing? I'm going to ask on Haskell cafe to see if someone who knows more than me can weigh in.

twhitehead commented 3 years ago

For someone just wanting to play with the code showing the issue, here it is

{-# LANGUAGE OverloadedStrings, BangPatterns #-}

module Main where

import Data.Text
import Data.Text.Lazy hiding (Text,repeat)
import Data.Attoparsec.Text.Lazy hiding (Parser)
import Data.Attoparsec.Internal.Types

skipSepBy1Inlined :: Parser Text a -> Parser Text b -> Parser Text ()
skipSepBy1Inlined p s = scan
    where scan = p *> ( Parser $ \t pos more lose succ ->
                          let lose' t' _pos' more' _ctx _msg = runParser (Parser $ \t !pos more _lose succ -> succ t pos more ()) t' pos more' lose succ
                          in runParser (s *> scan) t pos more lose' succ )

main :: IO ()
main = do
  let parser = skipSepBy1Inlined (skipWhile (/= ',')) (char ',')
      input = fromChunks $ repeat "foo,"
  print $ parse parser input

You can compare the heap performance and -ddump-simpl from this to what you get when you manually collapse Parser (Parser $ parser) to parser in skipSepBy1Inlined.

nomeata commented 3 years ago

I spotted your mail on haskell-cafe and got curious. But I can’t reproduce the problem. With GHC-8.8.4, attoparsec-0.14.1, if I compile that file of yours with

ghc -O2 --make test.hs -rtsopts -ddump-simpl -dsuppress-unfoldings -dsuppress-all -fforce-recomp

I see

-- RHS size: {terms: 65, types: 94, coercions: 14, joins: 0/2}
$wskipSepBy1Inlined
  = \ @ a_s6jC
      @ b_s6jD
      w_s6jE
      w1_s6jF
      @ r_s6jG
      w2_s6jH
      ww_s6jO
      w3_s6jJ
      w4_s6jK
      w5_s6jL ->
      letrec {
        $wscan_s6jB
          = \ @ r1_s6jr w6_s6js ww1_s6jz w7_s6ju w8_s6jv w9_s6jw ->
              (w_s6jE `cast` <Co:3>)
                w6_s6js
                ((I# ww1_s6jz) `cast` <Co:2>)
                w7_s6ju
                w8_s6jv
                (\ t'_a4yz pos'_a4yA more'_a4yB _ ->
                   case pos'_a4yA `cast` <Co:1> of nt1_a4yD { I# ipv1_a4yF ->
                   let {
                     lose_s66e
                       = \ t'1_a1Lb _ more'1_a1Ld _ _ ->
                           w9_s6jw t'1_a1Lb (nt1_a4yD `cast` <Co:2>) more'1_a1Ld () } in
                   (w1_s6jF `cast` <Co:3>)
                     t'_a4yz
                     (nt1_a4yD `cast` <Co:2>)
                     more'_a4yB
                     lose_s66e
                     (\ t'1_X4zm pos'1_X4zo more'1_X4zq _ ->
                        case pos'1_X4zo `cast` <Co:1> of { I# ww3_X6kH ->
                        $wscan_s6jB t'1_X4zm ww3_X6kH more'1_X4zq lose_s66e w9_s6jw
                        })
                   }); } in
      $wscan_s6jB w2_s6jH ww_s6jO w3_s6jJ w4_s6jK w5_s6jL

which looks pretty good to me. In particular, lose_s66e does not hold on to any failure continuation. GHC-8.10 also looks good. But maybe I am missing something?

twhitehead commented 3 years ago

@nomeata No. I don't think you are missing anything. I've spent sometime now trying to reproduce what I was getting originally, but with no luck. I'm getting the same as you now. I don't know how come I was getting different results before.

Thank you very much for responding back from the list and looking into this. Sorry it has somehow become non-reproducible. I'm think there must have been something else with my setup that I missed specifying and now am not reproducing.

I will close the ticket. If I can figure out how to get it back again I'll reopen it then. Thanks again very much! :+1:

nomeata commented 3 years ago

Maybe you forgot -O2? It's easily forgotten