Closed twhitehead closed 3 years ago
Actually, now that I'm thinking about this again, maybe it could work fine the way it is if <|>
was set to inline in addition to return [] = pure []
pure [] = Parser $ \t !pos more _lose succ -> succ t pos more []
This way the closure in <|>
let lose' t' _pos' more' _ctx _msg = runParser g t' pos more' lose succ
would get reduced to
let lose' t' _pos' more' _ctx _msg = succ t' pos more []
which doesn't hold onto the prior backtrack chain lose
.
NOTE: Edited for
<|>
isn't forced to inline@twhitehead, thanks for you thoughtful analysis. Have you had a chance to construct a program exhibiting this leak yet? It would be a good to have a concrete example so we can easily characterise possible solutions.
On the whole I greatly prefer the latter solution than the former. I think you will find that changing the representation of Parser
as you suggest will significantly regress things without considerable work.
I've come up with the following skipping version of sepBy1
skipSepBy1 p s = scan
where scan = p *> ((s *> scan) <|> pure ())
If I manually inline <|>
and pure
I get
skipSepBy1Inlined p s = scan
where scan = p *> ( Parser $ \t pos more lose succ ->
let lose' t' _pos' more' _ctx _msg = runParser (Parser $ \t !pos more _lose succ -> succ t pos more ()) t' pos more' lose succ
in runParser (s *> scan) t pos more lose' succ )
I would hope that GHC would collapse runParser (Parser ...)
and the following lambda binding as follows to get
runParser (Parser $ \t !pos more _lose succ -> succ t pos more ()) t' pos more' lose succ
= (\t !pos more _lose succ -> succ t pos more ()) t' pos more' lose succ
= succ t' pos more' ()
which gives you
skipSepBy1InlinedCollapsed p s = scan
where scan = p *> ( Parser $ \t pos more lose succ ->
let lose' t' _pos' more' _ctx _msg = succ t' pos more' []
in runParser (s *> scan) t pos more lose' succ )
If I run this last version for a bit using Text.Lazy
input with memory profiling on (compile with -prof -fprof-auto
and run with +RTS -h -L500
)
main = do
let parser = skipSepBy1InlineCollapsed (skipWhile (/= ',')) (char ',')
input = fromChunks let chunk = "foo," in Chunk chunk input
print $ parse parser input
I see, in the .hp
profile output file that the only steadily increasing memory consumption is coming from endlessly expanding the underlying text buffer (all other allocation sizes are quite small and stable)
BEGIN_SAMPLE 10.012260
...
(424)pappend/prompt/main.parser/parse/main/Main.CAF 201326600
...
END_SAMPLE 10.012260
So that works as you would hope.
Switching to the non-hand-inlined version
main = do
let parser = skipSepBy1 (skipWhile (/= ',')) (char ',')
input = fromChunks let chunk = "foo," in Chunk chunk input
print $ parse parser input
reveals additional growing allocations, the largest of which is from the skipSepBy1
routine
BEGIN_SAMPLE 10.092646
...
(422)pappend/prompt/main.parser/parse/main/Main.CAF 201326600
(420)prompt/main.parser/parse/main/Main.CAF 274713680
(410)skipSepBy1.scan/skipSepBy1/main.parser/parse/main/Main.CAF 686784272
...
END_SAMPLE 10.092646
Forcing <|>
to inline doesn't, unfrotunately, give the expected gains.
Looking at the -ddump-simpl
seems to reveal that GHC is not collapsing runParser (Parser ...) ...
and the following lambda. This can be seen too by running against just the hand inlined, but not collapsed, version
main = do
let parser = skipSepBy1Inline (skipWhile (/= ',')) (char ',')
input = fromChunks let chunk = "foo," in Chunk chunk input
print $ parse parser input
The profiling reveals we are still suffering from a large growing allocation in our skipSepBy1Inlined
routine
BEGIN_SAMPLE 10.052888
...
(424)pappend/prompt/main.parser/parse/main/Main.CAF 201326600
(422)prompt/main.parser/parse/main/Main.CAF 264655360
(415)skipSepBy1Inlined.scan.\/skipSepBy1Inlined.scan/skipSepBy1Inlined/main.parser/parse/main/Main.CAF 529326368
...
END_SAMPLE 10.052888
Looking at the -ddump-simpl
reveals that the issue seems to be collapsing runParser (Parser ...) ...
. If I do this by hand, it will then collapse the lambda and realize that loose
is never used
skipSepBy1InlinedCollapsed' p s = scan
where scan = p *> ( Parser $ \t pos more lose succ ->
let lose' t' _pos' more' _ctx _msg = (\t !pos more _lose succ -> succ t pos more ()) t' pos more' lose succ
in runParser (s *> scan) t pos more lose' succ )
as can also seen by the profiling results
BEGIN_SAMPLE 10.025809
...
(424)pappend/prompt/main.parser/parse/main/Main.CAF 201326600
END_SAMPLE 10.025809
Perhaps this means the desired result requires both the inlining and a rewrite rule to change runParser (Parser parser)
into parser
.
I tried with
{-# RULES "runParser/Parser" forall (p::forall i r. State i -> Pos -> More -> Failure i (State i) r -> Success i (State i) a r -> IResult i r). runParser (Parser p) = p #-}
but the compiler isn't happy
Illegal expression: runParser
in left-hand side: runParser (Parser p)
LHS must be of form (f e1 .. en) where f is not forall'd
I also tried a creating the wrapping function fooParser = runParser
set with NOINLINE
and put the rule on that and used it in the sepBy1Inlined
. The compiler accepts this (i.e., the above error goes away), but it doesn't fire.
The -ddump-simpl
output contains a bunch of cast stuff that I don't understand (almost looks like a cast
on id $
) that appears to have replaced the inner Parser
call.
(\ (t'_a1NG :: State i_a3o9)
_ [Occ=Dead]
(more'_a1NI :: More)
_ [Occ=Dead]
_ [Occ=Dead] ->
fooParser
@ i_a3o9
@ ()
($ @ 'GHC.Types.LiftedRep
@ (forall r1.
State i_a3o9
-> Pos
-> More
-> Failure i_a3o9 (State i_a3o9) r1
-> Success i_a3o9 (State i_a3o9) () r1
-> IResult i_a3o9 r1)
@ (Parser i_a3o9 ())
((\ (v_B1
:: forall r1.
State i_a3o9
-> Pos
-> More
-> Failure i_a3o9 (State i_a3o9) r1
-> Success i_a3o9 (State i_a3o9) () r1
-> IResult i_a3o9 r1) ->
v_B1)
`cast` (<forall r1.
State i_a3o9
-> Pos
-> More
-> Failure i_a3o9 (State i_a3o9) r1
-> Success i_a3o9 (State i_a3o9) () r1
-> IResult i_a3o9 r1>_R
->_R Sym (Data.Attoparsec.Internal.Types.N:Parser[0]
<i_a3o9>_N <()>_R)
:: ((forall r1.
State i_a3o9
-> Pos
-> More
-> Failure i_a3o9 (State i_a3o9) r1
-> Success i_a3o9 (State i_a3o9) () r1
-> IResult i_a3o9 r1)
-> forall r.
State i_a3o9
-> Pos
-> More
-> Failure i_a3o9 (State i_a3o9) r
-> Success i_a3o9 (State i_a3o9) () r
-> IResult i_a3o9 r)
~R# ((forall r1.
State i_a3o9
-> Pos
-> More
-> Failure i_a3o9 (State i_a3o9) r1
-> Success i_a3o9 (State i_a3o9) () r1
-> IResult i_a3o9 r1)
-> Parser i_a3o9 ())))
(\ (@ r1_a3db)
(t1_a1NL :: State i_a3o9)
(pos1_a1NM :: Pos)
(more1_a1NN :: More)
_ [Occ=Dead]
(succ1_a1NP :: Success i_a3o9 (State i_a3o9) () r1_a3db) ->
case pos1_a1NM
`cast` (Data.Attoparsec.Internal.Types.N:Pos[0]
:: Pos ~R# Int)
of nt_s44k
{ GHC.Types.I# ipv_s44m ->
succ1_a1NP
t1_a1NL
(nt_s44k
`cast` (Sym (Data.Attoparsec.Internal.Types.N:Pos[0])
:: Int ~R# Pos))
more1_a1NN
GHC.Tuple.()
}))
@ r_a3cM
t'_a1NG
pos_a1NB
more'_a1NI
lose_a1ND
succ_a1NE)
Possibly this is why the rule isn't firing? I'm going to ask on Haskell cafe to see if someone who knows more than me can weigh in.
For someone just wanting to play with the code showing the issue, here it is
{-# LANGUAGE OverloadedStrings, BangPatterns #-}
module Main where
import Data.Text
import Data.Text.Lazy hiding (Text,repeat)
import Data.Attoparsec.Text.Lazy hiding (Parser)
import Data.Attoparsec.Internal.Types
skipSepBy1Inlined :: Parser Text a -> Parser Text b -> Parser Text ()
skipSepBy1Inlined p s = scan
where scan = p *> ( Parser $ \t pos more lose succ ->
let lose' t' _pos' more' _ctx _msg = runParser (Parser $ \t !pos more _lose succ -> succ t pos more ()) t' pos more' lose succ
in runParser (s *> scan) t pos more lose' succ )
main :: IO ()
main = do
let parser = skipSepBy1Inlined (skipWhile (/= ',')) (char ',')
input = fromChunks $ repeat "foo,"
print $ parse parser input
You can compare the heap performance and -ddump-simpl
from this to what you get when you manually collapse Parser (Parser $ parser)
to parser
in skipSepBy1Inlined
.
I spotted your mail on haskell-cafe and got curious. But I can’t reproduce the problem. With GHC-8.8.4, attoparsec-0.14.1, if I compile that file of yours with
ghc -O2 --make test.hs -rtsopts -ddump-simpl -dsuppress-unfoldings -dsuppress-all -fforce-recomp
I see
-- RHS size: {terms: 65, types: 94, coercions: 14, joins: 0/2}
$wskipSepBy1Inlined
= \ @ a_s6jC
@ b_s6jD
w_s6jE
w1_s6jF
@ r_s6jG
w2_s6jH
ww_s6jO
w3_s6jJ
w4_s6jK
w5_s6jL ->
letrec {
$wscan_s6jB
= \ @ r1_s6jr w6_s6js ww1_s6jz w7_s6ju w8_s6jv w9_s6jw ->
(w_s6jE `cast` <Co:3>)
w6_s6js
((I# ww1_s6jz) `cast` <Co:2>)
w7_s6ju
w8_s6jv
(\ t'_a4yz pos'_a4yA more'_a4yB _ ->
case pos'_a4yA `cast` <Co:1> of nt1_a4yD { I# ipv1_a4yF ->
let {
lose_s66e
= \ t'1_a1Lb _ more'1_a1Ld _ _ ->
w9_s6jw t'1_a1Lb (nt1_a4yD `cast` <Co:2>) more'1_a1Ld () } in
(w1_s6jF `cast` <Co:3>)
t'_a4yz
(nt1_a4yD `cast` <Co:2>)
more'_a4yB
lose_s66e
(\ t'1_X4zm pos'1_X4zo more'1_X4zq _ ->
case pos'1_X4zo `cast` <Co:1> of { I# ww3_X6kH ->
$wscan_s6jB t'1_X4zm ww3_X6kH more'1_X4zq lose_s66e w9_s6jw
})
}); } in
$wscan_s6jB w2_s6jH ww_s6jO w3_s6jJ w4_s6jK w5_s6jL
which looks pretty good to me. In particular, lose_s66e
does not hold on to any failure continuation. GHC-8.10 also looks good. But maybe I am missing something?
@nomeata No. I don't think you are missing anything. I've spent sometime now trying to reproduce what I was getting originally, but with no luck. I'm getting the same as you now. I don't know how come I was getting different results before.
Thank you very much for responding back from the list and looking into this. Sorry it has somehow become non-reproducible. I'm think there must have been something else with my setup that I missed specifying and now am not reproducing.
I will close the ticket. If I can figure out how to get it back again I'll reopen it then. Thanks again very much! :+1:
Maybe you forgot -O2
? It's easily forgotten
EDIT: You probably just want to skip straight to the second comment (this was suggesting make the datatype more complex when simple inlining should work instead)...
While writing a parser, I got looking through the attoparsec code. I believe operators like
sepBy
likely space leak.The issue being that the recursion occurs on the LHS of
<|>
, which means backtrack information is created and held onto each time you recurse despite the fact all but the last will never be used.You can see this allocation explicitly in the
(<|>) = plus
implementationEach
<|>
allocates anotherlose'
closure that ties this backtrack to the previous backtrace, creating the string of backtrack closures. As a general rule, I guess one wants to do recursion on the RHS of<|>
, as, once you reach the RHS, the LHS is fully discarded so you don't get any space leaks. This isn't always possible though.What I think could be done, is the top level
Parser i a
type could be split into two casesin order that the closures of parsers that always succeed don't hold onto the backtrace trail inside themselves. It also means this information is exposed via inlining.
That is, this would change the definition of
return = pure
toand
(<|>) = plus
toThis way, when
... <|> return []
would appear insepBy
with this version, the chain of backtracking would be broken as a closure would not be being created that would hold onto it needlessly. I would also hope the inlining would automatically eat most of the case handling.Anyway, that is my thoughts from reading the code. I haven't actually tried it out, but I figured it would be worth mentioning to see what people think.