Open zudov opened 6 years ago
Thanks for the report. I'm not going to have a chance to look into this yet. A repro that doesn't use xml-conduit
would be ideal.
I've been using conduit-1.2.10 until now, but would like to move to 1.3.1.1, but this bug is holding me back and I don't see how to avoid it. This simple code:
{-# OPTIONS_GHC -fno-full-laziness #-}
module Main where
import Data.Conduit (runConduitRes, (.|))
import qualified Data.Conduit.Binary as C
import qualified Data.Conduit.List as C
import System.Directory (getDirectoryContents)
import System.Environment (getArgs)
import qualified Text.XML.Stream.Parse as X
parseXMLFile :: FilePath -> IO ()
parseXMLFile f = do
_ <- runConduitRes (C.sourceFile f
.| X.parseBytes X.def
.| C.sinkNull)
return ()
main :: IO ()
main = do
files <- getArgs
mapM_ parseXMLFile files
run with
stack exec ghc -- --make -rtsopts X.hs -o X
./X /tmp/f.xml +RTS -hT -sstderr
never even reaches 1MB memory usage on conduit-1.2.10 / xml-conduit-1.5.1, while with conduit-1.3.1 / xml-conduit-1.8.0.1 it goes up to 100MB when I pass it a 23MB xml file. If I pass the same file twice it of course makes sense to retain the result value, but the result should not take 100MB :)
The -fno-full-laziness
suggested at https://www.well-typed.com/blog/2016/09/sharing-conduit/ and https://github.com/michaelt/streaming/issues/6 seems to have no effect (which makes me think it's not the same as https://gitlab.haskell.org/ghc/ghc/issues/9520 where -fno-full-laziness
was a solution). Compiling with -prof -fprof-auto
makes the leak go away, but of course runs very slowly.
I can avoid the leak with e.g. explicitly parseXML "/tmp/f.xml"
, but as soon as it seems like there's any chance files
might contain the same file twice, ghc starts retaining the memory (and not just of the result value, but all intermediate values it seems). And unfortunately ghc isn't smart enough to notice it when I put nub
in front of files
…
@unhammer, have you tried applying -fno-full-laziness
to xml-conduit
(by specifying it in cabal/stack file)?
No, I was just applying it to my local project …
@zudov Thank you so much, that fixed it =D It now runs in nearly constant memory even in our big pipeline with zero other changes than
diff --git a/stack.yaml b/stack.yaml
index abeb271b..f9f511ac 100644
--- a/stack.yaml
+++ b/stack.yaml
@@ -49,6 +49,8 @@
+ghc-options:
+ xml-conduit: -fno-full-laziness
@unhammer happy to hear this fixes it for you! (credit goes to folks at Well-Typed for the suggestion for -fno-full-laziness
on xml-conduit
)
However since the original report we switched to xeno
(in the hope it would fix it), and turns out we still have this problem even without xml-conduit
. The general pattern seems to be that GHC is not able to figure out (anymore, since it was able on conduit-1.2
) that it shouldn't hold on resources when conditionally doing effects inside a conduit.
We're now trying to have a small replication without xml-conduit
, or maybe we'll switch back to the xml-conduit
version of our code and apply the workaround
@f-f are you using xeno with conduit? I thought there wasn't any conduit-support for xeno yet. Or are you saying the leak is because of a newer GHC?
Here's an interesting space leak that we have discovered after switching to
lts-11
(andconduit-1.3
).Background
We have a sync job that requests a big XML file over HTTP, parses it into a stream of customer records and then processes them. The processing involves database calls and in some circumstances (when customer record lacks data or are duplicated) HTTP calls with more XML parsing.
Reproducing it
I've made this example that only depends on
conduit
andxml-conduit
.Changes inxml-conduit
don't appear very dangerous after a brief look. I suspect that it's reproducible withoutxml-conduit
, but I might be wrong.For ease of reproduction the
input
here is generated in-place, it can as well be read from network or handle. The input is a stream looking like<ID>0</ID><ID>1</ID><ID>2</ID>...
, it gets parsed into a stream of"0"
,"1"
,"2"
. This is then processed bymapM_
that contains another reference toXML.parseBytes
.Compiling the binary against
conduit-1.3
and running it with+RTS -s -p
gives the following output:The culpit of the leak is in the fact that there's "outer" and "inner" reference to
XML.parseBytes
. The inner reference never gets executed, yet the mere presence of it triggers the issue. Producing the input without xml parsing (Conduit.iterate succ 0 .| Conduit.map show
) fixes the issue. Replacing the inner conduit with justrunConduit $ pure () .| sinkNull
orerror "IMPOSSIBLE HAPPENED"
fixes the issue as well.Trying conduit-1.2
Here's the output of the same code built against
conduit-1.2
Trying to profile
Building it against
conduit-1.3
WITH profiling, makes the problem go away and gives this output (take note of run timing, it's much slower):I haven't yet managed to build the thing with
-prof
, but without havingSCC
s everywhere (since stack is forcing-auto-all
and-auto-cafs
)Attempts to find a fix
Here is a incomplete and fuzzy list of things that I've tried (in combinations as well) and that DIDN't help with the issue:
-O0
-fno-full-laziness
-fno-state-hack
NOINLINE
pragmas, passing dummy arguments to avoid unwantedCAF
sThe biggest suspect that I had is unwanted conduit sharing in CAFs, but so far I couldn't prove that this is the case.
The second big suspicion are the rewrite rules. I haven't yet tried to debug them (e.g. looking at what rules are firing, etc). I've inspected the Core output to ensure that no sharing is happening, but I didn't inspect the changes that happen by enabling/disabling rewrite rules.
The next logical step would be to get reproduce it without
xml-conduit
which would allow to have much better grasp at what's going on.