snoyberg / conduit

A streaming data library
903 stars 196 forks source link

Conduit's internal Pipe is eating my RAM #370

Open zudov opened 6 years ago

zudov commented 6 years ago

Here's an interesting space leak that we have discovered after switching to lts-11 (and conduit-1.3).

Background

We have a sync job that requests a big XML file over HTTP, parses it into a stream of customer records and then processes them. The processing involves database calls and in some circumstances (when customer record lacks data or are duplicated) HTTP calls with more XML parsing.

Reproducing it

I've made this example that only depends on conduit and xml-conduit.Changes in xml-conduit don't appear very dangerous after a brief look. I suspect that it's reproducible without xml-conduit, but I might be wrong.

{-# LANGUAGE OverloadedStrings #-}
module Main where

import           Prelude

import           Data.ByteString                 (ByteString)
import qualified Data.ByteString.Builder         as Builder
import           Data.Conduit                    as Conduit
import           Data.Conduit.ByteString.Builder as Conduit
import           Data.Conduit.List               as Conduit
import           Data.Monoid                     ((<>))

import qualified Text.XML.Stream.Parse           as XML

input :: Int -> ConduitM i ByteString IO ()
input n =
  Conduit.iterate succ 0
    .| Conduit.isolate n
    .| Conduit.map (\cusno -> "<ID>" <> Builder.intDec cusno <> "</ID>")
    .| Conduit.builderToByteString

main :: IO ()
main = runConduit $ do
  input 500000
    .| XML.parseBytes XML.def
    .| XML.manyYield (XML.tagNoAttr "ID" XML.content)
    .| Conduit.mapM_
         (\n -> if n == "NOT GONNA HAPPEN, EVER"
                then runConduit $ pure () .| XML.parseBytes .| sinkNull
                else pure ())
    .| sinkNull

For ease of reproduction the input here is generated in-place, it can as well be read from network or handle. The input is a stream looking like <ID>0</ID><ID>1</ID><ID>2</ID>..., it gets parsed into a stream of "0", "1", "2". This is then processed by mapM_ that contains another reference to XML.parseBytes.

Compiling the binary against conduit-1.3 and running it with +RTS -s -p gives the following output:

$ ./Leak-ghc-8.2.2-conduit-1.3.0.3-no-prof +RTS -s -h
  13,135,134,208 bytes allocated in the heap
   2,542,451,800 bytes copied during GC
     150,771,464 bytes maximum residency (38 sample(s))
         723,192 bytes maximum slop
             305 MB total memory in use (0 MB lost due to fragmentation)
                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     12789 colls,     0 par    2.714s   2.713s     0.0002s    0.1642s
  Gen  1        38 colls,     0 par    0.001s   0.001s     0.0000s    0.0002s
  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    2.642s  (  2.643s elapsed)
  GC      time    2.715s  (  2.715s elapsed)
  EXIT    time    0.009s  (  0.009s elapsed)
  Total   time    5.368s  (  5.368s elapsed)
  %GC     time      50.6%  (50.6% elapsed)
  Alloc rate    4,970,944,482 bytes per MUT second
  Productivity  49.4% of total user, 49.4% of total elapsed

Leak-ghc-8.2.2-conduit-1.3.0.3-no-prof.hp

The culpit of the leak is in the fact that there's "outer" and "inner" reference to XML.parseBytes. The inner reference never gets executed, yet the mere presence of it triggers the issue. Producing the input without xml parsing (Conduit.iterate succ 0 .| Conduit.map show) fixes the issue. Replacing the inner conduit with just runConduit $ pure () .| sinkNull or error "IMPOSSIBLE HAPPENED" fixes the issue as well.

Trying conduit-1.2

Here's the output of the same code built against conduit-1.2

$ ./Leak-ghc-8.2.2-conduit-1.2-no-prof +RTS -s -h
  13,887,205,872 bytes allocated in the heap
     237,351,736 bytes copied during GC
         138,560 bytes maximum residency (277 sample(s))
          29,352 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)
                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     13212 colls,     0 par    0.261s   0.260s     0.0000s    0.0003s
  Gen  1       277 colls,     0 par    0.015s   0.015s     0.0001s    0.0001s
  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    2.718s  (  2.720s elapsed)
  GC      time    0.276s  (  0.275s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    2.995s  (  2.995s elapsed)
  %GC     time       9.2%  (9.2% elapsed)
  Alloc rate    5,108,707,229 bytes per MUT second
  Productivity  90.8% of total user, 90.8% of total elapsed

2018-05-02-173033_1064x707_scrot

Trying to profile

Building it against conduit-1.3 WITH profiling, makes the problem go away and gives this output (take note of run timing, it's much slower):

$ ./Leak-ghc-8.2.2-conduit-1.3.0.3-prof +RTS -s -h
  27,133,276,560 bytes allocated in the heap
     223,977,776 bytes copied during GC
         178,624 bytes maximum residency (496 sample(s))
          27,688 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)
                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     25727 colls,     0 par    0.441s   0.476s     0.0000s    0.0223s
  Gen  1       496 colls,     0 par    0.009s   0.010s     0.0000s    0.0017s
  INIT    time    0.001s  (  0.003s elapsed)
  MUT     time   29.634s  ( 29.720s elapsed)
  GC      time    0.404s  (  0.421s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.045s  (  0.065s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   30.085s  ( 30.144s elapsed)
  %GC     time       1.3%  (1.4% elapsed)
  Alloc rate    915,606,794 bytes per MUT second
  Productivity  98.5% of total user, 98.4% of total elapsed

2018-05-02-173820_1068x711_scrot

I haven't yet managed to build the thing with -prof, but without having SCCs everywhere (since stack is forcing -auto-all and -auto-cafs)

Attempts to find a fix

Here is a incomplete and fuzzy list of things that I've tried (in combinations as well) and that DIDN't help with the issue:

The biggest suspect that I had is unwanted conduit sharing in CAFs, but so far I couldn't prove that this is the case.

The second big suspicion are the rewrite rules. I haven't yet tried to debug them (e.g. looking at what rules are firing, etc). I've inspected the Core output to ensure that no sharing is happening, but I didn't inspect the changes that happen by enabling/disabling rewrite rules.

The next logical step would be to get reproduce it without xml-conduit which would allow to have much better grasp at what's going on.

snoyberg commented 6 years ago

Thanks for the report. I'm not going to have a chance to look into this yet. A repro that doesn't use xml-conduit would be ideal.

unhammer commented 5 years ago

I've been using conduit-1.2.10 until now, but would like to move to 1.3.1.1, but this bug is holding me back and I don't see how to avoid it. This simple code:

{-# OPTIONS_GHC -fno-full-laziness #-}

module Main where

import           Data.Conduit          (runConduitRes, (.|))
import qualified Data.Conduit.Binary   as C
import qualified Data.Conduit.List     as C
import           System.Directory      (getDirectoryContents)
import           System.Environment    (getArgs)
import qualified Text.XML.Stream.Parse as X

parseXMLFile :: FilePath -> IO ()
parseXMLFile f = do
  _ <- runConduitRes (C.sourceFile f
                     .| X.parseBytes X.def
                     .| C.sinkNull)
  return ()

main :: IO ()
main = do
  files <- getArgs
  mapM_ parseXMLFile files

run with

stack exec ghc -- --make -rtsopts X.hs -o X
./X /tmp/f.xml +RTS -hT -sstderr

never even reaches 1MB memory usage on conduit-1.2.10 / xml-conduit-1.5.1, while with conduit-1.3.1 / xml-conduit-1.8.0.1 it goes up to 100MB when I pass it a 23MB xml file. If I pass the same file twice it of course makes sense to retain the result value, but the result should not take 100MB :)

The -fno-full-laziness suggested at https://www.well-typed.com/blog/2016/09/sharing-conduit/ and https://github.com/michaelt/streaming/issues/6 seems to have no effect (which makes me think it's not the same as https://gitlab.haskell.org/ghc/ghc/issues/9520 where -fno-full-laziness was a solution). Compiling with -prof -fprof-auto makes the leak go away, but of course runs very slowly.

I can avoid the leak with e.g. explicitly parseXML "/tmp/f.xml", but as soon as it seems like there's any chance files might contain the same file twice, ghc starts retaining the memory (and not just of the result value, but all intermediate values it seems). And unfortunately ghc isn't smart enough to notice it when I put nub in front of files

zudov commented 5 years ago

@unhammer, have you tried applying -fno-full-laziness to xml-conduit (by specifying it in cabal/stack file)?

unhammer commented 5 years ago

No, I was just applying it to my local project …

@zudov Thank you so much, that fixed it =D It now runs in nearly constant memory even in our big pipeline with zero other changes than

diff --git a/stack.yaml b/stack.yaml
index abeb271b..f9f511ac 100644
--- a/stack.yaml
+++ b/stack.yaml
@@ -49,6 +49,8 @@

+ghc-options:
+    xml-conduit: -fno-full-laziness
f-f commented 5 years ago

@unhammer happy to hear this fixes it for you! (credit goes to folks at Well-Typed for the suggestion for -fno-full-laziness on xml-conduit)

However since the original report we switched to xeno (in the hope it would fix it), and turns out we still have this problem even without xml-conduit. The general pattern seems to be that GHC is not able to figure out (anymore, since it was able on conduit-1.2) that it shouldn't hold on resources when conditionally doing effects inside a conduit.

We're now trying to have a small replication without xml-conduit, or maybe we'll switch back to the xml-conduit version of our code and apply the workaround

unhammer commented 5 years ago

@f-f are you using xeno with conduit? I thought there wasn't any conduit-support for xeno yet. Or are you saying the leak is because of a newer GHC?