the-real-blackh / hexpat

A general purpose Haskell XML library using Expat to do its parsing
BSD 3-Clause "New" or "Revised" License

High memory usage on lazy parsing #10

Open vlatkoB opened 6 years ago

vlatkoB commented 6 years ago

I'm testing hexpat on a large XML file (140 MB) with a relatively simple structure:

<tmx version="1.4b">
  <header ...> </header>
  <body>
    <tu tuid="1">
      <tuv xml:lang="en-US">  <seg>Some text</seg> </tuv>
      <tuv xml:lang="hr-HR">  <seg>Other text</seg> </tuv>
    </tu>
     ... 350,000 <tu> tags

I process it with the following code (the parse error is never inspected, as the main lazy-parsing example specifies):

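import qualified Data.ByteString.Lazy as L
import qualified System.IO as IO
import Text.XML.Expat.Format
import Text.XML.Expat.Tree

-- Parse fIn lazily and write the re-formatted document to fOut.
withHexpat :: FilePath -> FilePath -> IO ()
withHexpat fIn fOut = do
  inputText <- L.readFile fIn
  -- The parse error (second component) is deliberately left unevaluated.
  let (xml, _) = parse defaultParseOptions inputText :: (UNode String, Maybe XMLParseError)
  outputFile xml
  where
    outputFile xml = do
      h <- IO.openBinaryFile fOut IO.WriteMode
      L.hPutStr h $ format xml
      IO.hClose h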

Memory usage very quickly climbs above 10 GB, at which point I have to kill the process.