xmppo / node-expat

libexpat XML SAX parser binding for node.js
https://github.com/xmppo/node-expat
MIT License
384 stars 97 forks source link

Memory leak and incorrect behavior of stop/resume #200

Open georgyfarniev opened 5 years ago

georgyfarniev commented 5 years ago

Hello.

Thanks for creating this parser, it works fast and has nice documentation, really wonderful job. Unfortunately, I ran into issues with pausing stream. I need it for writing record to database before continuing.

I noticed incorrect behavior when I call stop/resume in closing tag handler, and I got out of memory crash on big xml file, but unfortunately I cannot provide this file, as it contains corporate information.

So, to reproduce, use scripts below. First script will generate large XML file, and second will show error. While there's 100000 records in sample.xml, parser will show less closed tags (in my case, 3274). If I remove stop and resume calls, it shows 100000 as it should.

On another file (which I cannot provide unfortunately) it causes out of memory crash. I will try to create auto-generated file which can reproduce this issue.

Sample generator:

const fs = require('fs')
const stream = require('stream')

const SAMPLE_RECORDS_COUNT = 100000
const SAMPLE_PATH = './sample.xml'

class DataStream extends stream.Readable {
  constructor() {
    super()
    this.count = 0
  }

  _read() {
    if (this.count === 0) {
      this.push('<?xml version="1.0" encoding="UTF-8"?><root>')
      this.count++
      return;
    }

    this.count++

    if (this.count > SAMPLE_RECORDS_COUNT) {
      this.push('</root>')
      this.push(null)
    } else {
      this.push('<child>text</child>\n')
    }

  }
}

const ws = fs.createWriteStream(SAMPLE_PATH)
const ds = new DataStream()

ds.pipe(ws).on('close', () => {
  console.log('done')
  process.exit()
})

Reproduce:

const fs = require('fs')
const expat = require('node-expat')

console.log('started')

const parser = new expat.Parser('UTF-8')
const rs = fs.createReadStream('./sample.xml')

parser.on('startElement', (elt, attrs) => {
})

let count = 0
parser.on('endElement', async (elt, attrs) => {
  count++
  parser.stop()
  parser.resume()
})

parser.on('error', console.error)
rs.on('end', () => console.log(count))

rs.pipe(parser)
DanielDornhardt commented 5 years ago

Having the same issue... is there maybe some idea for a workaround?

georgyfarniev commented 5 years ago

Having the same issue... is there maybe some idea for a workaround?

@DanielDornhardt, you have to call resume after next tick. E.g.:

parser.on('endElement', async (elt, attrs) => {
  count++
  parser.stop()

  executeSomePromise().then(parser.resume)
 // or
 nextTick(parser.resume)
})

It's not well documented, but I found it.

DanielDornhardt commented 5 years ago

Hi @georgyfarniev , thank you for helping us out... I'm not sure if it'll work out yet but we're working on it.