prewk / xml-string-streamer

Stream large XML files with low memory consumption.
MIT License
356 stars 49 forks source link

StringWalker and self-closing tags #56

Closed kristoffeys closed 6 years ago

kristoffeys commented 6 years ago

When i try to use the StringWalker, self-closing tags seem te get skipped from the streamer. I've been digging a bit, and the issue seems to be in file 'StringWalker.php' on line 233.

When i change $depth < 0 to $depth <= 0, the self-closing tags seem to be included in the streamer.

kristoffeys commented 6 years ago

57

prewk commented 6 years ago

Thanks! I'll check it out as soon as I can.

prewk commented 6 years ago

I've dug around (it's been a couple of years) and I seem to have a test case for self-closing tags and the StringWalker already, see: https://github.com/prewk/xml-string-streamer/blob/master/tests/unit/XmlStringStreamer/Parser/StringWalkerTest.php#L283

Can you give an example of some XML with self-closing tags that doesn't work? I'd like to update the test case and work my way backwards from there.

kristoffeys commented 6 years ago

I've tried parsing:

<?xml version="1.0" ?>
<root>
  <foo>baz</foo>
  <bar />
</root>

With following code:

$streamer = XmlStringStreamer::createStringWalkerParser(
  $xmlFile,
  array(
    'captureDepth' => 2
  )
);

while ($node = $streamer->getNode()) {
  $data = simplexml_load_string($node, 'SimpleXMLElement', LIBXML_COMPACT | LIBXML_PARSEHUGE);
  var_dump($house->getName());
}

And only foo get's dumped, no bar

This doesn't seem to be something that is covered in the tests since only the parent of the self-closing text is a node that is checked, not the self-closing tag itself. (if that makes sense)

prewk commented 6 years ago

Good catch! I'll look into it.

prewk commented 6 years ago

I made some adjustments more in line with my own thinking, could you give it a try? Version 0.11.0.

kristoffeys commented 6 years ago

seems to be fixed with the new version. can you push the new version to xml-string-streamer-guzzle as well? Thanks!

prewk commented 6 years ago

Done. I've created my own little stupid dependency hell with the guzzle package, should've solved it in another way. Oh well.

prewk commented 6 years ago

(The guzzle package is now at 0.3.0)

kristoffeys commented 6 years ago

Thanks man!