prewk / xml-string-streamer

Stream large XML files with low memory consumption.
MIT License

How to parse xml file to array with xml-string-streamer? #43

Closed falconz closed 8 years ago

falconz commented 8 years ago

How do I parse an XML file into an array with xml-string-streamer? Thanks!

prewk commented 8 years ago

Hi!

Firstly, it's important to know that everything you save to an array takes up memory. So if you're parsing an XML file that's too big to load with SimpleXML, you'll probably have problems saving it all to one really large array too.

The point of this library is to save memory by working on one node at a time. Saving those nodes into memory on every iteration might defeat the purpose (if it's a lot of data).
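One way to act on that, as a rough sketch: reduce each node to just the field you need as it passes by, instead of keeping whole nodes around. The `streamEmails` helper and the `email` element are assumptions for illustration, not part of the library, and the sample array stands in for the strings a streamer would hand back one at a time.

```php
<?php

// Hypothetical helper: yields one email per node string instead of storing whole nodes.
function streamEmails(iterable $nodes): Generator
{
    foreach ($nodes as $node) {
        $xml = simplexml_load_string($node);
        if ($xml === false) {
            continue; // skip fragments that fail to parse
        }
        // Keep only the value we care about; the parsed node can then be freed
        yield (string) $xml->email;
    }
}

// Stand-in for node strings delivered one by one by a streamer
$sampleNodes = [
    '<customer><email>a@example.com</email></customer>',
    '<customer><email>b@example.com</email></customer>',
];

foreach (streamEmails($sampleNodes) as $email) {
    echo $email, PHP_EOL;
}
```

Because the generator hands out one value at a time, only the current node is parsed and held at any moment; whether that helps in practice depends on what you do with each value.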

With that said, imagine an XML file that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<customers>
  <customer>
    <email>............</email>
  </customer>
  <customer>
    <email>............</email>
  </customer>
  <customer>
    <email>............</email>
  </customer>
  <customer>
    <email>............</email>
  </customer>
</customers>

You can parse the customer nodes into an array like this:

<?php

// Extract all <customer> nodes from the xml
$streamer = Prewk\XmlStringStreamer::createUniqueNodeParser("your-large-xml.xml", array("uniqueNode" => "customer"));

// Collect the converted nodes in this array
$customers = array();

// Go through all of the <customer> nodes
while ($node = $streamer->getNode()) {
    $simpleXmlNode = simplexml_load_string($node);
    // $simpleXmlNode is now a <customer> node

    // Convert the node into a plain array by casting it and round-tripping it through JSON
    $customer = json_decode(json_encode((array)$simpleXmlNode), true);

    // Save it to your $customers array
    $customers[] = $customer;
}

I took the XML-to-array part from http://stackoverflow.com/questions/6167279/converting-a-simplexml-object-to-an-array and http://stackoverflow.com/questions/7778814/how-to-convert-simplexmlobject-into-php-array.

Note: I haven't tested it, but your question is more of a general PHP question than a question about this library specifically.
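To see what the conversion step does to a single node without needing a large file, here is a minimal sketch that runs on an inline string. The `customerNodeToArray` helper, the sample values, and the extra `name` element are made up for illustration; only `simplexml_load_string`, `json_encode` and `json_decode` are standard PHP.

```php
<?php

// Hypothetical helper: turns one <customer> XML string into a plain PHP array.
function customerNodeToArray(string $node): array
{
    $simpleXmlNode = simplexml_load_string($node);
    if ($simpleXmlNode === false) {
        throw new InvalidArgumentException('Node is not well-formed XML');
    }

    // Round-trip through JSON so nested SimpleXMLElement objects become arrays too
    return json_decode(json_encode($simpleXmlNode), true);
}

// Sample node; in the loop above this string would come from $streamer->getNode()
$node = '<customer><email>jane@example.com</email><name>Jane</name></customer>';

$customer = customerNodeToArray($node);
print_r($customer);
```

Be aware that this shortcut has quirks: attributes end up under an `@attributes` key, and empty elements come back as empty arrays rather than empty strings, so it's worth checking the result against your own data.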