matthewmatician / xml-flow

An XML/HTML stream reader, now with less suck!
MIT License

xml-flow


Dealing with XML data can be frustrating, especially when you have a lot of it. Most XML readers operate on the entire XML document as a string, which can be problematic if you need to read very large XML files. With xml-flow, you can use streams to load only a small part of an XML document into memory at a time.

xml-flow has only one dependency, sax-js. This means it will run nicely in Windows environments.

Installation

$ npm install xml-flow

Getting started

xml-flow tries to keep the parsed output as simple as possible. Here's an example:

Input File

<root>
  <person>
    <name>Bill</name>
    <id>1</id>
    <age>27</age>
  </person>
  <person>
    <name>Sally</name>
    <id>2</id>
    <age>29</age>
  </person>
  <person>
    <name>Kelly</name>
    <id>3</id>
    <age>37</age>
  </person>
</root>

Usage

var fs = require('fs')
  , flow = require('xml-flow')
  , inFile = fs.createReadStream('./your-xml-file.xml')
  , xmlStream = flow(inFile)
;

xmlStream.on('tag:person', function(person) {
  console.log(person);
});

Output

{name: 'Bill', id: '1', age: '27'}
{name: 'Sally', id: '2', age: '29'}
{name: 'Kelly', id: '3', age: '37'}

Features

Attribute-only Tags

The above example shows the output for an XML document with no attributes. What about the opposite?

Input
<root>
    <person name="Bill" id="1" age="27"/>
    <person name="Sally" id="2" age="29"/>
    <person name="Kelly" id="3" age="37"/>
</root>
Output
{name: 'Bill', id: '1', age: '27'}
{name: 'Sally', id: '2', age: '29'}
{name: 'Kelly', id: '3', age: '37'}

Both Attributes and Subtags

When tags have both attributes and subtags, here's how the output looks:

Input
<root>
    <person name="Bill" id="1" age="27">
        <friend id="2"/>
    </person>
    <person name="Sally" id="2" age="29">
        <friend id="1"/>
        <friend id="3"/>
    </person>
    <person name="Kelly" id="3" age="37">
        <friend id="2"/>
        Kelly likes to ride ponies.
    </person>
</root>
Output
{
    $attrs: {name: 'Bill', id: '1', age: '27'},
    friend: '2'
}
{
    $attrs: {name: 'Sally', id: '2', age: '29'},
    friend: ['1', '3']
}
{
    $attrs: {name: 'Kelly', id: '3', age: '37'},
    friend: '2',
    $text: 'Kelly likes to ride ponies.'
}
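
In the output above, friend is a plain string when a person has one friend and an array when there are several. A consumer that wants uniform handling can normalize the value first. The following is a minimal sketch: asArray and friendIds are hypothetical helper names, not part of xml-flow, and the sample objects are copied from the output above.

```javascript
// Hypothetical helper (not part of xml-flow): wrap a lone value in an
// array so callers can always iterate, and map a missing value to [].
function asArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Hypothetical helper: list the friend ids of one parsed <person> node.
function friendIds(person) {
  return asArray(person.friend);
}

// Sample nodes copied from the output above.
var bill = { $attrs: { name: 'Bill', id: '1', age: '27' }, friend: '2' };
var sally = { $attrs: { name: 'Sally', id: '2', age: '29' }, friend: ['1', '3'] };

console.log(friendIds(bill));  // [ '2' ]
console.log(friendIds(sally)); // [ '1', '3' ]
```

Normalizing at the point of use keeps the parsed output untouched, so the same handler works whether a tag occurs once or many times.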

Read as Markup

If you need to keep track of sub-tag order within a tag, or if it makes sense to have a more markup-style object model, here's how it works:

Input
<div class="science">
    <h1>Title</h1>
    <p>Some introduction</p>
    <h2>Subtitle</h2>
    <p>Some more text</p>
    This text is not inside a p-tag.
</div>
Output
{
    $attrs: {class: 'science'},
    $markup: [
        {$name: 'h1', $text: 'Title'},
        {$name: 'p', $text: 'Some introduction'},
        {$name: 'h2', $text: 'Subtitle'},
        {$name: 'p', $text: 'Some more text'},
        'This text is not inside a p-tag.'
    ]
}
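
Because $markup is an ordered array whose entries are either plain strings or objects, a consumer can walk it to rebuild the content in document order. The sketch below assumes exactly that shape and nothing more. markupToText is a hypothetical helper, not part of xml-flow, and the sample object mirrors the example above.

```javascript
// Hypothetical helper (not part of xml-flow): join the text of a
// markup-style node in document order. Entries of $markup are assumed
// to be either plain strings or objects carrying $text and/or $markup.
function markupToText(node) {
  if (typeof node === 'string') return node;
  var parts = [];
  if (node.$text) parts.push(node.$text);
  (node.$markup || []).forEach(function (child) {
    parts.push(markupToText(child));
  });
  return parts.join('\n');
}

// Sample node mirroring the example above.
var div = {
  $attrs: { class: 'science' },
  $markup: [
    { $name: 'h1', $text: 'Title' },
    { $name: 'p', $text: 'Some introduction' },
    { $name: 'h2', $text: 'Subtitle' },
    { $name: 'p', $text: 'Some more text' },
    'This text is not inside a p-tag.'
  ]
};

// Prints the five text fragments, one per line, in source order.
console.log(markupToText(div));
```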

Options

You may add a second argument when calling the function, as flow(stream, options). All are optional:

Events

All events can be listened to via the standard Node.js EventEmitter syntax.

tag:<<TAG_NAME>> - Fires when any <<TAG_NAME>> is parsed. Note that this is case sensitive. If the lowercase option is set, make sure you listen to lowercase tag names. If the strict option is set, match the case of the tags in your document.

end - Fires when the end of the stream has been reached.

error - Fires when there are errors.

query:<<QUERY>> - Coming soon...
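
The tag, end, and error events above can be combined into a single callback that receives every parsed node once the stream finishes. This is a rough sketch rather than a built-in feature. collectTags is a hypothetical helper, and a plain EventEmitter stands in for the stream returned by flow(...) so the example runs without an XML file.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper (not part of xml-flow): gather every parsed
// <tagName> node and report once through a Node-style callback.
function collectTags(xmlStream, tagName, callback) {
  var nodes = [];
  var finished = false;

  function finish(err) {
    if (finished) return; // report only once
    finished = true;
    callback(err, err ? undefined : nodes);
  }

  xmlStream.on('tag:' + tagName, function (node) { nodes.push(node); });
  xmlStream.on('end', function () { finish(null); });
  xmlStream.on('error', function (err) { finish(err); });
}

// Stand-in emitter for illustration only; in real use this would be
// the object returned by flow(fs.createReadStream('./your-xml-file.xml')).
var fakeStream = new EventEmitter();
var result = null;
collectTags(fakeStream, 'person', function (err, people) {
  result = { err: err, people: people };
});

fakeStream.emit('tag:person', { name: 'Bill', id: '1', age: '27' });
fakeStream.emit('tag:person', { name: 'Sally', id: '2', age: '29' });
fakeStream.emit('end');

console.log(result.people.length); // 2
```

Collecting nodes this way gives up the memory advantage of streaming, so it is best kept for tags that occur only a modest number of times.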

toXml Utility

toXml(node, options) - Returns a string containing the XML encoding of an object. Encodes $name, $attrs, $text, and $markup as you would expect. The following options are available:

Authors

License

MIT