:lang: en
= XmlSimple - XML made easy
== Introduction
The XmlSimple class offers an easy API to read and write XML. It is a Ruby translation of Grant McLean's Perl module http://www.cpan.org/modules/by-module/XML/GRANTM/[XML::Simple]. Please note, that this tutorial was originally written by Grant McLean. I have only converted it to Asciidoc and adjusted it for the Ruby version.
== Installation
XmlSimple is available as a Gem, so you can install it as follows:
....
gem install xml-simple
....
== Quick Start
Say you have a script called foo and a file of configuration options called foo.xml containing this:
....
10.0.0.101
10.0.1.101
10.0.0.102
10.0.0.103
10.0.1.103
....
The following lines of code in foo:
....
require 'xmlsimple'
config = XmlSimple.xml_in('foo.xml', { 'KeyAttr' => 'name' })
....
will 'slurp' the configuration options into the Hash config (if no arguments are passed to _xmlin the name and location of the XML file will be inferred from name and location of the script). You can dump out the contents of the Hash using p config
, which will produce something like this (formatting has been adjusted for brevity):
....
{
'logdir' => '/var/log/foo/',
'debugfile' => '/tmp/foo.debug',
'server' => {
'sahara' => {
'osversion' => '2.6',
'osname' => 'solaris',
'address' => [ '10.0.0.101', '10.0.1.101' ]
},
'gobi' => {
'osversion' => '6.5',
'osname' => 'irix',
'address' => [ '10.0.0.102' ]
},
'kalahari' => {
'osversion' => '2.0.34',
'osname' => 'linux',
'address' => [ '10.0.0.103', '10.0.1.103' ]
}
}
}
....
Your script could then access the name of the log directory like this:
....
puts config['logdir']
....
Similarly, the second address on the server 'kalahari' could be referenced as:
....
puts config['server']['kalahari']['address'][1]
....
What could be simpler? (Rhetorical).
For simple requirements, that's really all there is to it. If you want to store your XML in a different directory or file, or pass it in as a string, you'll need to check out the section on <<options, options>>. If you want to turn off or tweak the array folding feature (that neat little transformation that produced config['server']) you'll find options for that as well.
If you want to generate XML (for example to write a modified version of config back out as XML), check out _xmlout.
== Description
The XmlSimple class provides a simple API layer on top of the https://github.com/ruby/rexml/[REXML] parser. Additionally, two functions are exported: _xmlin and _xmlout.
The simplest approach is to call these two functions directly, but an optional object-oriented interface (see the section on <<OOInterface, Optional OO Interface>> below) allows them to be called as methods of an XmlSimple object.
=== xml_in
Parses XML formatted data and returns a reference to a data structure which contains the same information in a more readily accessible form. (Skip down to the section on <<examples, examples>> below, for more sample code).
_xmlin accepts an optional XML specifier followed by a Hash containing 'name => value' option pairs. The XML specifier can be a filename, nil, or an IO object.
==== A Filename
If the filename contains no directory components _xmlin will look for the file in each directory in the searchpath (see the section on <<options, options>> below). For example:
....
ref = XmlSimple.xml_in('/etc/params.xml')
....
==== nil
If there is no XML specifier, _xmlin will check the script directory and each of the searchpath directories for a file with the same name as the script but with the extension '.xml'. Note: if you wish to specify options, you must specify the value nil:
....
ref = XmlSimple.xml_in(nil, { 'ForceArray' => false })
....
==== A String of XML
A string containing XML (recognised by the presence of '<' and '>' characters) will be parsed directly. For example:
....
ref = XmlSimple.xml_in('')
....
==== An IO object
An IO object will be read to EOF and its contents parsed. For example:
....
file = File.open('/etc/params.xml')
ref = XmlSimple.xml_in(file)
....
=== xml_out
Takes a data structure (generally a Hash) and returns an XML encoding of that structure. If the resulting XML is parsed using _xmlin, it will return a data structure equivalent to the original.
When translating hashes to XML, hash keys which have a leading '-' will be silently skipped. This is the approved method for marking elements of a data structure which should be ignored by _xmlout. (Note: If these items were not skipped the key names would be emitted as element or attribute names with a leading '-' which would not be valid XML).
=== Caveats
Some care is required in creating data structures which will be passed to _xmlout. Hash keys from the data structure will be encoded as either XML element names or attribute names. Therefore, you should use hash key names which conform to the relatively strict XML naming rules:
Names in XML must begin with a letter. The remaining characters may be letters, digits, hyphens (-), underscores (_) or full stops (.). It is also allowable to include one colon (:) in an element name but this should only be used when working with namespaces - a facility well beyond the scope of XmlSimple.
You can use other punctuation characters in hash values (just not in hash keys) however XmlSimple does not support dumping binary data.
If you break these rules, the current implementation of _xmlout will simply emit non-compliant XML which will be rejected if you try to read it back in. (A later version of XmlSimple might take a more proactive approach).
Note also that although you can nest hashes and arrays to arbitrary levels, recursive data structures are not supported and will cause _xmlout to raise an exception.
[[options]]
== Options
IMPORTANT NOTE FOR USERS OF THE PERL VERSION! The default values of some options have changed, some options are not supported and I have added new options, too:
- 'ForceArray' is true by default.
- 'KeyAttr' defaults to [] and not to ['name', 'key', 'id'].
- The SAX parts of XML::Simple are currently not supported.
- Namespaces are currently not supported.
- Currently, there is no 'strict mode'.
- 'AnonymousTag' is not available in the current Perl version.
- 'Indent' is not available in the current Perl version.
- The Perl version does not support so called blessed references and raises an exception ("can't encode value of type"), if one is used. The Ruby version supports all object types, because every object in Ruby has a _tos method.
XmlSimple supports a number of options. If you find yourself repeatedly having to specify the same options, you might like to investigate the section on link:#OOInterface["Optional OO Interface"] below.
Because there are so many options, it's hard for new users to know which ones are important, so here are the two you really need to know about:
- Check out 'ForceArray' because you'll almost certainly want to leave it on.
- Make sure you know what the 'KeyAttr' option does and what its default value is because it may surprise you otherwise.
Both _xmlin and _xmlout expect a single argument followed by a Hash containing options. So, an option takes the form of a 'name => value' pair. The options listed below are marked with 'in' if they are recognised by _xmlin and 'out' if they are recognised by _xmlout.
Each option is also flagged to indicate whether it is:
- 'important' - don't use the module until you understand this
- 'handy' - you can skip this on the first time through
- 'advanced' - you can skip this on the second time through
- 'seldom used' - you'll probably never use this unless you were the person that requested the feature
The options are listed alphabetically.
Note: Option names are not case-sensitive, so you can use the mixed case versions shown here. Additionally, you can put underscores between the words (for example 'key_attr').
=== AnonymousTag => 'tag name' (in + out) (seldom used)
By default, the tag to declare an anonymous value is 'anon'. Using option 'AnonymousTag' you can set it to an arbitrary string (that must obey to the XML naming rules, of course).
=== Cache => [ cache scheme(s) ] (in) (advanced)
Because loading the REXML parser module and parsing an XML file can consume a significant number of CPU cycles, it is often desirable to cache the output of _xmlin for later reuse.
When parsing from a named file, XmlSimple supports a number of caching schemes. The 'Cache' option may be used to specify one or more schemes (using an anonymous array). Each scheme will be tried in turn in the hope of finding a cached pre-parsed representation of the XML file. If no cached copy is found, the file will be parsed and the first cache scheme in the list will be used to save a copy of the results. The following cache schemes have been implemented:
==== storable
Utilises Marshal to read/write a cache file with the same name as the XML file but with the extension .stor.
==== mem_share
When a file is first parsed, a copy of the resulting data structure is retained in memory in XmlSimple's namespace. Subsequent calls to parse the same file will return a reference to this structure. This cached version will persist only for the life of the Ruby interpreter (which in the case of mod_ruby for example, may be some significant time).
Because each caller receives a reference to the same data structure, a change made by one caller will be visible to all. For this reason, the reference returned should be treated as read-only.
==== mem_copy
This scheme works identically to 'mem_share' (above) except that each caller receives a reference to a new data structure which is a copy of the cached version. Copying the data structure will add a little processing overhead, therefore this scheme should only be used where the caller intends to modify the data structure (or wishes to protect itself from others who might). This scheme uses the Marshal module to perform the copy.
Warning! The memory-based caching schemes compare the timestamp on the file to the time when it was last parsed. If the file is stored on an NFS filesystem (or other network share) and the clock on the file server is not exactly synchronised with the clock where your script is run, updates to the source XML file may appear to be ignored.
=== ContentKey => 'keyname' (in + out) (seldom used)
When text content is parsed to a hash value, this option let's you specify a name for the hash key to override the default 'content'. So for example:
....
XmlSimple.xml_in('Text', { 'ContentKey' => 'text' })
....
will parse to:
....
{ 'one' => '1', 'text' => 'Text' }
....
instead of:
....
{ 'one' => '1', 'content' => 'Text' }
....
_xmlout will also honour the value of this option when converting a hash to XML.
You can also prefix your selected key name with a '-' character to have _xmlin try a little harder to eliminate unnecessary 'content' keys after array folding. For example:
....
XmlSimple.xml_in(%q(
- First
-
- Second
-
), {
'KeyAttr' => { 'item' => 'name' },
'ForceArray' => [ 'item' ],
'ContentKey' => '-content'
})
....
will parse to:
....
{
'item' => {
'one' => 'First',
'two' => 'Second'
}
}
....
rather than this (without the '-'):
....
{
'item' => {
'one' => { 'content' => 'First' },
'two' => { 'content' => 'Second' }
}
}
....
=== ForceArray => true | false (in) (IMPORTANT!)
This option should be set to _true_ to force nested elements to be represented as arrays even when there is only one. For example, with 'ForceArray' enabled, this XML:
....
value
....
would parse to this:
....
{
'name' => [ 'value' ]
}
....
instead of this (the default):
....
{
'name' => 'value'
}
....
This option is especially useful if the data structure is likely to be written back out as XML and the default behaviour of rolling single nested elements up into attributes is not desirable.
If you are using the array folding feature, you should almost certainly enable this option. If you do not, single nested elements will not be parsed to arrays and therefore will not be candidates for folding to a hash.
The option is _true_ by default.
=== ForceArray => [ name(s) ] (in) (IMPORTANT!)
This alternative form of the 'ForceArray' option allows you to specify a list of element names which should always be forced into an array representation, rather than the 'all or nothing' approach above.
It is also possible to include compiled regular expressions in the list - any element names which match the pattern will be forced to arrays. If the list contains only a single regex, then it is not necessary to enclose it in an Array. For example,
....
'ForceArray' => %r(_list$)
....
=== ForceContent (in) (seldom used)
When _xml_in_ parses elements which have text content as well as attributes, the text content must be represented as a hash value rather than a simple scalar. This option allows you to force text content to always parse to a hash value even when there are no attributes. So, for example:
....
xml =%q(
text1
text2
)
XmlSimple.xml_in(xml, { 'ForceContent' => true })
....
will parse to:
....
{
'x' => { 'content' => 'text1' },
'y' => { 'a' => '2', 'content' => 'text2' }
}
....
instead of:
....
{
'x' => 'text1',
'y' => { 'a' => '2', 'content' => 'text2' }
}
....
=== GroupTags => { grouping tag => grouped tag } (in + out) (handy)
You can use this option to eliminate extra levels of indirection in your Ruby data structure. For example this XML:
....
xml = %q(
usr/bin
usr/local/bin
usr/X11/bin
)
....
Would normally be read into a structure like this:
....
{
'searchpath' => {
'dir' => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
}
}
....
But when read in with the appropriate value for 'GroupTags':
....
opt = XmlSimple.xml_in(xml, { 'GroupTags' => { 'searchpath' => 'dir' })
....
It will return this simpler structure:
....
{
'searchpath' => [ '/usr/bin', '/usr/local/bin', '/usr/X11/bin' ]
}
....
You can specify multiple 'grouping element' to 'grouped element' mappings in the same Hash. If this option is combined with 'KeyAttr', the array folding will occur first and then the grouped element names will be eliminated.
_xml_out_ will also use the grouptag mappings to re-introduce the tags around the grouped elements. Beware though that this will occur in all places that the 'grouping tag' name occurs - you probably don't want to use the same name for elements as well as attributes.
=== Indent => 'string' (out) (seldom used)
By default, _xml_out_'s pretty printing mode indents the output document using two blanks. 'Indent' allows you to use an arbitrary string for indentation.
If the 'NoIndent' option is set, 'Indent' will be ignored.
=== KeepRoot => true | false (in + out) (handy)
In its attempt to return a data structure free of superfluous detail and unnecessary levels of indirection, _xml_in_ normally discards the root element name. Setting the 'KeepRoot' option to _true_ will cause the root element name to be retained. So after executing this code:
....
config = XmlSimple.xml_in('', { 'KeepRoot' => true })
....
you'll be able to reference the tempdir as _config['config']['tempdir']_ instead of the default _config['tempdir']_.
Similarly, setting the 'KeepRoot' option to _true_ will tell _xml_out_ that the data structure already contains a root element name and it is not necessary to add another.
=== KeyAttr => [ list ] (in + out) (IMPORTANT!)
This option controls the 'array folding' feature which translates nested elements from an array to a hash. For example, this XML:
....
....
would, by default, parse to this:
....
{
'user' => [
{
'login' => 'grep',
'fullname' => 'Gary R Epstein'
},
{
'login' => 'stty',
'fullname' => 'Simon T Tyson'
}
]
}
....
If the option 'KeyAttr => "login"' were used to specify that the 'login' attribute is a key, the same XML would parse to:
....
{
'user' => {
'stty' => {
'fullname' => 'Simon T Tyson'
},
'grep' => {
'fullname' => 'Gary R Epstein'
}
}
}
....
The key attribute names should be supplied in an array if there is more than one. _xml_in_ will attempt to match attribute names in the order supplied. _xml_out_ will use the first attribute name supplied when 'unfolding' a hash into an array.
_Note:_ the 'KeyAttr' option controls the folding of arrays. By default, a single nested element will be rolled up into a scalar rather than an array and therefore will not be folded. Use the 'ForceArray' option to force nested elements to be parsed into arrays and therefore candidates for folding into hashes.
The default value for 'KeyAttr' is _[]_, that is, the array folding feature is disabled.
=== KeyAttr => { list } (in + out) (IMPORTANT!)
This alternative method of specifying the key attributes allows more fine grained control over which elements are folded and on which attributes. For example the option 'KeyAttr => { 'package' => 'id' } will cause any package elements to be folded on the 'id' attribute. No other elements which have an 'id' attribute will be folded at all.
_Note:_ _xml_in_ will generate a warning if this syntax is used and an element which does not have the specified key attribute is encountered (for example: a 'package' element without an 'id' attribute, to use the example above).
Two further variations are made possible by prefixing a '+' or a '-' character to the attribute name:
The option
....
'KeyAttr' => { 'user' => "+login" }'
....
will cause this XML:
....
....
to parse to this data structure:
....
{
'user' => {
'stty' => {
'fullname' => 'Simon T Tyson',
'login' => 'stty'
},
'grep' => {
'fullname' => 'Gary R Epstein',
'login' => 'grep'
}
}
}
....
The '+' indicates that the value of the key attribute should be copied rather than moved to the folded hash key.
A '-' prefix would produce this result:
....
{
'user' => {
'stty' => {
'fullname' => 'Simon T Tyson',
'-login' => 'stty'
},
'grep' => {
'fullname' => 'Gary R Epstein',
'-login' => 'grep'
}
}
}
....
As described earlier, _xml_out_ will ignore hash keys starting with a '-'.
=== AttrPrefix => true | false (in + out) (handy)
XmlSimple treats attributes and elements equally and there is no way to determine, if a certain hash key has been derived from an element name or from an attribute name. Sometimes you need this information and that's when you use the _AttrPrefix_ option:
....
xml_str = <
Joe
Joe
211 Over There
Jacksonville
FL
11234
3535 Head Office
Jacksonville
FL
11234
XML_STR
result = XmlSimple.xml_in xml_str, { 'ForceArray' => false, 'AttrPrefix' => true }
p result
....
produces:
....
{
"@id" => "12253",
"first_name" => "Joe",
"Address" => [
{
"city" => "Jacksonville",
"line1" => "211 Over There",
"zip_code" => "11234",
"@type" => "home",
"state" => "FL"
},
{
"city" => "Jacksonville",
"line1" => "3535 Head Office",
"zip_code" => "11234",
"@type" => "postal",
"state" => "FL"
}
],
"last_name" => "Joe"
}
....
As you can see all hash keys that have been derived from attributes are prefixed by an @ character, so now you know if they have been elements or attributes before. Of course, _xml_out_ knows how to correctly transform hash keys prefixed by an @ character, too:
....
doc = REXML::Document.new XmlSimple.xml_out(result, 'AttrPrefix' => true)
d = ''
doc.write(d)
puts d
....
produces:
....
Joe
Joe
211 Over There
Jacksonville
FL
11234
3535 Head Office
Jacksonville
FL
11234
....
=== NoAttr => true | false (in + out) (handy)
When used with _xml_out_, the generated XML will contain no attributes. All hash key/values will be represented as nested elements instead.
When used with _xml_in_, any attributes in the XML will be ignored.
=== NormaliseSpace => 0 | 1 | 2 (in) (handy)
This option controls how whitespace in text content is handled. Recognised values for the option are:
* 0 - The default behaviour is for whitespace to be passed through unaltered (except of course for the normalisation of whitespace in attribute values which is mandated by the XML recommendation).
* 1 - Whitespace is normalised in any value used as a hash key (normalising means removing leading and trailing whitespace and collapsing sequences of whitespace characters to a single space).
* 2 - Whitespace is normalised in all text content.
Note: you can spell this option with a 'z' if that is more natural for you.
=== NoEscape => true | false (out) (seldom used)
By default, _xml_out_ will translate the characters <, >, &, ', and " to '<', '>', '&', '&apos', and '"' respectively. Use this option to suppress escaping (presumably because you've already escaped the data in some more sophisticated manner).
=== NoIndent => true | false (out) (seldom used)
Set this option to _true_ to disable _xml_out_'s default 'pretty printing' mode. With this option enabled, the XML output will all be on one line (unless there are newlines in the data) - this may be easier for downstream processing.
=== KeyToSymbol => true | false (in) (handy)
If set to _true_ (default is _false_) all keys are turned into symbols, that is, the following snippet
....
doc = <<-DOC
Hello
world
Inner
DOC
p XmlSimple.xml_in(doc, 'KeyToSymbol' => true)
....
produces:
....
{
:x => ["Hello"],
:y => ["World"],
:z => [ { :inner => ["Inner"] } ]
}
....
=== AttrToSymbol => true | false (in) (handy)
If set to _true_ (default is _false_) all keys are turned into symbols, that is, the following snippet
....
doc = <<-DOC
DOC
p XmlSimple.xml_in(doc, 'AttrToSymbol' => true)
....
produces:
....
{
"msg" => [ { :text => "Hello, world!" } ]
}
....
=== OutputFile => (out) (handy)
The default behavior of _xml_out_ is to return the XML as a string. If you wish to write the XML to a file, simply supply the filename using the 'OutputFile' option. Alternatively, you can supply an object derived from IO instead of a filename.
=== RootName => 'string' (out) (handy)
By default, when _xml_out_ generates XML, the root element will be named 'opt'. This option allows you to specify an alternative name.
Specifying either _nil_ or the empty string for the 'RootName' option will produce XML with no root elements. In most cases the resulting XML fragment will not be 'well formed' and therefore could not be read back in by _xml_in_. Nevertheless, the option has been found to be useful in certain circumstances.
=== SearchPath => [ list ] (in) (handy)
Where the XML is being read from a file, and no path to the file is specified, this attribute allows you to specify which directories should be searched.
If the first parameter to _xml_in_ is undefined, the default searchpath will contain only the directory in which the script itself is located. Otherwise the default searchpath will be empty.
_Note:_ the current directory ('.') is not searched unless it is the directory containing the script.
=== SelfClose => true | false (out)
If set, _xml_out_ will use self-closing tags for empty elements. For example:
....
....
instead of
....
....
=== SuppressEmpty => true | '' | nil (in + out) (handy)
This option controls what _xml_in_ should do with empty elements (no attributes and no content). The default behaviour is to represent them as empty hashes. Setting this option to _true_ will cause empty elements to be skipped altogether. Setting the option to _nil_ or the empty string will cause empty elements to be represented as _nil_ or the empty string respectively. The latter two alternatives are a little easier to test for in your code than a hash with no keys.
=== Variables => { name => value } (in) (handy)
This option allows variables in the XML to be expanded when the file is read. (there is no facility for putting the variable names back if you regenerate XML using _xml_out_).
A 'variable' is any text of the form "${name}" which occurs in an attribute value or in the text content of an element. If 'name' matches a key in the supplied Hash, "${name}" will be replaced with the corresponding value from the Hash. If no matching key is found, the variable will not be replaced.
=== VarAttr => 'attr_name' (in) (handy)
In addition to the variables defined using 'Variables', this option allows variables to be defined in the XML. A variable definition consists of an element with an attribute called 'attr_name' (the value of the 'VarAttr' option). The value of the attribute will be used as the variable name and the text content of the element will be used as the value. A variable defined in this way will override a variable defined using the 'Variables' option. For example:
....
XmlSimple.xml_in(%q(
usr/local/apache
${prefix}
${exec_prefix}/bin
), {
'VarAttr' => 'name', 'ContentKey' => '-content'
})
....
produces the following data structure:
....
{
'dir' => {
'prefix' => '/usr/local/apache',
'exec_prefix' => '/usr/local/apache',
'bindir' => '/usr/local/apache/bin',
}
}
....
=== XmlDeclaration => _true_ | 'string' (out) (handy)
If you want the output from _xml_out_ to start with the optional XML declaration, simply set the option to _true_. The default XML declaration is:
....
....
If you want some other string (for example to declare an encoding value), set the value of this option to the complete string you require.
=== conversions => { regex => lambda } (in) (handy)
When importing XML documents it's often necessary to filter or transform certain elements or attributes. The _conversions_ option helps you to do this. It expects a Hash object where the keys are regular expressions identifying element or attribute names. The values are lambda functions that will be applied to the matching elements.
Let's say we have a file named status.xml containing the following document:
....
OK
10
2
....
The following statement
....
conversions = {
/^total|failed$/ => lambda { |v| v.to_i },
/^status$/ => lambda { |v| v.downcase }
}
p XmlSimple.xml_in(
'status.xml',
:conversions => conversions,
:forcearray => false
)
....
produces the following output:
....
{
'status' => 'ok',
'total' => 10,
'failed' => 2
}
....
[[OOInterface]]
== Optional OO Interface
The procedural interface is both simple and convenient, but if you have to define a set of default values which should be used on all subsequent calls to _xml_in_ or _xml_out_, you might prefer to use the object-oriented (OO) interface.
The default values for the options described above are unlikely to suit everyone. The OO interface allows you to effectively override _XmlSimple_'s defaults with your preferred values. It works like this:
First create an _XmlSimple_ parser object with your preferred defaults:
....
xs = XmlSimple.new({ 'ForceArray' => false, 'KeepRoot' => true)
....
then call _xml_in_ or _xml_out_ as a method of that object:
....
ref = xs.xml_in(xml)
xml = xs.xml_out(ref)
....
You can also specify options when you make the method calls and these values will be merged with the values specified when the object was created. Values specified in a method call take precedence.
== Error Handling
The XML standard is very clear on the issue of non-compliant documents. An error in parsing any single element (for example a missing end tag) must cause the whole document to be rejected. _XmlSimple_ will raise an appropriate exception if it encounters a parsing error.
[[examples]]
== Examples
When _xml_in_ reads the following very simple piece of XML:
....
....
it returns the following data structure:
....
{
'username' => 'testuser',
'password' => 'frodo'
}
....
The identical result could have been produced with this alternative XML:
....
....
Or this (although see 'ForceArray' option for variations):
....
testuser
frodo
....
Repeated nested elements are represented as anonymous arrays:
....
joe@smith.com
jsmith@yahoo.com
bob@smith.com
{
'person' => [
{
'email' => [
'joe@smith.com',
'jsmith@yahoo.com'
],
'firstname' => 'Joe',
'lastname' => 'Smith'
},
{
'email' => ['bob@smith.com'],
'firstname' => 'Bob',
'lastname' => 'Smith'
}
]
}
....
Nested elements with a recognised key attribute are transformed (folded) from an array into a hash keyed on the value of that attribute, that is, calling _xml_in_ with the 'KeyAttr' set to _[key]_ will transform
....
....
into
....
{
'person' => {
'jbloggs' => {
'firstname' => 'Joe',
'lastname' => 'Bloggs'
},
'tsmith' => {
'firstname' => 'Tom',
'lastname' => 'Smith'
},
'jsmith' => {
'firstname' => 'Joe',
'lastname' => 'Smith'
}
}
}
....
The tag can be used to form anonymous arrays:
....
Col 1Col 2Col 3
R1C1R1C2R1C3
R2C1R2C2R2C3
R3C1R3C2R3C3
{
'head' => [
[ 'Col 1', 'Col 2', 'Col 3' ]
],
'data' => [
[ 'R1C1', 'R1C2', 'R1C3' ],
[ 'R2C1', 'R2C2', 'R2C3' ],
[ 'R3C1', 'R3C2', 'R3C3' ]
]
}
....
Anonymous arrays can be nested to arbitrary levels and as a special case, if the surrounding tags for an XML document contain only an anonymous array the array will be returned directly rather than the usual hash:
....
Col 1Col 2
R1C1R1C2
R2C1R2C2
[
[ 'Col 1', 'Col 2' ],
[ 'R1C1', 'R1C2' ],
[ 'R2C1', 'R2C2' ]
]
....
Elements which only contain text content will simply be represented as a scalar. Where an element has both attributes and text content, the element will be represented as a hash with the text content in the 'content' key:
....
first
second
{
'one' => 'first',
'two' => { 'attr' => 'value', 'content' => 'second' }
}
....
Mixed content (elements which contain both text content and nested elements) will be not be represented in a useful way - element order and significant whitespace will be lost. If you need to work with mixed content, then _XmlSimple_ is not the right tool for your job - check out the next section.
[[further]]
== Where to from here?
_XmlSimple_ is by nature very simple.
* The parsing process liberally disposes of 'surplus' whitespace - some applications will be sensitive to this.
* Slurping data into a hash will implicitly discard information about attribute order. Normally this would not be a problem because any items for which order is important would typically be encoded as elements rather than attributes. However, _XmlSimple_'s aggressive slurping and folding algorithms can defeat even these techniques.
* The API offers little control over the output of _xml_out_. In particular, it is not especially likely that feeding the output from _xml_in_ into _xml_out_ will reproduce the original XML (although passing the output from _xml_out_ into _xml_in_ should reproduce the original data structure).
* _xml_out_ cannot produce well-formed HTML unless you feed it with care - hash keys must conform to XML element naming rules and _nil_ values should be avoided.
* _xml_out_ does not currently support encodings (although it shouldn't stand in your way if you feed it encoded data).
* If you're attempting to get the output from _xml_out_ to conform to a specific DTD, you're almost certainly using the wrong tool for the job.
If any of these points are a problem for you, then _XmlSimple_ is probably not the right class for your application.
== FAQ
Question: if I include XmlSimple in a rails app and run for example 'rake' in the root of the app, I always get the following warnings:
....
/usr/local/lib/ruby/gems/1.8/gems/xml-simple-1.0.10/lib/xmlsimple.rb:275:
warning: already initialized constant KNOWN_OPTIONS
/usr/local/lib/ruby/gems/1.8/gems/xml-simple-1.0.10/lib/xmlsimple.rb:280:
warning: already initialized constant DEF_KEY_ATTRIBUTES
/usr/local/lib/ruby/gems/1.8/gems/xml-simple-1.0.10/lib/xmlsimple.rb:281:
warning: already initialized constant DEF_ROOT_NAME
/usr/local/lib/ruby/gems/1.8/gems/xml-simple-1.0.10/lib/xmlsimple.rb:282:
warning: already initialized constant DEF_CONTENT_KEY
/usr/local/lib/ruby/gems/1.8/gems/xml-simple-1.0.10/lib/xmlsimple.rb:283:
warning: already initialized constant DEF_XML_DECLARATION
/usr/local/lib/ruby/gems/1.8/gems/xml-simple-1.0.10/lib/xmlsimple.rb:284:
warning: already initialized constant DEF_ANONYMOUS_TAG
/usr/local/lib/ruby/gems/1.8/gems/xml-simple-1.0.10/lib/xmlsimple.rb:285:
warning: already initialized constant DEF_FORCE_ARRAY
/usr/local/lib/ruby/gems/1.8/gems/xml-simple-1.0.10/lib/xmlsimple.rb:286:
warning: already initialized constant DEF_INDENTATION
/usr/local/lib/ruby/gems/1.8/gems/xml-simple-1.0.10/lib/xmlsimple.rb:287:
warning: already initialized constant DEF_KEY_TO_SYMBOL
....
Answer: The reason for this is, that you're using XmlSimple explicitly in a rails app. XmlSimple is part of rails (you can find it in ./actionpack-1.12.5/lib/action_controller/vendor/xml_simple.rb). Unfortunately, the library is named "xml_simple.rb" and not "xmlsimple.rb". Ruby's "require" prevents you from loading a library two times and it does so by checking if a file name occurs more than once. In your case somewhere in the rails framework "require 'xml_simple'" is performed and you run "require 'xmlsimple'" afterwards. Hence, the library is loaded twice and all constants are redefined.
A solution is to only require xml-simple unless XmlSimple has not been defined already.
== Acknowledgements
A big "Thank you!" goes to
* Grant McLean for Perl's
http://www.cpan.org/modules/by-module/XML/GRANTM/[XML::Simple]
* Yukihiro Matsumoto for Ruby.
* Sean Russell for REXML.
* Dave Thomas for Rdoc.
* Nathaniel Talbott for Test::Unit.
* Minero Aoki for his setup package.
== Contact
If you have any suggestions or want to report bugs, please mailto:contact@maik-schmidt.de[contact] me.