sferik / multi_xml

A generic swappable back-end for XML parsing
MIT License

Inconsistent handling of "empty" elements. #46

Open. trevorrowe opened this issue 9 years ago.

trevorrowe commented 9 years ago

Given the following Gemfile:

source 'https://rubygems.org'

gem 'multi_xml'
gem 'nokogiri'
gem 'ox'
gem 'libxml-ruby'

And the following Gemfile.lock:

GEM
  remote: https://rubygems.org/
  specs:
    libxml-ruby (2.8.0)
    mini_portile (0.6.2)
    multi_xml (0.5.5)
    nokogiri (1.6.6.2)
      mini_portile (~> 0.6.0)
    ox (2.1.8)

PLATFORMS
  ruby

DEPENDENCIES
  libxml-ruby
  multi_xml
  nokogiri
  ox

And the following script:

require 'bundler/setup'
require 'multi_xml'
require 'libxml'
require 'ox'
require 'nokogiri'

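# Root element whose only content is a single space character
xml = '<xml> </xml>'

# Parse the same document with each back end and print the result
[:nokogiri, :rexml, :ox, :libxml].each do |parser|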
  MultiXml.parser = parser
  puts "#{parser}: #{MultiXml.parse(xml).inspect}"
end

I get the following output:

nokogiri: {"xml"=>nil}
rexml: {"xml"=>" "}
ox: {"xml"=>" "}
libxml: {"xml"=>nil}
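The disagreement can be made explicit by grouping the back ends by the value they return. The sketch below hard-codes the output reported above; in a live run, each value would come from `MultiXml.parse(xml)` inside the loop of the script. The `results` and `groups` names are illustrative only. More than one group means the back ends disagree on the same input.

```ruby
# Output reported above, keyed by back end (hard-coded for illustration).
results = {
  nokogiri: { "xml" => nil },
  rexml:    { "xml" => " " },
  ox:       { "xml" => " " },
  libxml:   { "xml" => nil }
}

# Group back ends by the parsed value; each key is a distinct result,
# each value is the list of back ends that produced it.
groups = results.group_by { |_parser, value| value }
                .transform_values { |pairs| pairs.map(&:first) }

groups.each do |value, parsers|
  puts "#{value.inspect}: #{parsers.join(', ')}"
end

# More than one distinct result means the back ends disagree.
puts(groups.size == 1 ? "consistent" : "inconsistent") # prints "inconsistent"
```

With the reported output, Nokogiri and LibXML fall into one group and REXML and Ox into the other.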

trevorrowe commented 9 years ago

I'm not 100% sure, but I suspect the expected behavior is for all four parsers to return " " (a single space character). I would be equally fine if they all returned nil; what matters most to me is that they are consistent. Thoughts?
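Until the back ends agree, a caller could normalize the parsed hash itself. The following is a minimal sketch, assuming the nil behavior is preferred: `normalize_blank` is a hypothetical helper, not part of multi_xml, that recursively replaces whitespace-only strings with nil. It also discards whitespace that might be intentional, so it only fits data where blank text carries no meaning.

```ruby
# Hypothetical helper (not part of multi_xml): recursively replace
# whitespace-only strings with nil so every back end yields the same hash.
def normalize_blank(value)
  case value
  when Hash
    value.transform_values { |v| normalize_blank(v) }
  when Array
    value.map { |v| normalize_blank(v) }
  when String
    value.strip.empty? ? nil : value
  else
    value # numbers, dates, nil, etc. pass through unchanged
  end
end

# Results reported in the issue, in the order nokogiri, rexml, ox, libxml.
outputs = [
  { "xml" => nil },
  { "xml" => " " },
  { "xml" => " " },
  { "xml" => nil }
]

normalized = outputs.map { |h| normalize_blank(h) }
puts normalized.uniq.size # prints 1
```

Normalizing toward " " instead is not possible after the fact: once Nokogiri or LibXML has returned nil, there is no way to tell whether the element held a space or nothing at all, so that direction would have to be fixed inside the back ends.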