mbklein / equivalent-xml

Easy equivalency tests for Nokogiri and Oga XML
MIT License

documents with space prior to xml preamble fail to compare on jruby #9

Closed: coffeejunk closed this issue 12 years ago

coffeejunk commented 12 years ago

I found this relatively obscure bug while using equivalent-xml with JRuby. I think it is actually a bug in Nokogiri:

When comparing two XML documents where one of them starts with whitespace before the XML preamble, Nokogiri on JRuby ignores the elements that follow the preamble in that document.

jruby-1.7.0 :001 > require 'nokogiri'
 => true 
jruby-1.7.0 :002 > doc1 = Nokogiri::XML(" <?xml version='1.0' encoding='utf-8' ?><first \>")
 => #<Nokogiri::XML::Document:0x7fa name="document"> 
jruby-1.7.0 :003 > doc2 = Nokogiri::XML("<?xml version='1.0' encoding='utf-8' ?><first \>")
 => #<Nokogiri::XML::Document:0x7fe name="document" children=[#<Nokogiri::XML::Element:0x7fc name="first">]>

This works perfectly fine on Ruby 1.8 and 1.9 and on Rubinius (rbx) in 1.8 and 1.9 mode:

1.9.3-p286 :001 > require 'nokogiri'
 => true 
1.9.3-p286 :002 > doc1 = Nokogiri::XML(" <?xml version='1.0' encoding='utf-8' ?><first \>")
 => #<Nokogiri::XML::Document:0x3fd36c8e30bc name="document" children=[#<Nokogiri::XML::ProcessingInstruction:0x3fd36c8e2b80 name="xml">, #<Nokogiri::XML::Element:0x3fd36c8e2964 name="first">]> 
1.9.3-p286 :003 > doc2 = Nokogiri::XML("<?xml version='1.0' encoding='utf-8' ?><first \>")
 => #<Nokogiri::XML::Document:0x3fd36c8d7f28 name="document" children=[#<Nokogiri::XML::Element:0x3fd36c8d776c name="first">]>

Oddly, leading whitespace is handled as expected on all Ruby implementations when the document does not start with the XML preamble:

jruby-1.7.0 :004 > doc1 = Nokogiri::XML("<bar><foo /></bar>")
 => #<Nokogiri::XML::Document:0x804 name="document" children=[#<Nokogiri::XML::Element:0x802 name="bar" children=[#<Nokogiri::XML::Element:0x800 name="foo">]>]> 
jruby-1.7.0 :005 > doc2 = Nokogiri::XML(" <bar><foo /></bar>")
 => #<Nokogiri::XML::Document:0x80a name="document" children=[#<Nokogiri::XML::Element:0x808 name="bar" children=[#<Nokogiri::XML::Element:0x806 name="foo">]>]> 
1.9.3-p286 :004 > doc1 = Nokogiri::XML("<bar><foo /></bar>")
 => #<Nokogiri::XML::Document:0x3fd36c8cef90 name="document" children=[#<Nokogiri::XML::Element:0x3fd36c8ceb30 name="bar" children=[#<Nokogiri::XML::Element:0x3fd36c8ce8d8 name="foo">]>]> 
1.9.3-p286 :005 > doc2 = Nokogiri::XML(" <bar><foo /></bar>")
 => #<Nokogiri::XML::Document:0x3fd36c8c8848 name="document" children=[#<Nokogiri::XML::Element:0x3fd36c8c8280 name="bar" children=[#<Nokogiri::XML::Element:0x3fd36c8c7b50 name="foo">]>]>

Although I assume that this is a bug in Nokogiri, I published two test cases to my fork of the repository that demonstrate the odd behaviour, because I think this should be a known issue.

  # this passes on jruby
  it "should ignore leading whitespace #1" do
    doc1 = Nokogiri::XML("<bar><foo /></bar>")
    doc2 = Nokogiri::XML(" <bar><foo /></bar>")
    doc1.should be_equivalent_to(doc2)
  end

  # this fails on jruby
  it "should ignore leading whitespace #2" do
    doc1 = Nokogiri::XML("<?xml version='1.0' encoding='utf-8' ?><foo />")
    doc2 = Nokogiri::XML(" <?xml version='1.0' encoding='utf-8' ?><foo />")
    doc1.should be_equivalent_to(doc2)
  end

Failures:

  1) EquivalentXml should ignore leading whitespace #2
     Failure/Error: doc1.should be_equivalent_to(doc2)
       expected:
       <?xml version="1.0"?>

       got:
       <?xml version="1.0"?>
       <foo/>
     # ./spec/equivalent-xml_spec.rb:123:in `(root)'

The complete output is visible on Travis CI.
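
A possible workaround on affected setups is to strip the whitespace in front of the XML preamble before the string reaches the parser. Strictly speaking, XML only allows the declaration at the very start of a document, so normalizing the input this way is reasonable in any case. The following is a minimal sketch, assuming the input is available as a Ruby string; `strip_leading_whitespace_before_declaration` is a hypothetical helper name, not part of equivalent-xml or Nokogiri.

```ruby
# Hypothetical helper (an assumption for illustration, not a library API).
# Removes whitespace at the start of the string, but only when it is
# immediately followed by an XML declaration ("<?xml" plus whitespace).
def strip_leading_whitespace_before_declaration(xml)
  xml.sub(/\A\s+(?=<\?xml\s)/, '')
end

padded = " <?xml version='1.0' encoding='utf-8' ?><foo />"
clean  = strip_leading_whitespace_before_declaration(padded)

puts clean
# prints: <?xml version='1.0' encoding='utf-8' ?><foo />

# Input without a declaration is returned unchanged.
puts strip_leading_whitespace_before_declaration(" <bar><foo /></bar>")
# prints:  <bar><foo /></bar>   (leading space kept)
```

The cleaned string can then be passed to `Nokogiri::XML` as in the test cases above. The lookahead keeps the helper from touching documents that have no preamble, which already parse the same way on every implementation.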

mbklein commented 12 years ago

Thanks for reporting this, and for reporting it upstream to the Nokogiri folks.

coffeejunk commented 11 years ago

Just as an FYI, this bug has been fixed in Nokogiri 1.5.6 (https://travis-ci.org/coffeejunk/equivalent-xml/jobs/3773113)
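
For a test suite that still has to run against older releases, a version check can decide whether the leading-whitespace spec should be expected to pass. This is only a sketch, assuming that 1.5.6 is the first fixed release as stated in the comment above; `leading_whitespace_bug_fixed?` is a hypothetical helper name.

```ruby
require 'rubygems' # provides Gem::Version for version comparison

# Hypothetical helper (an assumption for illustration).
# Returns true when the given Nokogiri version string is at least 1.5.6,
# the release reported in this thread as containing the fix.
def leading_whitespace_bug_fixed?(version_string)
  Gem::Version.new(version_string) >= Gem::Version.new('1.5.6')
end

puts leading_whitespace_bug_fixed?('1.5.5') # prints: false
puts leading_whitespace_bug_fixed?('1.5.6') # prints: true
```

In a real suite the argument would typically be `Nokogiri::VERSION`. Since the failure was only observed on JRuby, the check matters there and is harmless elsewhere.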