sparklemotion / nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby.
https://nokogiri.org/
MIT License
6.15k stars 896 forks source link

Inheriting from Nokogiri::XML::Node on JRuby (1.6.4/5) fails #560

Closed benlangfeld closed 12 years ago

benlangfeld commented 13 years ago

The following simple test case:


require 'nokogiri'

class MyNode < Nokogiri::XML::Node
  def self.new(*args)
    super
  end
end

class MyNode2 < Nokogiri::XML::Node
  def initialize(*args)
    super
  end
end

class MyNode3 < Nokogiri::XML::Node
end

describe MyNode do
  subject { MyNode.new 'foo', Nokogiri::XML::Document.new }

  its(:name) { should == 'foo' }

  it 'should read/write attributes correctly' do
    subject['foo'] = 'bar'
    subject.attributes['foo'].should be_a Nokogiri::XML::Attr
  end
end

describe MyNode2 do
  subject { MyNode2.new 'foo', Nokogiri::XML::Document.new }

  its(:name) { should == 'foo' }

  it 'should read/write attributes correctly' do
    subject['foo'] = 'bar'
    subject.attributes['foo'].should be_a Nokogiri::XML::Attr
  end
end

describe MyNode3 do
  subject { MyNode3.new 'foo', Nokogiri::XML::Document.new }

  its(:name) { should == 'foo' }

  it 'should read/write attributes correctly' do
    subject['foo'] = 'bar'
    subject.attributes['foo'].should be_a Nokogiri::XML::Attr
  end
end

describe Nokogiri::XML::Node do
  subject { Nokogiri::XML::Node.new 'foo', Nokogiri::XML::Document.new }

  its(:name) { should == 'foo' }

  it 'should read/write attributes correctly' do
    subject['foo'] = 'bar'
    subject.attributes['foo'].should be_a Nokogiri::XML::Attr
  end
end

causes the following failures:

{16:13}[jruby-1.6.5]~/Desktop ben% be rspec nokogiri_java.rb 
F.F.F...

Failures:

  1) MyNode should read/write attributes correctly
     Failure/Error: subject.attributes['foo'].should be_a Nokogiri::XML::Attr
       expected nil to be a kind of Nokogiri::XML::Attr
     # ./nokogiri_java.rb:25:in `(root)'

  2) MyNode2 should read/write attributes correctly
     Failure/Error: subject.attributes['foo'].should be_a Nokogiri::XML::Attr
       expected nil to be a kind of Nokogiri::XML::Attr
     # ./nokogiri_java.rb:36:in `(root)'

  3) MyNode3 should read/write attributes correctly
     Failure/Error: subject.attributes['foo'].should be_a Nokogiri::XML::Attr
       expected nil to be a kind of Nokogiri::XML::Attr
     # ./nokogiri_java.rb:47:in `(root)'

Finished in 2.15 seconds
8 examples, 3 failures

Failed examples:

rspec ./nokogiri_java.rb:23 # MyNode should read/write attributes correctly
rspec ./nokogiri_java.rb:34 # MyNode2 should read/write attributes correctly
rspec ./nokogiri_java.rb:45 # MyNode3 should read/write attributes correctly
bundle exec rspec nokogiri_java.rb  31.41s user 1.75s system 233% cpu 14.218 total

These failures do not occur on CRuby (1.9.2/3) or when using Nokogiri 1.4.7. They have been introduced in Nokogiri 1.5, and unfortunately getting Nokogiri's tests running on JRuby is impossible, as is a bisect, due to the non-linear history between the 1.4 and 1.5 branches.

benlangfeld commented 13 years ago

Subscribing @taylor and @lgleasain

yokolet commented 13 years ago

Fixed in rev. #5aacda4a5bb758679e07f9f8d4d265202c21c359

As far as I tried, all tests passed after the change.

Thanks for reporting.

benlangfeld commented 13 years ago

Thanks, this test case now passes.

Linking the commit for easy reference: https://github.com/tenderlove/nokogiri/commit/5aacda4a5bb758679e07f9f8d4d265202c21c359

benlangfeld commented 13 years ago

So, it turns out this isn't the whole story. Attributes are now accessible via #attributes, but not #[], as demonstrated in this extended test case:

require 'nokogiri'

MyNode = Class.new Nokogiri::XML::Node

shared_examples_for 'a node' do
  let(:node) { node_class.new 'foo', Nokogiri::XML::Document.new }

  subject { node }

  its(:name) { should == 'foo' }

  describe "writing an attribute" do
    before { node['foo'] = 'bar' }

    describe "accessing via #attributes" do
      subject { node.attributes['foo'] }
      it { should be_a Nokogiri::XML::Attr }
      its(:value) { should == 'bar' }
    end

    it 'should have the right key' do
      node.key?('foo').should be true
    end

    it 'should be readable via #[]' do
      node['foo'].should == 'bar'
    end
  end
end

describe MyNode do
  let(:node_class) { MyNode }
  it_should_behave_like 'a node'
end

describe Nokogiri::XML::Node do
  let(:node_class) { Nokogiri::XML::Node }
  it_should_behave_like 'a node'
end
.FF.......

Failures:

  1) MyNode it should behave like a node writing an attribute should have the right key
     Failure/Error: node.key?('foo').should be true

       expected #<TrueClass:2> => true
            got #<NilClass:4> => nil

       Compared using equal?, which compares object identity,
       but expected and actual are not the same object. Use
       'actual.should == expected' if you don't care about
       object identity in this example.
     Shared Example Group: "a node" called from ./nokogiri_java.rb:33
     # ./nokogiri_java.rb:22:in `(root)'

  2) MyNode it should behave like a node writing an attribute should be readable via #[]
     Failure/Error: node['foo'].should == 'bar'
       expected: "bar"
            got: nil (using ==)
     Shared Example Group: "a node" called from ./nokogiri_java.rb:33
     # ./nokogiri_java.rb:26:in `(root)'

Finished in 2.07 seconds
10 examples, 2 failures

Failed examples:

rspec ./nokogiri_java.rb:21 # MyNode it should behave like a node writing an attribute should have the right key
rspec ./nokogiri_java.rb:25 # MyNode it should behave like a node writing an attribute should be readable via #[]
yokolet commented 13 years ago

Maybe, "MyNode" is the correct class on line 4 in the test file, right?

I changed node_class to MyNode and ran the test. As you said, I got the failures on pure Java but not on libxml version. I'll have a look.

benlangfeld commented 13 years ago

node_class is perfectly correct, and is set toward the bottom of the pasted code.

yokolet commented 13 years ago

Sorry about my ignorance on rspec. I switched back MyNode to node_class. It worked exactly the same as yours.

yokolet commented 13 years ago

Fixed in rev. 875448f

The reason of this sort of failures is methods are implemented in subclass of XmlNode. Perhaps, add_namespace, remove_attribute and some other methods have the same problem?

benlangfeld commented 13 years ago

Why are they implemented in subclasses? That's not the case on the C implementation, as all of these things work.

yokolet commented 13 years ago

Because it is Java. The implementation is following Java's culture. Node is an abstract type. Node might be Attribute, or Text. Does Attribute have attribute? Does Text have attribute? Thus, the attribute related methods used in the given tests should be implemented in Element in Java's culture. If Java's object oriented approach is applied, your tests should inherit Nokogiri::XML::Element. Because inherited class is expected to be an Element.

But, Nokogiri is Ruby. Ruby's idea of object oriented is a bit different from Java's. Different behavior from libxml version is considered bug. So, I'll fix those.

tenderlove commented 13 years ago

Can you send us tests that use test/unit or minitest? I don't understand this test case.

benlangfeld commented 13 years ago

Essentially what is required is to run all of the node tests on MyNode = Class.new Nokogiri::XML::Node. I'll submit that as a pull request ASAP. I'll also re-factor this test case to minitest.

lgleasain commented 13 years ago

Here it is in test/unit format. All of these fail for me in JRuby.

require 'rubygems'
require 'test/unit'
require 'nokogiri'
class TestAdd < Test::Unit::TestCase

  MyNode = Class.new Nokogiri::XML::Node

  def setup
    @node = MyNode.new 'foo', Nokogiri::XML::Document.new
    assert @node.name == 'foo'
    @node['foo'] = 'bar'
  end

  def test_a_node_writing_an_attribute_accessing_via_attributes 
    assert @node.attributes['foo']
  end

  def test_a_node_writing_an_attribute_accessing_via_key 
    assert @node.key? 'foo'
  end

  def test_a_node_writing_an_attribute_accessing_via_brackets 
    assert @node['foo'] == 'bar'
  end
end
tenderlove commented 13 years ago

Thanks for the test/unit tests! I'm curious and I'd like to understand our users better: why are you subclassing nokogiri? If you construct an XML document using subclasses then serialize the XML, your information is lost:

>> require 'nokogiri'
=> true
>> Z = Class.new Nokogiri::XML::Node
=> Z
>> doc = Nokogiri::XML::Document.new
=> #<Nokogiri::XML::Document:0x85e87454 name="document">
>> node = Z.new 'foo', doc
=> #<Z:0x85e85154 name="foo">
>> doc.root = node
=> #<Z:0x85e85154 name="foo">
>> puts doc
<?xml version="1.0"?>
<foo/>
=> nil
>> doc2 = Nokogiri.XML(doc.to_xml)
=> #<Nokogiri::XML::Document:0x85e80564 name="document" children=[#<Nokogiri::XML::Element:0x85e80244 name="foo">]>
>> doc2.root.class == doc.root.class
=> false
>> doc2.to_xml == doc.to_xml
=> true
>>

The XML representation of both documents is identical, which would indicate that the documents are identical. Yet if we compare the classes of the root nodes, we'll see that the documents are different. Why would this difference be desirable?

yokolet commented 13 years ago

@lgleasain Is that all? Master brach works correctly as expected.

benlangfeld commented 13 years ago

So this use case is from Blather, where we're doing manual importing of an XML string to create an instance of a particular class from an element name/namespace registration combo. For example, Blather::Stanza::Iq::Command (https://github.com/sprsquish/blather/blob/develop/lib/blather/stanza/iq/command.rb) registers itself for a <command xmlns="http://jabber.org/protocol/commands"/> element. This allows using Nokogiri to easily create/parse XMPP stanzas.

You can see the importing/registration code here: https://github.com/sprsquish/blather/blob/develop/lib/blather/xmpp_node.rb

The Punchblock gem does something similar for the Rayo XMPP extension.

@sprsquish can provide a more coherant explanation of this rationale, no doubt.

benlangfeld commented 13 years ago

Additional relevant changes: 875448fa0f219a5eeec70d5e37ffa24fcf926dcc b50d4a9337420eb76af000c8d69fe2035eb6c901

I'll check later today that this is everything we need.

flavorjones commented 13 years ago

@benlangfeld -- everything OK with you on master?

benlangfeld commented 13 years ago

Not exactly. I get a whole lot of this right now:

  1) MyNode it should behave like a node writing an attribute should have the right key
     Failure/Error: before { node['foo'] = 'bar' }
     NoMethodError:
       undefined method `set' for #<MyNode:0x80ca6478 name="foo">
     Shared Example Group: "a node" called from ./nokogiri_java.rb:35
     # /Users/ben/code/ruby/nokogiri/lib/nokogiri/xml/node.rb:261:in `[]='
     # ./nokogiri_java.rb:15:in `block (3 levels) in <top (required)>'

The source of this issue is now totally lost on me, but this is suspicious: https://github.com/tenderlove/nokogiri/commit/df274e6b8fb6f929be9313fc4d60174da3b08b67

yokolet commented 13 years ago

@benlangfeld you are right. That commit surely breaks your test.

The reason fo regression is... I didn't add test case for this issue. Sorry. I'll soon add the test provided.

yokolet commented 13 years ago

@benlangfeld I was wrong. That commit doesn't harm node inheritance.

I tested two rspec tests provided here. Both successfully finished on master branch. Would you try again using latest master branch?

I added the test in rev. be4851d so that regression won't happen.

benlangfeld commented 13 years ago

Unfortunately I am unable to run the tests fully on JRuby:

{11:42}[jruby-1.6.5]~/code/ruby/nokogiri@master✗✗✗ ben% be rake test           
install -c tmp/java/nokogiri/nokogiri.jar lib/nokogiri/nokogiri.jar
/Users/ben/Developer/.rvm/rubies/jruby-1.6.5/bin/jruby -w -I.:lib:bin:test:. -e 'require "rubygems"; require "minitest/autorun"; require "test/test_convert_xpath.rb"; require "test/test_css_cache.rb"; require "test/test_encoding_handler.rb"; require "test/test_memory_leak.rb"; require "test/test_nokogiri.rb"; require "test/test_reader.rb"; require "test/test_soap4r_sax.rb"; require "test/test_xslt_transforms.rb"; require "test/css/test_nthiness.rb"; require "test/css/test_parser.rb"; require "test/css/test_tokenizer.rb"; require "test/css/test_xpath_visitor.rb"; require "test/decorators/test_slop.rb"; require "test/html/test_builder.rb"; require "test/html/test_document.rb"; require "test/html/test_document_encoding.rb"; require "test/html/test_document_fragment.rb"; require "test/html/test_element_description.rb"; require "test/html/test_named_characters.rb"; require "test/html/test_node.rb"; require "test/html/test_node_encoding.rb"; require "test/html/sax/test_parser.rb"; require "test/html/sax/test_parser_context.rb"; require "test/xml/test_attr.rb"; require "test/xml/test_attribute_decl.rb"; require "test/xml/test_builder.rb"; require "test/xml/test_c14n.rb"; require "test/xml/test_cdata.rb"; require "test/xml/test_comment.rb"; require "test/xml/test_document.rb"; require "test/xml/test_document_encoding.rb"; require "test/xml/test_document_fragment.rb"; require "test/xml/test_dtd.rb"; require "test/xml/test_dtd_encoding.rb"; require "test/xml/test_element_content.rb"; require "test/xml/test_element_decl.rb"; require "test/xml/test_entity_decl.rb"; require "test/xml/test_entity_reference.rb"; require "test/xml/test_namespace.rb"; require "test/xml/test_node.rb"; require "test/xml/test_node_attributes.rb"; require "test/xml/test_node_encoding.rb"; require "test/xml/test_node_inheritance.rb"; require "test/xml/test_node_reparenting.rb"; require "test/xml/test_node_set.rb"; require "test/xml/test_parse_options.rb"; require "test/xml/test_processing_instruction.rb"; require "test/xml/test_reader_encoding.rb"; require "test/xml/test_relax_ng.rb"; require "test/xml/test_schema.rb"; require "test/xml/test_syntax_error.rb"; require "test/xml/test_text.rb"; require "test/xml/test_unparented_node.rb"; require "test/xml/test_xinclude.rb"; require "test/xml/test_xpath.rb"; require "test/xml/node/test_save_options.rb"; require "test/xml/node/test_subclass.rb"; require "test/xml/sax/test_parser.rb"; require "test/xml/sax/test_parser_context.rb"; require "test/xml/sax/test_push_parser.rb"; require "test/xslt/test_custom_functions.rb"; require "test/xslt/test_exception_handling.rb"' -- 
/Users/ben/code/ruby/nokogiri/lib/nokogiri/css.rb:3 warning: global variable `$-w' not initialized
/Users/ben/code/ruby/nokogiri/test/helper.rb:10: version info: {"warnings"=>[], "nokogiri"=>"1.5.0", "ruby"=>{"version"=>"1.9.2", "platform"=>"java", "description"=>"jruby 1.6.5 (ruby-1.9.2-p136) (2011-10-25 9dcd388) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_29) [darwin-x86_64-java]", "engine"=>"jruby", "jruby"=>"1.6.5"}}
Run options: --seed 33531

# Running tests:

.........................................................................................................................................................dyld: lazy symbol binding failed: Symbol not found: _rb_catch
  Referenced from: /Users/ben/code/ruby/nokogiri/vendor/jruby/1.9/gems/racc-1.4.7/lib/racc/cparse.bundle
  Expected in: flat namespace

dyld: Symbol not found: _rb_catch
  Referenced from: /Users/ben/code/ruby/nokogiri/vendor/jruby/1.9/gems/racc-1.4.7/lib/racc/cparse.bundle
  Expected in: flat namespace

rake aborted!
Command failed with status (133): [/Users/ben/Developer/.rvm/rubies/jruby-1.6...]

Tasks: TOP => test
(See full trace by running task with --trace)
bundle exec rake test  49.78s user 2.64s system 247% cpu 21.143 total
yokolet commented 12 years ago

http://groups.google.com/group/nokogiri-talk/browse_thread/thread/8f58feca3b25fcdc

You need to run rake on CRuby first. This generates some necessary files in a source tree. Then, try rake on JRuby.

How about your 2 rspec tests pasted here, above?

benlangfeld commented 12 years ago

So now the tests hang indefinitely:

{16:27}[jruby-1.6.5]~/code/ruby/nokogiri@master✗✗✗ ben% rake
install -c tmp/java/nokogiri/nokogiri.jar lib/nokogiri/nokogiri.jar
/Users/ben/Developer/.rvm/rubies/jruby-1.6.5/bin/jruby -w -I.:lib:bin:test:. -e 'require "rubygems"; require "minitest/autorun"; require "test/test_convert_xpath.rb"; require "test/test_css_cache.rb"; require "test/test_encoding_handler.rb"; require "test/test_memory_leak.rb"; require "test/test_nokogiri.rb"; require "test/test_reader.rb"; require "test/test_soap4r_sax.rb"; require "test/test_xslt_transforms.rb"; require "test/css/test_nthiness.rb"; require "test/css/test_parser.rb"; require "test/css/test_tokenizer.rb"; require "test/css/test_xpath_visitor.rb"; require "test/decorators/test_slop.rb"; require "test/html/test_builder.rb"; require "test/html/test_document.rb"; require "test/html/test_document_encoding.rb"; require "test/html/test_document_fragment.rb"; require "test/html/test_element_description.rb"; require "test/html/test_named_characters.rb"; require "test/html/test_node.rb"; require "test/html/test_node_encoding.rb"; require "test/html/sax/test_parser.rb"; require "test/html/sax/test_parser_context.rb"; require "test/xml/test_attr.rb"; require "test/xml/test_attribute_decl.rb"; require "test/xml/test_builder.rb"; require "test/xml/test_c14n.rb"; require "test/xml/test_cdata.rb"; require "test/xml/test_comment.rb"; require "test/xml/test_document.rb"; require "test/xml/test_document_encoding.rb"; require "test/xml/test_document_fragment.rb"; require "test/xml/test_dtd.rb"; require "test/xml/test_dtd_encoding.rb"; require "test/xml/test_element_content.rb"; require "test/xml/test_element_decl.rb"; require "test/xml/test_entity_decl.rb"; require "test/xml/test_entity_reference.rb"; require "test/xml/test_namespace.rb"; require "test/xml/test_node.rb"; require "test/xml/test_node_attributes.rb"; require "test/xml/test_node_encoding.rb"; require "test/xml/test_node_inheritance.rb"; require "test/xml/test_node_reparenting.rb"; require "test/xml/test_node_set.rb"; require "test/xml/test_parse_options.rb"; require "test/xml/test_processing_instruction.rb"; require "test/xml/test_reader_encoding.rb"; require "test/xml/test_relax_ng.rb"; require "test/xml/test_schema.rb"; require "test/xml/test_syntax_error.rb"; require "test/xml/test_text.rb"; require "test/xml/test_unparented_node.rb"; require "test/xml/test_xinclude.rb"; require "test/xml/test_xpath.rb"; require "test/xml/node/test_save_options.rb"; require "test/xml/node/test_subclass.rb"; require "test/xml/sax/test_parser.rb"; require "test/xml/sax/test_parser_context.rb"; require "test/xml/sax/test_push_parser.rb"; require "test/xslt/test_custom_functions.rb"; require "test/xslt/test_exception_handling.rb"' -- 
/Users/ben/code/ruby/nokogiri/lib/nokogiri/css.rb:3 warning: global variable `$-w' not initialized
/Users/ben/code/ruby/nokogiri/test/helper.rb:10: version info: {"warnings"=>[], "nokogiri"=>"1.5.0", "ruby"=>{"version"=>"1.9.2", "platform"=>"java", "description"=>"jruby 1.6.5 (ruby-1.9.2-p136) (2011-10-25 9dcd388) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_29) [darwin-x86_64-java]", "engine"=>"jruby", "jruby"=>"1.6.5"}}
Loaded suite -e
Started
.................................................................................................................................................................................^Crake  436.79s user 4.20s system 107% cpu 6:51.88 total

My rspec tests now pass, but I have a couple of remaining failures in a broader test suite which I will debug and report on within the hour.

benlangfeld commented 12 years ago

This is the remaining test failure:


require 'nokogiri'

MyNode = Class.new Nokogiri::XML::Node

shared_examples_for 'a node' do
  let(:node) { node_class.new 'foo', Nokogiri::XML::Document.new }

  subject { node }

  it { should be_a node_class }

  describe 'adding a namespace' do
    before { node.add_namespace nil, 'foo:bar' } # A similar failure can be caused on Ruby 1.9.3 here by replacing nil with ''

    its(:namespace) { should be_a Nokogiri::XML::Namespace }

    it 'should set the right href' do
      subject.namespace.href.should == 'foo:bar'
    end
  end
end

describe MyNode do
  let(:node_class) { MyNode }
  it_should_behave_like 'a node'
end

describe Nokogiri::XML::Node do
  let(:node_class) { Nokogiri::XML::Node }
  it_should_behave_like 'a node'
end
{16:53}[jruby-1.6.5]~/Desktop ben% be rspec nokogiri_java.rb
......FF........

Failures:

  1) MyNode it should behave like a node adding a namespace should set the right href
     Failure/Error: subject.namespace.href.should == 'foo:bar'
     NoMethodError:
       undefined method `href' for nil:NilClass
     Shared Example Group: "a node" called from ./nokogiri_java.rb:45
     # ./nokogiri_java.rb:38:in `(root)'

  2) MyNode it should behave like a node MyNode it should behave like a node adding a namespace namespace 
     Failure/Error: its(:namespace) { should be_a Nokogiri::XML::Namespace }
       expected nil to be a kind of Nokogiri::XML::Namespace
     Shared Example Group: "a node" called from ./nokogiri_java.rb:45
     # ./nokogiri_java.rb:35:in `(root)'

Finished in 1.96 seconds
16 examples, 2 failures

Failed examples:

rspec ./nokogiri_java.rb:37 # MyNode it should behave like a node adding a namespace should set the right href
rspec ./nokogiri_java.rb:35 # MyNode it should behave like a node MyNode it should behave like a node adding a namespace namespace 
bundle exec rspec nokogiri_java.rb  30.41s user 1.56s system 243% cpu 13.152 total
yokolet commented 12 years ago

Ah, 1.9 mode. Same here. Before, rake worked on 1.9 mode though there were many failures.

This deserves another ticket. I'll open that.

Thanks for trying rake test.

yokolet commented 12 years ago

I fixed the bug in rev. 766db38 . Now, the rspec test posted here on Nov. 18 passed on master.

If you have a chance, would you try it?

benlangfeld commented 12 years ago

I can confirm that this is fixed now.