sparklemotion / nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby.
https://nokogiri.org/
MIT License

push_parser.rb:47: [BUG] Segmentation fault #849

Closed lucasgertel closed 11 years ago

lucasgertel commented 11 years ago

Sup everyone. I'm trying to crawl a page using anemone + nokogiri:

Nokogiri (1.5.6)


---
warnings: []
nokogiri: 1.5.6
ruby:
  version: 1.9.3
  platform: x86_64-darwin12.2.0
  description: ruby 1.9.3p374 (2013-01-15 revision 38858) [x86_64-darwin12.2.0]
  engine: ruby
libxml:
  binding: extension
  compiled: 2.9.0
  loaded: 2.9.0
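
One thing worth reading off a version report like this is whether the libxml2 that nokogiri was compiled against matches the one loaded at runtime; here both are 2.9.0. A minimal sketch of that check using only Ruby's standard library, assuming the report has been pasted into a string. The helper name `libxml_mismatch?` is hypothetical and not part of nokogiri:

```ruby
require "yaml"

# Version report copied from the issue text above.
REPORT = <<~YAML
  ---
  warnings: []
  nokogiri: 1.5.6
  ruby:
    version: 1.9.3
    platform: x86_64-darwin12.2.0
    description: ruby 1.9.3p374 (2013-01-15 revision 38858) [x86_64-darwin12.2.0]
    engine: ruby
  libxml:
    binding: extension
    compiled: 2.9.0
    loaded: 2.9.0
YAML

# Hypothetical helper: true when the compiled and loaded libxml2 versions differ.
def libxml_mismatch?(info)
  libxml = info.fetch("libxml")
  libxml["compiled"] != libxml["loaded"]
end

info = YAML.safe_load(REPORT)
puts "nokogiri #{info["nokogiri"]}, libxml2 compiled #{info["libxml"]["compiled"]}, loaded #{info["libxml"]["loaded"]}"
puts "libxml2 mismatch: #{libxml_mismatch?(info)}"
```

A matching pair, as in this report, suggests a version mismatch is not the cause of the crash, but it does not rule out a bug in libxml2 2.9.0 itself or in the extension.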

/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/xml/sax/push_parser.rb:47: [BUG] Segmentation fault
ruby 1.9.3p374 (2013-01-15 revision 38858) [x86_64-darwin12.2.0]

-- Control frame information -----------------------------------------------
c:0021 p:---- s:0094 b:0094 l:000093 d:000093 CFUNC :native_write
c:0020 p:0019 s:0089 b:0089 l:000088 d:000088 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/xml/sax/push_parser.rb:47
c:0019 p:0213 s:0084 b:0084 l:000083 d:000083 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/html/document.rb:189
c:0018 p:0346 s:0077 b:0077 l:000076 d:000076 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/html/document.rb:122
c:0017 p:0050 s:0069 b:0069 l:000068 d:000068 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/html.rb:15
c:0016 p:0049 s:0061 b:0061 l:000060 d:000060 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/page.rb:77
c:0015 p:0039 s:0058 b:0058 l:000057 d:000057 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/page.rb:60
c:0014 p:0011 s:0055 b:0055 l:000054 d:000054 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/page.rb:84
c:0013 p:0107 s:0052 b:0052 l:001fe8 d:000051 BLOCK /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:168
c:0012 p:---- s:0048 b:0048 l:000047 d:000047 FINISH
c:0011 p:---- s:0046 b:0046 l:000045 d:000045 CFUNC :loop
c:0010 p:0114 s:0043 b:0043 l:001fe8 d:001fe8 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:163
c:0009 p:0029 s:0038 b:0038 l:000bd8 d:000037 BLOCK /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:92
c:0008 p:0109 s:0035 b:0035 l:000c98 d:000c98 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:83
c:0007 p:---- s:0030 b:0030 l:000029 d:000029 FINISH
c:0006 p:---- s:0028 b:0028 l:000027 d:000027 CFUNC :new
c:0005 p:0019 s:0023 b:0023 l:000bd8 d:000bd8 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:90
c:0004 p:0027 s:0018 b:0018 l:000017 d:000017 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:18
c:0003 p:0264 s:0012 b:0012 l:0004c8 d:0024c8 EVAL crawler.rb:39
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:0004c8 d:0004c8 TOP

-- Ruby level backtrace information ----------------------------------------
crawler.rb:39:in `<main>'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:18:in `crawl'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:90:in `crawl'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:90:in `new'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:83:in `initialize'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:92:in `block in crawl'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:163:in `run'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:163:in `loop'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/core.rb:168:in `block in run'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/page.rb:84:in `discard_doc!'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/page.rb:60:in `links'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/page.rb:77:in `doc'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/html.rb:15:in `HTML'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/html/document.rb:122:in `parse'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/html/document.rb:189:in `detect_encoding'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/xml/sax/push_parser.rb:47:in `write'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/xml/sax/push_parser.rb:47:in `native_write'
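
The Ruby-level backtrace lists the outermost call first, so the crash site is at the bottom: anemone's `doc` calls `Nokogiri::HTML`, which reaches `detect_encoding` and finally the native `native_write`. A small sketch of pulling that chain out of the log with the standard library only; `FRAME_RE` and `parse_frame` are hypothetical names, and the regular expression assumes the RVM-style `/gems/<name>/lib/...` paths seen in this report:

```ruby
# The last five frames of the backtrace in the report, outermost first.
FRAMES = [
  "/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/anemone-0.7.2/lib/anemone/page.rb:77:in `doc'",
  "/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/html.rb:15:in `HTML'",
  "/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/html/document.rb:122:in `parse'",
  "/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/html/document.rb:189:in `detect_encoding'",
  "/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/nokogiri-1.5.6/lib/nokogiri/xml/sax/push_parser.rb:47:in `native_write'"
].freeze

# Assumed frame layout: .../gems/<gem>/lib/<file>:<line>:in `<method>'
FRAME_RE = %r{/gems/(?<gem>[^/]+)/lib/(?<file>[^:]+):(?<line>\d+):in `(?<meth>[^']+)'}

# Hypothetical helper: turns one backtrace line into a hash, or nil if it
# does not look like a frame inside an installed gem.
def parse_frame(text)
  m = FRAME_RE.match(text)
  return nil unless m

  { gem: m[:gem], file: m[:file], line: m[:line].to_i, meth: m[:meth] }
end

frames = FRAMES.map { |text| parse_frame(text) }.compact

# Print innermost frame first, so the crash site comes out on top.
frames.reverse_each do |f|
  puts "#{f[:gem]}  #{f[:file]}:#{f[:line]}  #{f[:meth]}"
end
```

Read this way, the failing call sits inside nokogiri's encoding detection rather than in anemone's own code.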

-- C level backtrace information -------------------------------------------

See Crash Report log file under ~/Library/Logs/CrashReporter or /Library/Logs/CrashReporter, for the more detail of.

-- Other runtime information -----------------------------------------------

[NOTE] You may have encountered a bug in the Ruby interpreter or extension libraries. Bug reports are welcome. For details: http://www.ruby-lang.org/bugreport.html

Abort trap: 6

Any thoughts?

ender672 commented 11 years ago

This looks very similar to #845 which was fixed in the master branch by 287a6ca. This fix isn't in a released version of nokogiri yet. Can you test the failing script against the master branch?

lucasgertel commented 11 years ago

How do I use the master branch in my script after cloning it? Thanks!

lucasgertel commented 11 years ago

rake
install -c tmp/x86_64-darwin12.2.0/nokogiri/1.9.3/nokogiri.bundle lib/nokogiri/nokogiri.bundle
/Users/lgertel/.rvm/rubies/ruby-1.9.3-p374/bin/ruby -w -I.:lib:bin:test:. -e 'require "rubygems"; require "minitest/autorun"; require "test/css/test_nthiness.rb"; require "test/css/test_parser.rb"; require "test/css/test_tokenizer.rb"; require "test/css/test_xpath_visitor.rb"; require "test/decorators/test_slop.rb"; require "test/html/sax/test_parser.rb"; require "test/html/sax/test_parser_context.rb"; require "test/html/test_builder.rb"; require "test/html/test_document.rb"; require "test/html/test_document_encoding.rb"; require "test/html/test_document_fragment.rb"; require "test/html/test_element_description.rb"; require "test/html/test_named_characters.rb"; require "test/html/test_node.rb"; require "test/html/test_node_encoding.rb"; require "test/test_convert_xpath.rb"; require "test/test_css_cache.rb"; require "test/test_encoding_handler.rb"; require "test/test_memory_leak.rb"; require "test/test_nokogiri.rb"; require "test/test_reader.rb"; require "test/test_soap4r_sax.rb"; require "test/test_xslt_transforms.rb"; require "test/xml/node/test_save_options.rb"; require "test/xml/node/test_subclass.rb"; require "test/xml/sax/test_parser.rb"; require "test/xml/sax/test_parser_context.rb"; require "test/xml/sax/test_push_parser.rb"; require "test/xml/test_attr.rb"; require "test/xml/test_attribute_decl.rb"; require "test/xml/test_builder.rb"; require "test/xml/test_c14n.rb"; require "test/xml/test_cdata.rb"; require "test/xml/test_comment.rb"; require "test/xml/test_document.rb"; require "test/xml/test_document_encoding.rb"; require "test/xml/test_document_fragment.rb"; require "test/xml/test_dtd.rb"; require "test/xml/test_dtd_encoding.rb"; require "test/xml/test_element_content.rb"; require "test/xml/test_element_decl.rb"; require "test/xml/test_entity_decl.rb"; require "test/xml/test_entity_reference.rb"; require "test/xml/test_namespace.rb"; require "test/xml/test_node.rb"; require "test/xml/test_node_attributes.rb"; require "test/xml/test_node_encoding.rb"; require "test/xml/test_node_inheritance.rb"; require "test/xml/test_node_reparenting.rb"; require "test/xml/test_node_set.rb"; require "test/xml/test_parse_options.rb"; require "test/xml/test_processing_instruction.rb"; require "test/xml/test_reader_encoding.rb"; require "test/xml/test_relax_ng.rb"; require "test/xml/test_schema.rb"; require "test/xml/test_syntax_error.rb"; require "test/xml/test_text.rb"; require "test/xml/test_unparented_node.rb"; require "test/xml/test_xinclude.rb"; require "test/xml/test_xpath.rb"; require "test/xslt/test_custom_functions.rb"; require "test/xslt/test_exception_handling.rb"' --
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374@global/gems/bundler-1.2.3/lib/bundler/definition.rb:233: warning: assigned but unused variable - e
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374@global/gems/bundler-1.2.3/lib/bundler/source.rb:516: warning: method redefined; discarding old revision
/Users/lgertel/walmart/nokogiri/test/helper.rb:11: version info: {"warnings"=>[], "nokogiri"=>"1.5.6", "ruby"=>{"version"=>"1.9.3", "platform"=>"x86_64-darwin12.2.0", "description"=>"ruby 1.9.3p374 (2013-01-15 revision 38858) [x86_64-darwin12.2.0]", "engine"=>"ruby"}, "libxml"=>{"binding"=>"extension", "compiled"=>"2.9.0", "loaded"=>"2.9.0"}}
Run options: --seed 53969

Running tests:

*****E***E****S*******_E_S**_E_EE*E*****S***
/Users/lgertel/walmart/nokogiri/lib/nokogiri/html/document.rb:124: [BUG] Segmentation fault
ruby 1.9.3p374 (2013-01-15 revision 38858) [x86_64-darwin12.2.0]

-- Control frame information -----------------------------------------------
c:0025 p:---- s:0114 b:0114 l:000113 d:000113 CFUNC :read_memory
c:0024 p:0373 s:0107 b:0107 l:000106 d:000106 METHOD /Users/lgertel/walmart/nokogiri/lib/nokogiri/html/document.rb:124
c:0023 p:0050 s:0099 b:0099 l:000098 d:000098 METHOD /Users/lgertel/walmart/nokogiri/lib/nokogiri/html.rb:15
c:0022 p:0070 s:0091 b:0091 l:000090 d:000090 METHOD /Users/lgertel/walmart/nokogiri/lib/nokogiri.rb:71
c:0021 p:0071 s:0083 b:0083 l:000082 d:000082 METHOD /Users/lgertel/walmart/nokogiri/lib/nokogiri.rb:126
c:0020 p:0040 s:0077 b:0077 l:000076 d:000076 METHOD /Users/lgertel/walmart/nokogiri/test/test_convert_xpath.rb:7
c:0019 p:0046 s:0074 b:0074 l:0005b0 d:0005b0 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:941
c:0018 p:0090 s:0068 b:0068 l:000056 d:000067 BLOCK /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:781
c:0017 p:---- s:0062 b:0062 l:000061 d:000061 FINISH
c:0016 p:---- s:0060 b:0060 l:000059 d:000059 CFUNC :map
c:0015 p:0124 s:0057 b:0057 l:000056 d:000056 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:774
c:0014 p:0015 s:0049 b:0049 l:000040 d:000048 BLOCK /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:764
c:0013 p:---- s:0046 b:0046 l:000045 d:000045 FINISH
c:0012 p:---- s:0044 b:0044 l:000043 d:000043 CFUNC :map
c:0011 p:0012 s:0041 b:0041 l:000040 d:000040 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:764
c:0010 p:0189 s:0036 b:0036 l:000035 d:000035 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:740
c:0009 p:0013 s:0026 b:0026 l:000025 d:000025 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:903
c:0008 p:0012 s:0023 b:0023 l:000014 d:000022 BLOCK /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:890
c:0007 p:---- s:0020 b:0020 l:000019 d:000019 FINISH
c:0006 p:---- s:0018 b:0018 l:000017 d:000017 CFUNC :each
c:0005 p:0068 s:0015 b:0015 l:000014 d:000014 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:889
c:0004 p:0029 s:0011 b:0011 l:000010 d:000010 METHOD /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:878
c:0003 p:0057 s:0007 b:0007 l:000fa8 d:002450 BLOCK /Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:658
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:000968 d:000968 TOP

-- Ruby level backtrace information ----------------------------------------
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:658:in `block in autorun'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:878:in `run'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:889:in `_run'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:889:in `each'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:890:in `block in _run'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:903:in `run_tests'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:740:in `_run_anything'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:764:in `_run_suites'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:764:in `map'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:764:in `block in _run_suites'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:774:in `_run_suite'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:774:in `map'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:781:in `block in _run_suite'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/minitest-2.2.2/lib/minitest/unit.rb:941:in `run'
/Users/lgertel/walmart/nokogiri/test/test_convert_xpath.rb:7:in `setup'
/Users/lgertel/walmart/nokogiri/lib/nokogiri.rb:126:in `Nokogiri'
/Users/lgertel/walmart/nokogiri/lib/nokogiri.rb:71:in `parse'
/Users/lgertel/walmart/nokogiri/lib/nokogiri/html.rb:15:in `HTML'
/Users/lgertel/walmart/nokogiri/lib/nokogiri/html/document.rb:124:in `parse'
/Users/lgertel/walmart/nokogiri/lib/nokogiri/html/document.rb:124:in `read_memory'

-- C level backtrace information -------------------------------------------

See Crash Report log file under ~/Library/Logs/CrashReporter or /Library/Logs/CrashReporter, for the more detail of.

-- Other runtime information -----------------------------------------------

[NOTE] You may have encountered a bug in the Ruby interpreter or extension libraries. Bug reports are welcome. For details: http://www.ruby-lang.org/bugreport.html

rake aborted!
Command failed with status (): [/Users/lgertel/.rvm/rubies/ruby-1.9.3-p374...]
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/gems/hoe-2.16.1/lib/hoe/test.rb:75:in `block in define_test_tasks'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/bin/ruby_noexec_wrapper:14:in `eval'
/Users/lgertel/.rvm/gems/ruby-1.9.3-p374/bin/ruby_noexec_wrapper:14:in `<main>'
Tasks: TOP => default => test
(See full trace by running task with --trace)

lucasgertel commented 11 years ago

Fixed! I generated the gem and installed it. Thank you very much.

Flaburgan commented 10 years ago

I'm hitting the exact same bug with nokogiri 1.6.2.1 in diaspora; see the stack trace at https://github.com/diaspora/diaspora/issues/4989#issuecomment-45982787

I assume the fix is already merged in that version, so there is still a problem. I will try to find out which XML was malformed.

Flaburgan commented 10 years ago

Please find a more complete description of the problem here: https://github.com/diaspora/diaspora/issues/4996