sparklemotion / nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby.
https://nokogiri.org/
MIT License
6.16k stars 904 forks source link

node_set.rb:24: [BUG] Segmentation fault #881

Closed fenelon closed 10 years ago

fenelon commented 11 years ago

This mostly happens in concurrent environments. I'm using nokogiri with sidekiq, and roadie (which also uses nokogiri).

nokogiri -v

# Nokogiri (1.5.6)
    ---
    warnings: []
    nokogiri: 1.5.6
    ruby:
      version: 1.9.3
      platform: x86_64-linux
      description: ruby 1.9.3p385 (2013-02-06) [x86_64-linux]
      engine: ruby
    libxml:
      binding: extension
      compiled: 2.9.0
      loaded: 2.9.0

Backtrace:

/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:24: [BUG] Segmentation fault
ruby 1.9.3p385 (2013-02-06) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0055 p:---- s:0236 b:0236 l:000235 d:000235 CFUNC  :[]
c:0054 p:0022 s:0232 b:0232 l:000231 d:000231 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:24
c:0053 p:0019 s:0227 b:0227 l:000226 d:000226 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node.rb:248
c:0052 p:0160 s:0223 b:0223 l:000222 d:000222 METHOD /home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:193
c:0051 p:0654 s:0217 b:0217 l:000216 d:000216 METHOD /home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:161
c:0050 p:0012 s:0203 b:0203 l:002208 d:000202 BLOCK  /home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:93
c:0049 p:0015 s:0199 b:0199 l:000189 d:000198 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:239
c:0048 p:---- s:0196 b:0196 l:000195 d:000195 FINISH
c:0047 p:---- s:0194 b:0194 l:000193 d:000193 CFUNC  :upto
c:0046 p:0023 s:0190 b:0190 l:000189 d:000189 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:238
c:0045 p:0133 s:0186 b:0186 l:002208 d:001360 BLOCK  /home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:92
c:0044 p:---- s:0181 b:0181 l:000180 d:000180 FINISH
c:0043 p:---- s:0179 b:0179 l:000178 d:000178 CFUNC  :each
c:0042 p:0018 s:0176 b:0176 l:002208 d:002208 METHOD /home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:85
c:0041 p:0013 s:0171 b:0171 l:000170 d:000170 METHOD /home/deployer/apps/leveltravel/current/app/operator_apis/pegast.rb:39
c:0040 p:0049 s:0167 b:0167 l:000166 d:000166 METHOD /home/deployer/apps/leveltravel/current/app/operator_apis/abstract_operator.rb:43
c:0039 p:0252 s:0161 b:0161 l:000160 d:000160 METHOD /home/deployer/apps/leveltravel/current/app/operator_apis/abstract_operator.rb:68
c:0038 p:0187 s:0155 b:0155 l:000154 d:000154 METHOD /home/deployer/apps/leveltravel/current/app/actors/operator_search_request_base.rb:25
c:0037 p:0035 s:0147 b:0147 l:000650 d:000146 BLOCK  /home/deployer/apps/leveltravel/current/app/workers/stats_updater.rb:50
c:0036 p:---- s:0143 b:0143 l:000142 d:000142 FINISH
c:0035 p:---- s:0141 b:0141 l:000140 d:000140 CFUNC  :each
c:0034 p:0023 s:0138 b:0138 l:000137 d:000137 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/activerecord-3.2.11/lib/active_record/relation/delegation.rb:6
c:0033 p:0164 s:0133 b:0133 l:000650 d:000650 METHOD /home/deployer/apps/leveltravel/current/app/workers/stats_updater.rb:47
c:0032 p:0025 s:0121 b:0121 l:001480 d:000120 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:49
c:0031 p:---- s:0119 b:0119 l:000118 d:000118 FINISH
c:0030 p:---- s:0117 b:0117 l:000116 d:000116 CFUNC  :call
c:0029 p:0024 s:0114 b:0114 l:001ba8 d:000113 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:109
c:0028 p:0005 s:0112 b:0112 l:001e30 d:000111 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/timeout.rb:11
c:0027 p:0111 s:0110 b:0110 l:001b80 d:001b80 METHOD /usr/local/lib/ruby/1.9.1/timeout.rb:68
c:0026 p:0061 s:0098 b:0098 l:001e30 d:001e30 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/timeout.rb:10
c:0025 p:0049 s:0092 b:0092 l:001ba8 d:000091 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111
c:0024 p:0007 s:0090 b:0090 l:000089 d:000089 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/active_record.rb:6
c:0023 p:0049 s:0086 b:0086 l:001ba8 d:000085 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111
c:0022 p:0007 s:0084 b:0084 l:000083 d:000083 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/retry_jobs.rb:50
c:0021 p:0049 s:0072 b:0072 l:001ba8 d:000071 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111
c:0020 p:0041 s:0070 b:0070 l:0025c0 d:001ca0 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/logging.rb:11
c:0019 p:0037 s:0067 b:0067 l:000066 d:000066 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/logging.rb:22
c:0018 p:0045 s:0063 b:0063 l:0025c0 d:0025c0 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/logging.rb:7
c:0017 p:0049 s:0057 b:0057 l:001ba8 d:000056 LAMBDA /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111
c:0016 p:---- s:0055 b:0055 l:000054 d:000054 FINISH
c:0015 p:---- s:0053 b:0053 l:000052 d:000052 CFUNC  :call
c:0014 p:0040 s:0050 b:0050 l:001ba8 d:001ba8 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:114
c:0013 p:0030 s:0043 b:0043 l:001480 d:002668 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:48
c:0012 p:0021 s:0041 b:0041 l:000a20 d:000a20 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:87
c:0011 p:0110 s:0035 b:0035 l:001480 d:001528 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:47
c:0010 p:---- s:0029 b:0029 l:000028 d:000028 FINISH
c:0009 p:---- s:0027 b:0027 l:000026 d:000026 CFUNC  :call
c:0008 p:---- s:0025 b:0025 l:000024 d:000024 CFUNC  :public_send
c:0007 p:0021 s:0021 b:0021 l:000020 d:000020 METHOD /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/calls.rb:23
c:0006 p:0016 s:0015 b:0015 l:001358 d:000014 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/future.rb:18
c:0005 p:---- s:0013 b:0013 l:000012 d:000012 FINISH
c:0004 p:---- s:0011 b:0011 l:000010 d:000010 CFUNC  :call
c:0003 p:0021 s:0008 b:0008 l:001198 d:000007 BLOCK  /home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/internal_pool.rb:48
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:---- s:0002 b:0002 l:000001 d:000001 TOP   

-- Ruby level backtrace information ----------------------------------------
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/internal_pool.rb:48:in `block in create'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/internal_pool.rb:48:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/future.rb:18:in `block in initialize'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/calls.rb:23:in `dispatch'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/calls.rb:23:in `public_send'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/celluloid-0.12.4/lib/celluloid/calls.rb:23:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:47:in `block in process'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:87:in `stats'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:48:in `block (2 levels) in process'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:114:in `invoke'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:114:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111:in `block in invoke'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/logging.rb:7:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/logging.rb:22:in `with_context'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/logging.rb:11:in `block in call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111:in `block in invoke'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/retry_jobs.rb:50:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111:in `block in invoke'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/active_record.rb:6:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:111:in `block in invoke'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/timeout.rb:10:in `call'
/usr/local/lib/ruby/1.9.1/timeout.rb:68:in `timeout'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/server/timeout.rb:11:in `block in call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:109:in `block in invoke'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/middleware/chain.rb:109:in `call'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/sidekiq-2.9.0/lib/sidekiq/processor.rb:49:in `block (3 levels) in process'
/home/deployer/apps/leveltravel/current/app/workers/stats_updater.rb:47:in `perform'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/activerecord-3.2.11/lib/active_record/relation/delegation.rb:6:in `each'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/activerecord-3.2.11/lib/active_record/relation/delegation.rb:6:in `each'
/home/deployer/apps/leveltravel/current/app/workers/stats_updater.rb:50:in `block in perform'
/home/deployer/apps/leveltravel/current/app/actors/operator_search_request_base.rb:25:in `run'
/home/deployer/apps/leveltravel/current/app/operator_apis/abstract_operator.rb:68:in `perform'
/home/deployer/apps/leveltravel/current/app/operator_apis/abstract_operator.rb:43:in `process_operator_response'
/home/deployer/apps/leveltravel/current/app/operator_apis/pegast.rb:39:in `parse_operator_response'
/home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:85:in `samo_parse_operator_response'
/home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:85:in `each'
/home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:92:in `block in samo_parse_operator_response'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:238:in `each'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:238:in `upto'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:239:in `block in each'
/home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:93:in `block (2 levels) in samo_parse_operator_response'
/home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:161:in `parse_offer'
/home/deployer/apps/leveltravel/current/app/operator_apis/modules/samo_operator.rb:193:in `parse_prices'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node.rb:248:in `at_css'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:24:in `first'
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:24:in `[]'

-- C level backtrace information -------------------------------------------
sidekiq 2.9.0 current [1 of 10 busy]() [0x52c3e0]
sidekiq 2.9.0 current [1 of 10 busy]() [0x57cc43]
sidekiq 2.9.0 current [1 of 10 busy](rb_bug+0xb0) [0x57dfe0]
sidekiq 2.9.0 current [1 of 10 busy]() [0x4b9b82]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fd095a5acb0] xml_node.c:169
/home/deployer/apps/leveltravel/shared/bundle/ruby/1.9.1/gems/nokogiri-1.5.9/lib/nokogiri/nokogiri.so(+0x112de) [0x7fd090b532de] xml_node_set.c:319
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy](rb_yield+0x47) [0x5299e7]
sidekiq 2.9.0 current [1 of 10 busy]() [0x44ab7e]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy](rb_yield+0x47) [0x5299e7]
sidekiq 2.9.0 current [1 of 10 busy](rb_ary_each+0x52) [0x5423c2]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy](rb_yield+0x47) [0x5299e7]
sidekiq 2.9.0 current [1 of 10 busy](rb_ary_each+0x52) [0x5423c2]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy]() [0x523aa1]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy]() [0x523aa1]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy]() [0x523aa1]
sidekiq 2.9.0 current [1 of 10 busy]() [0x523e8b]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy]() [0x523aa1]
sidekiq 2.9.0 current [1 of 10 busy]() [0x528244]
sidekiq 2.9.0 current [1 of 10 busy]() [0x51eb0c]
sidekiq 2.9.0 current [1 of 10 busy]() [0x522d8f]
sidekiq 2.9.0 current [1 of 10 busy]() [0x523aa1]
sidekiq 2.9.0 current [1 of 10 busy]() [0x532932]
sidekiq 2.9.0 current [1 of 10 busy]() [0x532bec]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fd095a52e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fd094e3ecbd]
fenelon commented 11 years ago

This is a most likely GC bug. Disabling GC makes it work.

I've tried different versions of ruby, libxml and nokogiri. No luck there.

sthetz commented 11 years ago

I'm expiriencing the same problem. The issue seems not to be related to Sidekiq and Celluloid: I tried migrating my whole app to Resque, and the problem remained. Tried different versions of ruby, libxml, nokogiri. Nothing helps.

Wardrop commented 11 years ago

If it's random, it might be related to this: http://bugs.ruby-lang.org/issues/8100

Try Ruby 2.0 patched with changeset 39919

(Also reported to affect 1.9.3 I believe)

sthetz commented 11 years ago

Latest ruby versions do not solve this issue, I'm still having segfaults.

fenelon commented 11 years ago

@Wardrop @sthetz The ruby2.1.0-dev solves the segfault issue on a development mac, but still fails on our Ubuntu server.

sthetz commented 11 years ago

Nokogiri::XML::XPath::SyntaxError: NULL context pointer

I just got this instead of a segfault.

sthetz commented 11 years ago

@tenderlove @flavorjones @sparklemotion

I finally figured out what the problem was. I have a reproduceable test for segfault: https://github.com/sthetz/nokogiri-segfault

ender672 commented 11 years ago

Thank you for isolating the issue!

I was able to pare it down to this snippet:

require 'nokogiri'
require 'libxml'

loop do
  threads = []
  20.times do
    threads << Thread.new do
      d = Nokogiri::XML '<foo><bar></bar></foo>'
      (d/'bar').each{}
    end
  end

  threads.each { |thread| thread.join }
end

The issue is that libxml-ruby's initializer hooks into libxml2 in a way that is incompatible with Nokogiri.

In libxml-ruby's ext/libxml/ruby_xml_node.c: https://github.com/xml4r/libxml-ruby/blob/921527901ee68a50292998ad57a03e34e2e81b20/ext/libxml/ruby_xml_node.c#L43-L56

The problem is that this hook is global, and even libxml2 nodes that are managed by Nokogiri end up passing through the rxml_node_deregisterNode() function. This function is only meant to handle libxml-ruby nodes, and it results in memory corruption.

I'm not sure yet what the solution is, and I really have to move the pigs onto new pasture right now.

I'll check back on this tonight. Thanks again for taking the time to isolate the issue.

ender672 commented 11 years ago

CC @cfis

cfis commented 11 years ago

Interesting. libxml-ruby depends on that callback for memory management, so it can't go away. Perhaps though the callback could do a type check on the ruby object in _private and if its not a libxmlruby object just ignore it?

ender672 commented 11 years ago

Nokogiri stores a VALUE in the _private field for nodes just like libxml-ruby does. My comment about memory corruption was because I mistakenly thought that Nokogiri was putting a custom struct in there.

So I don't see anything obviously wrong, even when rxml_node_deregisterNode() gets called on a node wrapped by Nokogiri.

A type check is likely to make this issue go away, but I have a feeling there is something else going on. I'm having trouble narrowing it down though -- when I make changes to the the test snippet that should be unrelated I can no longer reproduce it. Seems to be timing or GC related (but GC.stress doesn't produce it either).

Still fiddling with this.

sthetz commented 11 years ago

@ender672 We were able to get rid of the segfault by disabling GC completely, so yes, it is GC related.

ender672 commented 11 years ago

@sthetz Thanks for pointing that out. Usually GC.stress is good at triggering GC issues, but it doesn't help here.

I usually narrow the crashing/leaking snippet down until it only calls one Nokogiri method and debug it from there, but this one goes away when I try that.

ender672 commented 11 years ago

Here is what I have so far:

  1. The node deregister callback gets called on a node whose _private member points to a Nokogiri Ruby Node object.
  2. rxml_node_deregisterNode() sets the Node object's data pointer to NULL.
  3. The same Ruby Node object gets passed into a Nokogiri method as an argument.
  4. Nokogiri assumes that the data pointer can't be NULL and things break.

The method that I've seen this happen in so far is new() in xml_xpath.c: https://github.com/sparklemotion/nokogiri/blob/f897a2ec7f7cc0f79fecc30261bc2c508d74a91c/ext/nokogiri/xml_xpath_context.c#L275-L296

The parameter nodeobj is the already-freed node.

I gotta go pick up restaurant food scraps. Will be back later.

ender672 commented 11 years ago

I'm narrowing down on the issue. Quick update:

A very similar issue came up four years ago.

The solution at the time was to avoid the libxml-ruby callback by temporarily disabling the node-deregister-callback.

However, that solution was never really complete -- there are many other places in Nokogiri where libxml2 nodes are deregistered.

In 2011, libxml-ruby enabled the callback for native OS threads (for ruby 1.9.x compatibility). The Nokogiri workaround doesn't work in this case, and @sthetz's snippet uses multithreading to trigger the four year old bug again.

I have a hunch that the libxml-ruby callback is just triggering a deeper Nokogiri issue. Still digging.

ender672 commented 11 years ago

I thought the libxml-ruby callback would be benign, but turns out that VALUE pointers are unreliable when used in the free() function of a ruby-wrapped C struct. By the time the free() function is invoked the VALUE pointer may have been recycled.

Here is what triggers the error:

  1. Nokogiri creates node A.
  2. GC mark phase triggers.
    • Node A is unreachable.
  3. Nokogiri creates node B. Ruby reuses the same VALUE pointer as A.
  4. GC sweep phase triggers.
    • Nokogiri frees the libxml2 node associated with A.
    • The libxml-ruby callback is invoked. However the VALUE pointer in the _private field now points to B and the wrong ruby object is cleared.

In order for the libxml-ruby callback to be safe, Nokogiri will have to make sure that every libxml2 node has its _private field unset before we call xmlFree().

flavorjones commented 11 years ago

Does this happen with libxml < 2.9.0? If not, then I think we should close this, as Nokogiri doesn't support 2.9.0 yet (see #829 for one example reason why).

I'm also not really comfortable hacking Nokogiri to work around libxml-ruby. It's an old and apparently-unsupported gem.

fenelon commented 11 years ago

It does happen in 2.8 as well On Apr 23, 2013 10:04 PM, "Mike Dalessio" notifications@github.com wrote:

Does this happen with libxml < 2.9.0? If not, then I think we should close this, as Nokogiri doesn't support 2.9.0 yet (see #829https://github.com/sparklemotion/nokogiri/issues/829for one example reason why).

I'm also not really comfortable hacking Nokogiri to work around libxml-ruby. It's an old and apparently-unsupported gem.

— Reply to this email directly or view it on GitHubhttps://github.com/sparklemotion/nokogiri/issues/881#issuecomment-16875547 .

ender672 commented 11 years ago

@flavorjones - It's a problem with GC timing. I won't be adding any hacks to work around this. All attention has been on identifying & understanding the problem.

libxml-ruby is still active and it's included in enough code that I absolutely want to fix this issue. This kind of thing spawns lots of segfault bug reports.

cfis commented 11 years ago

libxml-ruby - not old and not unsupported (i'm the maintainer).

ender672 commented 11 years ago

@fenelon - are you loading libxml-ruby? A good place to check is your Gemfile.lock.

ender672 commented 11 years ago

So @flavorjones was right and the only way I can think of fixing this was to add yet another workaround in Nokogiri.

How bout a haiku?

It may be a hack
But it is a pretty one
Said the pragmatist
ender672 commented 11 years ago

There is an alternative -- we could raise an exception if libxml-ruby is detected, spit out a warning, allow an environment variable to override, etc.

I lean more towards the commit in the pull request. Nice thing about it is that the cleanup only happens when the libxml-ruby callback is detected.

mlooney commented 10 years ago

Any chance of this issue getting fixed any time soon?

mlooney commented 10 years ago

issue still present w/ ruby 2.1 & 2.0

flavorjones commented 10 years ago

895 was merged. Should fix this issue.

flavorjones commented 10 years ago

Will be in 1.6.3.rc1 to be released later today.

typeoneerror commented 10 years ago

still an issue?

/Users/typeoneerror/.rvm/gems/ruby-2.1.2@doki/gems/nokogiri-1.6.3.rc1/lib/nokogiri/xml/sax/push_parser.rb:47: [BUG] Segmentation fault at 0x00000000000000
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin12.0]

Do we need to set up Nokogiri to use different versions of libxml2 by changing the paths when we install?

# Nokogiri (1.6.3.rc1)
    ---
    warnings: []
    nokogiri: 1.6.3.rc1
    ruby:
      version: 2.1.2
      platform: x86_64-darwin12.0
      description: ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin12.0]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/Users/typeoneerror/.rvm/gems/ruby-2.1.2@doki/gems/nokogiri-1.6.3.rc1/ports/x86_64-apple-darwin12.5.0/libxml2/2.8.0"
      libxslt_path: "/Users/typeoneerror/.rvm/gems/ruby-2.1.2@doki/gems/nokogiri-1.6.3.rc1/ports/x86_64-apple-darwin12.5.0/libxslt/1.1.28"
      compiled: 2.8.0
      loaded: 2.8.0
codekitchen commented 9 years ago

We are still seeing what looks very much like this same issue with nokogiri 1.6.6.2 and libxml-ruby 2.8.0, a minimal repro is here: https://gist.github.com/codekitchen/2715ddc89e782b3e6c6f

Removing libxml-ruby, even though the repro isn't using it, prevents the crash.

We've decided to just remove libxml-ruby from our app, since we are only using it in a couple small areas and removing it will be very low effort, but I still wanted to report it.

Coren commented 9 years ago

Hi,

I'm having this kind of crash, too. It really seems to be an interaction between your 2 libs.

Here is the trace

/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/libxml-ruby-2.7.0/lib/libxml/node.rb:75:in `find'
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/libxml-ruby-2.7.0/lib/libxml/node.rb:58:in `context'
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/libxml-ruby-2.7.0/lib/libxml/node.rb:58:in `new'

-- Machine register context ------------------------------------------------
 RIP: 0x00000000105003fd RBP: 0x00000000193d39d0 RSP: 0x00000000193d39b0
 RAX: 0x636e6174736e692d RBX: 0x00000000063bdbb0 RCX: 0x00000000055282e0
 RDX: 0x000000001c565a30 RDI: 0x00000000144438b0 RSI: 0x0000000000000001
  R8: 0x0000000000000000  R9: 0x0000000010646c22 R10: 0x0000000000000000
 R11: 0x0000000011ba4fa0 R12: 0x00000000063bc000 R13: 0x00000000063bcf30
 R14: 0x000000000639bce0 R15: 0x00000000193d3be8 EFL: 0x0000000000000004

-- C level backtrace information -------------------------------------------
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_vm_bugreport+0x4ea) [0x501d30a] vm_dump.c:693
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_bug_context+0xcb) [0x4eb30ab] error.c:425
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(sigsegv+0x3e) [0x4f9145e] signal.c:879
/lib/x86_64-linux-gnu/libpthread.so.0 [0x531f8d0]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x87) [0x105003fd]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeProp+0xb8) [0x104fe019]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreePropList+0x2f) [0x104fdf50]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x148) [0x105004be]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeDoc+0x161) [0x104fc75f]
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(finalize_list+0x51) [0x4ed1021] gc.c:2463
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(gc_finalize_deferred+0x50) [0x4ed2550] gc.c:2500
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_postponed_job_flush+0x133) [0x5024563] vm_trace.c:1572
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_threadptr_execute_interrupts.part.41+0x139) [0x502a7f9] thread.c:1971
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x52e) [0x501046e] vm_eval.c:252
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_class_new_instance+0x21) [0x4f1f281] object.c:1856
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x1ce) [0x501010e] vm_eval.c:180
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_class_new_instance+0x21) [0x4f1f281] object.c:1856
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_yield+0x492) [0x50190f2] vm.c:813
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_ary_each+0x52) [0x4e63e22] array.c:1803
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x1197) [0x5009b97] insns.def:1024
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x1ce) [0x501010e] vm_eval.c:180
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_iterate+0xea) [0x5005c0a] vm_eval.c:1129
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_block_call+0x2b) [0x5005dcb] vm_eval.c:1198
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(enum_to_a+0x38) [0x4ea7968] enum.c:503
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(invoke_block_from_c+0x6be) [0x501462e] vm.c:813
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_invoke_proc+0xe0) [0x50147f0] vm.c:878
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_vm_invoke_proc+0x18) [0x50148d8] vm.c:897
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(proc_call+0x52) [0x4ec2452] proc.c:731
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_method+0x11e) [0x501652e] vm_insnhelper.c:1691
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x1ce) [0x501010e] vm_eval.c:180
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(send_internal+0xd2) [0x5016fc2] vm_eval.c:928
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_method+0x11e) [0x501652e] vm_insnhelper.c:1691
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x1197) [0x5009b97] insns.def:1024
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(invoke_block_from_c+0x6be) [0x501462e] vm.c:813
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_invoke_proc+0xe0) [0x50147f0] vm.c:878
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_vm_invoke_proc+0x18) [0x50148d8] vm.c:897
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_fiber_start+0x110) [0x5031c70] cont.c:1263
flavorjones commented 9 years ago

@Coren - Please open a new issue. This issue has been closed for almost a year and a half. Your problem is unlikely to be the same root cause, so let's track it as a new problem.

Thanks!

Coren commented 9 years ago

sure, see #1364