Closed fenelon closed 10 years ago
This is most likely a GC bug. Disabling GC makes it work.
I've tried different versions of ruby, libxml and nokogiri. No luck there.
I'm experiencing the same problem. The issue does not seem to be related to Sidekiq or Celluloid: I tried migrating my whole app to Resque, and the problem remained. I tried different versions of ruby, libxml and nokogiri. Nothing helps.
If it's random, it might be related to this: http://bugs.ruby-lang.org/issues/8100
Try Ruby 2.0 patched with changeset 39919
(Also reported to affect 1.9.3 I believe)
Latest ruby versions do not solve this issue, I'm still having segfaults.
@Wardrop @sthetz Ruby 2.1.0-dev solves the segfault issue on a development Mac, but it still fails on our Ubuntu server.
Nokogiri::XML::XPath::SyntaxError: NULL context pointer
I just got this instead of a segfault.
@tenderlove @flavorjones @sparklemotion
I finally figured out what the problem was. I have a reproducible test for the segfault: https://github.com/sthetz/nokogiri-segfault
Thank you for isolating the issue!
I was able to pare it down to this snippet:
    require 'nokogiri'
    require 'libxml'

    loop do
      threads = []
      20.times do
        threads << Thread.new do
          d = Nokogiri::XML '<foo><bar></bar></foo>'
          (d/'bar').each{}
        end
      end
      threads.each { |thread| thread.join }
    end
The issue is that libxml-ruby's initializer hooks into libxml2 in a way that is incompatible with Nokogiri.
In libxml-ruby's ext/libxml/ruby_xml_node.c: https://github.com/xml4r/libxml-ruby/blob/921527901ee68a50292998ad57a03e34e2e81b20/ext/libxml/ruby_xml_node.c#L43-L56
The problem is that this hook is global, and even libxml2 nodes that are managed by Nokogiri end up passing through the rxml_node_deregisterNode() function. This function is only meant to handle libxml-ruby nodes, and it results in memory corruption.
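To illustrate why a global hook is a problem, here is a toy model in plain Ruby. It is not real libxml2 or libxml-ruby code: `ToyLibxml2`, `deregister_hook` and `free_node` are made-up names, and the only thing it shows is that a single process-wide callback sees every freed node, whichever binding created it.

```ruby
# Toy model only: these names are hypothetical and stand in for libxml2's
# single, process-wide node-deregister callback.
module ToyLibxml2
  Node = Struct.new(:owner, :_private)

  @deregister_hook = nil

  class << self
    attr_accessor :deregister_hook

    # Every free goes through the one global hook, regardless of owner.
    def free_node(node)
      deregister_hook&.call(node)
    end
  end
end

seen = []

# One binding installs its hook at load time...
ToyLibxml2.deregister_hook = ->(node) { seen << node.owner }

# ...and from then on it is called for nodes of both bindings.
ToyLibxml2.free_node(ToyLibxml2::Node.new(:libxml_ruby, :wrapper_a))
ToyLibxml2.free_node(ToyLibxml2::Node.new(:nokogiri, :wrapper_b))
```

After the two frees, `seen` holds both owners, which is the situation described above: nodes managed by Nokogiri pass through a function written for libxml-ruby nodes.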
I'm not sure yet what the solution is, and I really have to move the pigs onto new pasture right now.
I'll check back on this tonight. Thanks again for taking the time to isolate the issue.
CC @cfis
Interesting. libxml-ruby depends on that callback for memory management, so it can't go away. Perhaps, though, the callback could do a type check on the Ruby object in _private and, if it's not a libxml-ruby object, just ignore it?
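A rough sketch of that type-check idea, written as a Ruby toy model rather than the real C callback. All class and method names here are hypothetical stand-ins, and the sketch assumes the object stored in `_private` is still a live, valid wrapper when the callback runs.

```ruby
# Hypothetical stand-ins for the wrapper classes of the two bindings.
class ToyLibxmlRubyWrapper
  attr_accessor :native

  def initialize(native)
    @native = native
  end
end

class ToyNokogiriWrapper; end

ToyTreeNode = Struct.new(:_private)

# Toy version of the deregister callback: only touch wrappers we own.
def toy_deregister(node)
  wrapper = node._private
  return :ignored unless wrapper.is_a?(ToyLibxmlRubyWrapper)

  wrapper.native = nil   # detach so the wrapper never frees the node again
  node._private = nil
  :detached
end

own_node = ToyTreeNode.new(nil)
own_node._private = ToyLibxmlRubyWrapper.new(own_node)
foreign_node = ToyTreeNode.new(ToyNokogiriWrapper.new)

own_result = toy_deregister(own_node)
foreign_result = toy_deregister(foreign_node)
```

In this model the foreign node is left untouched, while the owned node is detached from its wrapper.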
Nokogiri stores a VALUE in the _private field for nodes just like libxml-ruby does. My comment about memory corruption was because I mistakenly thought that Nokogiri was putting a custom struct in there.
So I don't see anything obviously wrong, even when rxml_node_deregisterNode() gets called on a node wrapped by Nokogiri.
A type check is likely to make this issue go away, but I have a feeling there is something else going on. I'm having trouble narrowing it down though -- when I make changes to the test snippet that should be unrelated I can no longer reproduce it. Seems to be timing or GC related (but GC.stress doesn't produce it either).
Still fiddling with this.
@ender672 We were able to get rid of the segfault by disabling GC completely, so yes, it is GC related.
@sthetz Thanks for pointing that out. Usually GC.stress is good at triggering GC issues, but it doesn't help here.
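For anyone wanting to try the same two diagnostics, a minimal sketch follows. `without_gc` and `with_gc_stress` are hypothetical helper names; `GC.disable`, `GC.enable` and `GC.stress` are standard Ruby. Both are for debugging only.

```ruby
# Hypothetical helper: run a block with the garbage collector switched off,
# then restore the previous state. Memory grows unchecked while it runs.
def without_gc
  was_disabled = GC.disable   # returns true if GC was already disabled
  yield
ensure
  GC.enable unless was_disabled
end

# Hypothetical helper: run a block with GC.stress on, which forces a
# collection at every opportunity and is therefore very slow.
def with_gc_stress
  previous = GC.stress
  GC.stress = true
  yield
ensure
  GC.stress = previous
end

quiet_result  = without_gc { Array.new(100) { |i| i.to_s }.size }
stress_result = with_gc_stress { Array.new(100) { |i| i.to_s }.size }
```

Wrapping the crashing snippet in `without_gc` corresponds to the "disable GC completely" test; `with_gc_stress` is the usual way to provoke GC bugs, although as noted it does not reproduce this one.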
I usually narrow the crashing/leaking snippet down until it only calls one Nokogiri method and debug it from there, but this one goes away when I try that.
Here is what I have so far:
The method that I've seen this happen in so far is new() in xml_xpath_context.c: https://github.com/sparklemotion/nokogiri/blob/f897a2ec7f7cc0f79fecc30261bc2c508d74a91c/ext/nokogiri/xml_xpath_context.c#L275-L296
The parameter nodeobj is the already-freed node.
I gotta go pick up restaurant food scraps. Will be back later.
I'm narrowing down on the issue. Quick update:
A very similar issue came up four years ago.
The solution at the time was to avoid the libxml-ruby callback by temporarily disabling the node-deregister-callback.
However, that solution was never really complete -- there are many other places in Nokogiri where libxml2 nodes are deregistered.
In 2011, libxml-ruby enabled the callback for native OS threads (for ruby 1.9.x compatibility). The Nokogiri workaround doesn't work in this case, and @sthetz's snippet uses multithreading to trigger the four year old bug again.
I have a hunch that the libxml-ruby callback is just triggering a deeper Nokogiri issue. Still digging.
I thought the libxml-ruby callback would be benign, but turns out that VALUE pointers are unreliable when used in the free() function of a ruby-wrapped C struct. By the time the free() function is invoked the VALUE pointer may have been recycled.
Here is what triggers the error:
In order for the libxml-ruby callback to be safe, Nokogiri will have to make sure that every libxml2 node has its _private field unset before we call xmlFree().
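The recycled-pointer hazard can be pictured with a small toy model. This is plain Ruby and does not touch the real Ruby heap or libxml2: `ToySlotHeap` and `ToyNativeNode` are made-up names, with slot indexes standing in for VALUE pointers and `_private` for the field of the same name on a libxml2 node.

```ruby
# Toy object heap: freed slots go on a free list and are handed out again.
class ToySlotHeap
  def initialize
    @slots = []
    @free_list = []
  end

  def allocate(object)
    index = @free_list.pop || @slots.size
    @slots[index] = object
    index
  end

  def release(index)
    @slots[index] = nil
    @free_list.push(index)
  end

  def fetch(index)
    @slots[index]
  end
end

ToyNativeNode = Struct.new(:_private)

heap = ToySlotHeap.new
native = ToyNativeNode.new(heap.allocate("wrapper for <bar>"))
stale_handle = native._private

heap.release(stale_handle)                          # the wrapper is swept
reused_handle = heap.allocate("unrelated object")   # its slot is reused

# A callback that follows the stale handle now reaches the wrong object.
stale_target = heap.fetch(native._private)

# Clearing the field before the node is freed leaves nothing to follow.
native._private = nil
```

In the model, the stale handle and the reused handle are the same slot, so the callback would end up modifying an unrelated object. Unsetting `_private` first is what makes the callback harmless.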
Does this happen with libxml < 2.9.0? If not, then I think we should close this, as Nokogiri doesn't support 2.9.0 yet (see #829 for one example reason why).
I'm also not really comfortable hacking Nokogiri to work around libxml-ruby. It's an old and apparently-unsupported gem.
It does happen in 2.8 as well.
@flavorjones - It's a problem with GC timing. I won't be adding any hacks to work around this. All attention has been on identifying & understanding the problem.
libxml-ruby is still active and it's included in enough code that I absolutely want to fix this issue. This kind of thing spawns lots of segfault bug reports.
libxml-ruby - not old and not unsupported (i'm the maintainer).
@fenelon - are you loading libxml-ruby? A good place to check is your Gemfile.lock.
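A quick way to do that check from Ruby, as a sketch: `lockfile_lists_gem?` is a hypothetical helper, and the lockfile text below is a made-up sample, not anyone's actual Gemfile.lock.

```ruby
# Hypothetical helper: true if the lockfile text has an entry for the gem,
# either as a resolved spec or as a versioned dependency of another gem.
def lockfile_lists_gem?(lockfile_text, gem_name)
  pattern = /^\s+#{Regexp.escape(gem_name)} \(/
  lockfile_text.each_line.any? { |line| line.match?(pattern) }
end

# Made-up sample in the usual Gemfile.lock layout.
sample = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      libxml-ruby (2.7.0)
      nokogiri (1.6.6.2)

  DEPENDENCIES
    nokogiri
LOCK

found = lockfile_lists_gem?(sample, "libxml-ruby")

# Against a real project:
#   lockfile_lists_gem?(File.read("Gemfile.lock"), "libxml-ruby")
```

A hit means some gem in the bundle pulls in libxml-ruby, even if the app never requires it directly.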
So @flavorjones was right and the only way I can think of fixing this was to add yet another workaround in Nokogiri.
How bout a haiku?
It may be a hack
But it is a pretty one
Said the pragmatist
There is an alternative -- we could raise an exception if libxml-ruby is detected, spit out a warning, allow an environment variable to override, etc.
I lean more towards the commit in the pull request. Nice thing about it is that the cleanup only happens when the libxml-ruby callback is detected.
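For comparison, the detect-and-raise alternative could look roughly like the sketch below. This is not what the pull request does and not Nokogiri code: the helper name and the `TOY_ALLOW_LIBXML_RUBY` environment variable are invented for illustration.

```ruby
# Hypothetical environment variable name; not a real Nokogiri setting.
OVERRIDE_VAR = "TOY_ALLOW_LIBXML_RUBY"

# Hypothetical check: refuse to continue when libxml-ruby is loaded,
# unless the user explicitly opts in through the environment variable.
def check_libxml_ruby_conflict(libxml_ruby_loaded:, env: ENV)
  return :ok unless libxml_ruby_loaded

  if env[OVERRIDE_VAR] == "1"
    warn "libxml-ruby is loaded alongside Nokogiri; continuing because #{OVERRIDE_VAR}=1"
    return :overridden
  end

  raise "libxml-ruby is loaded alongside Nokogiri; set #{OVERRIDE_VAR}=1 to continue anyway"
end

# In a real check the flag could come from something like:
#   libxml_ruby_loaded = !!defined?(LibXML::XML)
clean_result = check_libxml_ruby_conflict(libxml_ruby_loaded: false, env: {})
override_result = check_libxml_ruby_conflict(libxml_ruby_loaded: true, env: { OVERRIDE_VAR => "1" })
```

The trade-off is that this approach breaks apps that load both gems without ever hitting the crash, which is one reason to prefer a cleanup that only runs when the callback is detected.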
Any chance of this issue getting fixed any time soon?
issue still present w/ ruby 2.1 & 2.0
Will be in 1.6.3.rc1 to be released later today.
still an issue?
/Users/typeoneerror/.rvm/gems/ruby-2.1.2@doki/gems/nokogiri-1.6.3.rc1/lib/nokogiri/xml/sax/push_parser.rb:47: [BUG] Segmentation fault at 0x00000000000000
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin12.0]
Do we need to set up Nokogiri to use different versions of libxml2 by changing the paths when we install?
# Nokogiri (1.6.3.rc1)
    ---
    warnings: []
    nokogiri: 1.6.3.rc1
    ruby:
      version: 2.1.2
      platform: x86_64-darwin12.0
      description: ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin12.0]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/Users/typeoneerror/.rvm/gems/ruby-2.1.2@doki/gems/nokogiri-1.6.3.rc1/ports/x86_64-apple-darwin12.5.0/libxml2/2.8.0"
      libxslt_path: "/Users/typeoneerror/.rvm/gems/ruby-2.1.2@doki/gems/nokogiri-1.6.3.rc1/ports/x86_64-apple-darwin12.5.0/libxslt/1.1.28"
      compiled: 2.8.0
      loaded: 2.8.0
We are still seeing what looks very much like this same issue with nokogiri 1.6.6.2 and libxml-ruby 2.8.0. A minimal repro is here: https://gist.github.com/codekitchen/2715ddc89e782b3e6c6f
Removing libxml-ruby, even though the repro isn't using it, prevents the crash.
We've decided to just remove libxml-ruby from our app, since we are only using it in a couple small areas and removing it will be very low effort, but I still wanted to report it.
Hi,
I'm having this kind of crash, too. It really seems to be an interaction between your 2 libs.
Here is the trace
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/libxml-ruby-2.7.0/lib/libxml/node.rb:75:in `find'
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/libxml-ruby-2.7.0/lib/libxml/node.rb:58:in `context'
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/libxml-ruby-2.7.0/lib/libxml/node.rb:58:in `new'
-- Machine register context ------------------------------------------------
RIP: 0x00000000105003fd RBP: 0x00000000193d39d0 RSP: 0x00000000193d39b0
RAX: 0x636e6174736e692d RBX: 0x00000000063bdbb0 RCX: 0x00000000055282e0
RDX: 0x000000001c565a30 RDI: 0x00000000144438b0 RSI: 0x0000000000000001
R8: 0x0000000000000000 R9: 0x0000000010646c22 R10: 0x0000000000000000
R11: 0x0000000011ba4fa0 R12: 0x00000000063bc000 R13: 0x00000000063bcf30
R14: 0x000000000639bce0 R15: 0x00000000193d3be8 EFL: 0x0000000000000004
-- C level backtrace information -------------------------------------------
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_vm_bugreport+0x4ea) [0x501d30a] vm_dump.c:693
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_bug_context+0xcb) [0x4eb30ab] error.c:425
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(sigsegv+0x3e) [0x4f9145e] signal.c:879
/lib/x86_64-linux-gnu/libpthread.so.0 [0x531f8d0]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x87) [0x105003fd]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeProp+0xb8) [0x104fe019]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreePropList+0x2f) [0x104fdf50]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x148) [0x105004be]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeNodeList+0x107) [0x1050047d]
/home/mloiseleur/.rvm/gems/ruby-2.2.1/gems/nokogiri-1.6.6.2/lib/nokogiri/nokogiri.so(xmlFreeDoc+0x161) [0x104fc75f]
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(finalize_list+0x51) [0x4ed1021] gc.c:2463
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(gc_finalize_deferred+0x50) [0x4ed2550] gc.c:2500
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_postponed_job_flush+0x133) [0x5024563] vm_trace.c:1572
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_threadptr_execute_interrupts.part.41+0x139) [0x502a7f9] thread.c:1971
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x52e) [0x501046e] vm_eval.c:252
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_class_new_instance+0x21) [0x4f1f281] object.c:1856
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x1ce) [0x501010e] vm_eval.c:180
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_class_new_instance+0x21) [0x4f1f281] object.c:1856
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_yield+0x492) [0x50190f2] vm.c:813
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_ary_each+0x52) [0x4e63e22] array.c:1803
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x1197) [0x5009b97] insns.def:1024
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x1ce) [0x501010e] vm_eval.c:180
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_iterate+0xea) [0x5005c0a] vm_eval.c:1129
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_block_call+0x2b) [0x5005dcb] vm_eval.c:1198
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(enum_to_a+0x38) [0x4ea7968] enum.c:503
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(invoke_block_from_c+0x6be) [0x501462e] vm.c:813
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_invoke_proc+0xe0) [0x50147f0] vm.c:878
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_vm_invoke_proc+0x18) [0x50148d8] vm.c:897
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(proc_call+0x52) [0x4ec2452] proc.c:731
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_method+0x11e) [0x501652e] vm_insnhelper.c:1691
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x124d) [0x5009c4d] insns.def:1054
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call0_body.constprop.78+0x1ce) [0x501010e] vm_eval.c:180
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_call0+0x192) [0x5011562] vm_eval.c:59
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(send_internal+0xd2) [0x5016fc2] vm_eval.c:928
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_cfunc+0x127) [0x5003827] vm_insnhelper.c:1382
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_call_method+0x11e) [0x501652e] vm_insnhelper.c:1691
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec_core+0x1197) [0x5009b97] insns.def:1024
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_exec+0x78) [0x500e3d8] vm.c:1400
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(invoke_block_from_c+0x6be) [0x501462e] vm.c:813
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(vm_invoke_proc+0xe0) [0x50147f0] vm.c:878
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_vm_invoke_proc+0x18) [0x50148d8] vm.c:897
/home/mloiseleur/.rvm/rubies/ruby-2.2.1/lib/libruby.so.2.2(rb_fiber_start+0x110) [0x5031c70] cont.c:1263
@Coren - Please open a new issue. This issue has been closed for almost a year and a half. Your problem is unlikely to be the same root cause, so let's track it as a new problem.
Thanks!
sure, see #1364
This mostly happens in concurrent environments. I'm using nokogiri with sidekiq, and roadie (which also uses nokogiri).
nokogiri -v
Backtrace: