lxc / ruby-lxc

ruby bindings for liblxc
https://linuxcontainers.org/lxc
GNU Lesser General Public License v2.1

Cannot fork after attaching in ruby >= 2.6 #44

Open JimScadden opened 4 years ago

JimScadden commented 4 years ago

Example code:

require 'lxc'

old_sync = $stdout.sync
$stdout.sync = true

ct = LXC::Container.new('container')
puts "#{Process.pid} Attaching to container"
exitcode = ct.attach({wait: true}) do
  puts "#{Process.pid} Inside container. Forking"
  fork do
    puts "#{Process.pid} Forked :)"
  end
end

This used to work fine in ruby 2.5:

# ruby --version
ruby 2.5.8p224 (2020-03-31 revision 67882) [x86_64-linux]
# ruby test.rb
138201 Attaching to container
26532 Inside container. Forking
26533 Forked :)

However, it triggers an internal Ruby error (raised at https://github.com/ruby/ruby/blob/510df47f5f7f83918d3aa00316c8a5b959d80d7c/thread_pthread.c#L1695) in ruby 2.6 and 2.7:

# ruby --version
ruby 2.6.6p146 (2020-03-31 revision 67876) [x86_64-linux]
# ruby test.rb 2>&1 | head
138686 Attaching to container 
26536 Inside container. Forking
test.rb:10: [BUG] timer_posix was not dead: 0

ruby 2.6.6p146 (2020-03-31 revision 67876) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0021 e:000020 CFUNC  :fork
c:0004 p:0035 s:0017 e:000016 BLOCK  test.rb:10 [FINISH]
c:0003 p:---- s:0014 e:000013 CFUNC  :attach

# ruby --version
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-linux]
# ruby test.rb 2>&1 | head
138889 Attaching to container
26538 Inside container. Forking
test.rb:10: [BUG] timer_posix was not dead: 0

ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0021 e:000020 CFUNC  :fork
c:0004 p:0033 s:0017 e:000016 BLOCK  test.rb:10 [FINISH]
c:0003 p:---- s:0014 e:000013 CFUNC  :attach

I've tested this on CentOS 7.7 and Debian buster (and sid).

sitano commented 2 years ago

Nice catch. I have not checked this, but my guess is that lxc_spawn forks the Ruby VM incorrectly. The "timer_posix was not dead" message points to the timer thread being in an invalid state, meaning it was not shut down properly before the fork. Ruby's Process.fork does more than the clone() syscall: it also cleans up scheduler and thread data. In 2.4 it shuts down the timer before the fork.
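To illustrate the point about Process.fork doing its own bookkeeping, here is a minimal sketch with plain Ruby and no LXC involved. It does not reproduce the bug; it only shows that when the fork goes through Ruby's own fork path, the child ends up with just the forking thread even though the parent had another one running. The helper name thread_count_in_forked_child is made up for this example.

```ruby
# Sketch only: plain Process.fork, no ruby-lxc. The helper name is hypothetical.
# It forks while an extra thread is alive in the parent and reports how many
# threads the child sees.
def thread_count_in_forked_child
  reader, writer = IO.pipe
  background = Thread.new { sleep } # extra thread alive in the parent

  pid = fork do
    reader.close
    # Only the thread that called fork should still exist here.
    writer.puts Thread.list.size
    writer.close
  end

  writer.close
  count = reader.read.to_i # blocks until the child closes its end
  reader.close
  Process.wait(pid)

  background.kill
  background.join
  count
end

puts "threads in child: #{thread_count_in_forked_child}"
```

When the process is forked from C code that bypasses this path, as the attach call appears to do, none of that cleanup runs, which would fit the state check failing on the next fork.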

sitano commented 2 years ago

Just replacing fork + clone3(CLONE_PARENT) with a single call to rb_fork_ruby makes everything work. However, it requires a patch to both lxc and ruby-lxc. The downside is that the parent process either loses the child or has to use the CHILD_REAPER flag instead of the second call to clone. Anyway, the prototype works.