jnr / jnr-posix

Java Posix layer

JVM crash with threaded use of Etc::getgrnam #44

Open gavin-scott opened 9 years ago

gavin-scott commented 9 years ago

The JVM crashes on threaded use of Etc::getgrnam. This happens on every version of Java 1.7 I tested with, e.g. the latest Oracle Java 1.7 update 75. I tested with both JRuby 1.7.8 and 1.7.19 on CentOS 6.6. A sample test program that demonstrates the problem:

require 'etc'

i = 1
while true do
  puts "Iteration #{i}"
  i = i + 1
  threads = []
  10.times do
    threads << Thread.new do
      Etc::getgrnam('users')
    end
  end
  threads.each do |thr|
    thr.join
  end
end

This always crashes for me, usually within 200 iterations or so:

[gavin@localhost tmp]# env JAVA_HOME=/opt/java PATH=$JAVA_HOME/bin:$PATH jruby crash.rb
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 5
Iteration 6
Iteration 7
Iteration 8
Iteration 9
Iteration 10
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f585dc4157f, pid=23613, tid=140017224009472
#
# JRE version: Java(TM) SE Runtime Environment (7.0_76-b13) (build 1.7.0_76-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.76-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x13357f]  __strlen_sse42+0xf
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid23613.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted

headius commented 9 years ago

Ahh nice, so we must have a thread-unsafe C API here? I could not get this to fail at all on OS X, so perhaps it's a thread unsafety issue in the Linux implementation of getgrnam.

headius commented 8 years ago

A simple fix would be to lock around known-unsafe libc APIs. Any concerns?

gavin-scott commented 8 years ago

I think using the thread-safe variant getgrnam_r would make more sense. AFAICT getgrnam happens to be thread-safe on Mac OS X but not on e.g. Linux.

headius commented 8 years ago

@gavin-scott so I guess we'd want to try getgrnam_r and fall back to getgrnam, possibly with synchronization? We need to maintain a lowest common denominator too.

gavin-scott commented 8 years ago

Why would you fall back? I think getgrnam_r should be supported everywhere. Do you mean if getgrnam_r fails for some reason? I would think raising an exception would be ok at this point, but honestly I don't know what the general pattern for handling low-level C library failures is in this code.

headius commented 8 years ago

Well if we can count on getgrnam_r working everywhere, then I agree there's no reason not to use it for getgrnam.

Generally, C library errors should just be propagated out. C binding problems we try to hide, falling back on alternatives as much as we can.

gavin-scott commented 8 years ago

AFAIK getgrnam_r is part of POSIX and should be available everywhere. I'm having trouble finding a good solid reference for that, but it is listed here: http://www.unix.com/man-page/posix/3p/getgrnam/

headius commented 8 years ago

This was addressed recently in JRuby by locking around all those function calls. The same would probably be appropriate here.

Also, this affects many other "etc" functions. They probably all need thread-safety treatment.

We'd welcome help. It might just be adding "synchronized" to the LibC bindings of these methods, or renaming them to the new thread-safe versions @gavin-scott mentioned.
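
Until the bindings themselves are fixed, the locking idea can also be applied on the caller's side. The following is a minimal Ruby sketch of that workaround, not the JRuby or jnr-posix change itself. `SafeEtc` and `LOCK` are hypothetical names introduced here for illustration, and the list of wrapped functions is an assumption about which lookups need protection. It uses a single shared lock because the non-reentrant lookups may return pointers to static storage that a later call can overwrite, so the lock has to cover both the call and the conversion of its result.

```ruby
require 'etc'

# Hypothetical wrapper (not part of Ruby, JRuby, or jnr-posix): sends the
# non-reentrant group/passwd lookups through one process-wide lock.
module SafeEtc
  LOCK = Mutex.new

  # Assumption: these four lookups are the ones that need serializing.
  # One shared lock is used rather than one lock per function.
  %i[getgrnam getgrgid getpwnam getpwuid].each do |name|
    define_singleton_method(name) do |*args|
      # The whole Etc call, including building the returned struct,
      # runs while the lock is held.
      LOCK.synchronize { Etc.public_send(name, *args) }
    end
  end
end

# Same shape as the reproduction script, but going through the wrapper.
threads = 10.times.map do
  Thread.new do
    begin
      SafeEtc.getgrnam('users')
    rescue ArgumentError
      nil # the 'users' group does not exist on this machine
    end
  end
end
groups = threads.map(&:value) # Thread#value joins and returns the block's result
```

This only protects calls that go through the wrapper. Any code that still calls `Etc` directly, or any other library that reaches the same libc functions, can race with it, which is why a fix inside the bindings (a lock there, or the `_r` variants) remains the better long-term answer.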