scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org
230 stars 21 forks source link

Structural method call may not be thread-safe #1006

Closed scabug closed 13 years ago

scabug commented 16 years ago

Every structural call point generates bytecode of the following sort:

public static Method reflMethod$$Method1(Class x$$1) {
  if(reflMethod$$Cache1 == null || reflClass$$Cache1 != x$$1) {
    reflMethod$$Cache1 = x$$1.getMethod("getName", new Classr0);
    reflClass$$Cache1 = x$$1;
  }
  return reflMethod$$Cache1;
}

along with the two static fields it touches (the method and class references). Shouldnt this method be synchronized or at least the return value be the method that was found and not the static field?

scabug commented 16 years ago

Imported From: https://issues.scala-lang.org/browse/SI-1006?orig=1 Reporter: @dubochet

scabug commented 15 years ago

@odersky said: Milestone next_bugfix deleted

scabug commented 14 years ago

David Pollak (dpp) said: I just found this bug while I was preparing my preso for the JVM language summit.

Seems to me that something this critical should be fixed in less than 15 months.

Btw, here's code that triggers the bug:

package jvm_summit

object Structural { def getLen(in: {def length: Int}): Int = in.length

def main(in: Array[String]) { for (i <- 1 to 500) { new Thread(new Runnable() { def run() { try { while(true) { getLen("Hello") getLen(Array(1,2,3)) getLen(Frog) getLen(Frog2) getLen(Frog3) getLen(Frog4) getLen(Frog5) getLen(Frog6) getLen(Frog7) } } catch { case e => e.printStackTrace System.exit(-1) } } }).start() } } }

object Frog { def length = 4 }

object Frog2 { def length = 4 }

object Frog3 { def length = 4 }

object Frog4 { def length = 4 }

object Frog5 { def length = 4 }

object Frog6 { def length = 4 }

object Frog7 { def length = 4 }

scabug commented 14 years ago

@dubochet said: The compilation scheme given in the bug report corresponds to the mono-cache implementation, which is no longer used (it can still be forced using the -Ystruct-dispatch flag, but will probably be removed soon as it is slower in all cases). The current standard compilation scheme compiles a call as follows:

 var reflParams$$Cache: Array[Class[_]] = Array[JClass](classOf[A], classOf[B])
 var reflPoly$$Cache: scala.runtime.MethodCache = new EmptyMethodCache()
def reflMethod$$Method(forReceiver: JClass[_]): JMethod = {
  var method: JMethod = reflPoly$$Cache.find(forReceiver)
  if (method != null)
    return method
  else {
    method = forReceiver.getMethod("xyz", reflParams$$Cache)
    reflPoly$$Cache = reflPoly$$Cache.add(forReceiver, method)
    return method
  }
}

My analysis is that this method can run unsynchronized in a multi-threaded environment. In the worst case, a method looked-up by one thread will not be cached (it could be overwritten by another thread). However, the system still is safe as the method can always be recalculated afterwards (it is only a cache), and the caching data structure is not harmed by the threads interacting.

I cannot observe any problem in the example provided by dpp using the latest trunk. I ran it a few times on an 8-core machine, including once for almost one hour, without any exception happening. Maybe I am looking at the wrong place. dpp, if you think I closed the bug wrongfully, can you be more specific as to how to reproduce the bug (including the hardware you use, the observed behaviour and the expected behaviour).

scabug commented 14 years ago

David Pollak (dpp) said: private static scala.runtime.MethodCache reflPoly$$Cache1; is not marked as volitile nor are you using java.util.concurrent.atomic.AtomicReference so the reference update is not guaranteed to be thread safe nor is it guaranteed to be seen by multiple threads except after the entry/exit of a monitor.

I would strongly recommend storing the MethodCache is an AtomicReference.

Sorry for being a stickler here, but this one is important to get right because bad byte-code cannot be worked around.

scabug commented 14 years ago

@dubochet said: David, thank you for your interest in this issue. What you say concerning reflPoly$$Cache is true. However, my analysis is that the overall thread-safety of the cache is still maintained despite the update method being not deterministic under concurrent access. I think that the poly-cache implementation of structural methods dispatch only allows the following two abnormal behaviours:

  1. A method is looked-up by one thread, but its entry in the cache is lost when another thread overwrites it. The cache remains valid because the method can be looked-up again next time (and entered into the cache then).

  2. Two threads add the same method to the cache. The cache remains valid because the first duplicated method will always subsequently be found. The second duplicated method is wasted cache space, but is not otherwise a problem.

Both of these abnormal behaviours create performance penalties: for the first case, a single unnecessary method lookup, and for the second case, an additional equality test and pointer dereference for some subsequent cache lookups. However, such a performance penalty will rarely happen, even on very concurrent code, as it cannot happen anymore once the cache for a call site is built. The other option is to always incur the cost of locking, atomic compare-and-write or volatile fields.

In all fairness, since volatile reads are cheap on modern JVMs, it may be beneficial to performance to write a polymorphic cache that does not allow any abnormal behaviour. To be sure requires benchmarking, which is very tricky for such a problem. I'll put this on my todo list, but the priority is low.

In conclusion, I still believe that the 2.8 poly-cache implementation of structural methods dispatch is thread-safe and fast. If you disagree, can you please provide me with a use case that demonstrates the problem, or explain to me very precisely where in the code you see the problem.

Please note also that the 2.7 branch does not contain poly-cache structural method dispatch and may not be thread safe. I had always assumed that poly-cache was part of the 2.7 branch, but apparently, I was mistaken. I suppose that makes for yet another reason to be excited about 2.8: poly-cache is as fast or faster than the 2.7 mono-cache implementation, and can be a lot faster (5x) in some realistic cases.

scabug commented 14 years ago

David Pollak (dpp) said: Without a synchronization or an AtomicReference, you are not guaranteed that the reference that's written has been initialized because of out of order execution of the code. Thus if another thread reads the reference, it may be reading a reference to an uninitialized object. I just verified this with Brian Goetz.

Given that this bug was open for 15 months and the other significant issues with the compiler (lack of thread safety for object instantiation) and the ongoing issues with the Actor library, I would greatly appreciate it if EPFL would take a more cooperative approach with the community, or at the very least not dismiss my bug reports.

So, please mark the cache field as volatile so that the code is threadsafe.

scabug commented 14 years ago

@dubochet said: Ok, I finally understand the problem. I will fix it.

Out of curiosity, is the example above a theoretical case that may trigger the bug, or did it actually crash on Scala 2.8?

scabug commented 14 years ago

@gaydenko said: Real

java.lang.NoSuchMethodException: [I.length()

with r19246.

scabug commented 14 years ago

@dubochet said: Fixed in r19444 for all three dispatching method (no-cache, mono-cache, and poly-cache).

Please note: I looked at the generated bytecode, and it seems correct to me. I am, however, no expert in weird multithreaded behaviours of the JVM. And I cannot reproduce the bug. It would be helpful if someone more expert than me can confirm that the generated bytecode is now OK with respect that this bug.

scabug commented 14 years ago

@gaydenko said: Sorry, still reproducable (I mean David's Structural test case) with r19445.

scabug commented 14 years ago

@dubochet said: Anli, the problem you mention is in fact a symptom of the latest version of issue #451 (which breaks in new and improved ways following the recent changes in the compilation of Arrays). If you remove the call to getLen(Array(1,2,3)) from David's example, you will find that NoSuchMethodException: [I.length() does not occur, even before the fix in r19444.

As for the thread-safety issue, I still think is fixed. If there is someone who did reliably encounter it on some given hardware, I'd be thankful for him to test it and confirm that the bug is fixed. Otherwise, an expert in concurrent Java (I am not) should have a detailed look at the generated bytecode.

scabug commented 14 years ago

@gaydenko said: Aha, I see, thanks. I can not reproduce NSME with that getLen(Array(1,2,3) call removed.