orc-lang / orc

Orc programming language implementation
https://orc.csres.utexas.edu/
BSD 3-Clause "New" or "Revised" License
41 stars 3 forks source link

The interpreter crashes with an assert failure just occasionally #131

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
This may be more than one bug, or it may be a single race; at the moment I 
cannot tell. The crashes seem to be in the general form of asserts failing due 
to tokens being in the wrong state. Often involving halting or killing.

I am creating this issue to collect information on the subject in hopes that we 
can eventually find a way to reproduce this bug well enough to fix it. The 
comments will contain traces of crashes. If you have something that may be 
related feel free to add it.

Version and Platform:
Version 2.1.0 and development there after. This is hard enough to reproduce 
that we have not found a pattern. Similar crashes have occurred on both Linux 
and OS X.

Original issue reported on code.google.com by arthur.peters on 28 Sep 2013 at 7:04

GoogleCodeExporter commented 9 years ago
==== Starting test_data/performance/threadring.orc ====
498
Sep 28, 2013 12:47:44 PM orc.run.core.Execution$$anonfun$2 apply
SEVERE: Unexpected Java Error thrown; killing Orc Execution
java.lang.AssertionError: Only live tokens may be blocked: state=Killed
    at orc.run.core.Token.blockOn(Token.scala:191)
    at orc.run.core.Join.join(Join.scala:68)
    at orc.run.core.Token.makeCall(Token.scala:440)
    at orc.run.core.Token$$anonfun$eval$2.apply(Token.scala:518)
    at orc.run.core.Token$$anonfun$eval$2.apply(Token.scala:518)
    at orc.run.core.Resolver$$anonfun$resolve$1.apply(Resolver.scala:31)
    at orc.run.core.Resolver$$anonfun$resolve$1.apply(Resolver.scala:30)
    at orc.run.core.Resolver$class.resolveOptional(Resolver.scala:47)
    at orc.run.core.Token.resolveOptional(Token.scala:54)
    at orc.run.core.Resolver$class.resolve(Resolver.scala:30)
    at orc.run.core.Token.resolve(Token.scala:54)
    at orc.run.core.Token.eval(Token.scala:518)
    at orc.run.core.Token.run(Token.scala:486)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

Error: java.lang.AssertionError: Only live tokens may be blocked: state=Killed

Original comment by arthur.peters on 28 Sep 2013 at 7:05

GoogleCodeExporter commented 9 years ago
I may have had an insight, but I don't know how to test my theory. Many 
references to state have the form: state match { ... }. This appears to read 
state only once however the pseudo code of what is generated for this is as 
follows:

val s = state
if(s matches ...) {
  ...
} else if(s matches ...) {v
  ...
}

This seems fine, but if the JIT compiler optimizes this it may change it to the 
following because the read is neither synchronized nor volatile (I couldn't 
tell for sure if this is allowed, but it's a common way to handle register 
spills):

if(state matches ...) {
  ...
} else if(state matches ...) {v
  ...
}

This has a race in it because different checks could see different values. If 
the checks where reordered this could be a particular issue, for instance if 
Killed is checked first then Live. The token might be live when Killed is 
checked and then Killed when live is checked and finally fail with an exception 
(see code in blockOn). This reordering does not appear the JVM bytecode (I 
checked) but the JIT compiler could still be doing it.

This may all be crazy, but I thought I would document it before I move on to 
other work.

Original comment by arthur.peters on 28 Sep 2013 at 7:51

GoogleCodeExporter commented 9 years ago
==== Starting ../OrcExamples/beginning_Orc/auction.orc ====
Current bid: 1
Current bid: 2
Current bid: 3
Current bid: 4
Current bid: 5
Current bid: 6
Sep 29, 2013 11:48:03 PM orc.run.core.Execution$$anonfun$2 apply
SEVERE: Unexpected Java Error thrown; killing Orc Execution
java.lang.AssertionError: Spurious check of call handle. 
orc.run.core.SiteCallHandle@2abaf2ce.state=CallReturnedValue(true)
    at orc.run.core.CallHandle.check(CallHandle.scala:92)
    at orc.run.core.Token.run(Token.scala:469)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

Error: java.lang.AssertionError: Spurious check of call handle. 
orc.run.core.SiteCallHandle@2abaf2ce.state=CallReturnedValue(true)

Note that the state shown should not have caused this exception. There is a 
race in the message generation, but I think this might be a hint.

branches/PetersOrcMachine/OrcScala/src/orc/run/core/CallHandle.scala @ r3313

Original comment by arthur.peters on 30 Sep 2013 at 7:54

GoogleCodeExporter commented 9 years ago
It just happened again:

Jan 25, 2014 4:25:58 AM orc.run.core.Execution$$anonfun$2 apply
SEVERE: Unexpected Java Error thrown; killing Orc Execution
java.lang.AssertionError: Spurious check of call handle. 
orc.run.core.SiteCallHandle@5e37429a.state=CallWasKilled
    at orc.run.core.CallHandle.check(CallHandle.scala:92)
    at orc.run.core.Token.run(Token.scala:469)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

Error: java.lang.AssertionError: Spurious check of call handle. 
orc.run.core.SiteCallHandle@5e37429a.state=CallWasKilled
Exception in thread "main" java.lang.AssertionError: Spurious check of call 
handle. orc.run.core.SiteCallHandle@5e37429a.state=CallWasKilled
    at orc.run.core.CallHandle.check(CallHandle.scala:92)
    at orc.run.core.Token.run(Token.scala:469)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

Occurred will running Sudoku.orc running under the Token interpreter with -O3 
at r3338 in the Porc branch.

I still don't know how to solve this. But I thought I would store this for 
future debugging.

I do not see any way this could happen except by having the token scheduled to 
run before it should in such a way that there is a race between the token 
starting to execute and the killed handle state being set.

Original comment by arthur.peters on 25 Jan 2014 at 6:21

jthywiss commented 7 years ago

I think this was fixed incidentally in the meantime. OK to close?

arthurp commented 7 years ago

It's possible, but I have seen this error in I would say the last 6 months (I should have captured the trace and added it here, but I didn't). So I'm not sure it's fixed. Just really unlikely.