orc-lang / orc

Orc programming language implementation
https://orc.csres.utexas.edu/
BSD 3-Clause "New" or "Revised" License

Intermittent failures in threadring test #52

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The threadring.orc program will occasionally fail (~25% of the time), either producing only the first of its two outputs (498), or no output at all, and then continuing to run silently without halting.

Note: It has never produced incorrect outputs.

This may indicate a bug in the underlying implementation, which is causing the execution to fail silently. This failure seems to have begun occurring after the migration to OIL execution, though I have not verified this.

Original issue reported on code.google.com by dkitc...@gmail.com on 22 Feb 2010 at 9:31

GoogleCodeExporter commented 9 years ago

Original comment by dkitc...@gmail.com on 22 Feb 2010 at 9:33

GoogleCodeExporter commented 9 years ago
This problem was introduced in r1514, with the introduction of Kilim 0.6 and slight modifications to our own Kilim glue code. If we are unable to correct it directly, the best strategy may be to roll back to our previous, customized Kilim version.

Original comment by dkitc...@gmail.com on 26 Feb 2010 at 9:04

GoogleCodeExporter commented 9 years ago
Internal testing and discussion have led us to conclude that the test may be failing because the workload is too heavy, and we do not have any throttling in the engine (throttling is a research topic all on its own). However, we are not yet certain that overload is always the source of the bug; there may still be an underlying Kilim issue.

Since this is a confirmed Kilim issue, I am handing it off to Amin.

Original comment by dkitc...@gmail.com on 2 Mar 2010 at 9:28

GoogleCodeExporter commented 9 years ago
Kilim 0.6 has serious memory issues when creating a lot of tasks.
Here is a log of what I did.
First I ran threadring with the original Kilim 0.6 (without my changes) to make sure the failure is not caused by my changes. It blows up.
Then I ran threadring with Kilim 0.58. It works OK.
Next I ran one of the examples from the original Kilim 0.6, called LotsOfTasks. This example simply creates a lot of tasks. By default it creates 100000 tasks in 10 rounds, and between rounds it sleeps for 1000 ms and calls System.gc(). It works OK in this situation. But I started to play with it a little.
I commented out these lines:
//            System.gc();
//            Thread.sleep(1000); // give the GC a chance.

Guess what? It blew up! This is very similar to the situation with our threadring benchmark.
So now I am sure it is a Kilim 0.6 problem.
Amazingly enough, the same LotsOfTasks example from Kilim 0.6 works awesomely with Kilim 0.58. I ran it with 1000000 tasks without the System.gc() and Thread.sleep() calls. It ran in the blink of an eye!
I will safely go back to 0.58, and we'll investigate the problem with Kilim 0.6 offline.

Original comment by amsh...@gmail.com on 4 Mar 2010 at 12:02
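For anyone who wants to reproduce the shape of the LotsOfTasks experiment described in the last comment, here is a minimal sketch. It is not Kilim's LotsOfTasks and does not use Kilim at all: it uses a plain java.util.concurrent thread pool, so it only illustrates the "many short tasks in rounds, with or without a GC pause between rounds" pattern. The class name `LotsOfTasksSketch`, the `run` method, its parameters, and the choice to wait for each round to finish are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the experiment; NOT the Kilim example itself.
public class LotsOfTasksSketch {

    // Runs `rounds` rounds of `tasksPerRound` trivial tasks and returns
    // the total number of tasks that completed.
    public static long run(int rounds, int tasksPerRound, boolean pauseBetweenRounds)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        AtomicLong completed = new AtomicLong();
        try {
            for (int r = 0; r < rounds; r++) {
                CountDownLatch done = new CountDownLatch(tasksPerRound);
                for (int i = 0; i < tasksPerRound; i++) {
                    pool.execute(() -> {
                        completed.incrementAndGet();
                        done.countDown();
                    });
                }
                // Assumption: each round is allowed to finish before the next starts.
                done.await();
                if (pauseBetweenRounds) {
                    // The two calls that were commented out in the experiment.
                    System.gc();
                    Thread.sleep(1000); // give the GC a chance
                }
            }
        } finally {
            pool.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed sizes: 10 rounds of 100000 tasks each, with no pause between rounds.
        long n = run(10, 100_000, false);
        System.out.println("completed tasks: " + n);
    }
}
```

Passing `true` as the last argument restores the pause between rounds. A run that never returns from `done.await()` would correspond to the silent hang reported for threadring, though this sketch has not been shown to reproduce the Kilim 0.6 failure.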