lbehnke / h2database

Automatically exported from code.google.com/p/h2database
0 stars 0 forks source link

StackOverflow when checking for deadlock #61

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

Unfortunately, I can't tell, because it happens when many threads are
running queries simultaneously, and I could not identify the cause yet (in
fact, there is a bug in the application, and I was relying on H2 traces to
help debug it, but due to this stack overflow, some JDBC logging statements
are not shown).

> What is the expected output? What do you see instead?

Should detect a deadlock (and probably show some JDBC calls that are
missing), but throws an exception instead:

/*SQL */SET AUTOCOMMIT TRUE;
02-11 08:54:49 18618 [main] ERROR java.lang.StackOverflowError
    at java.util.HashMap$KeyIterator.<init>(HashMap.java:875)
    at java.util.HashMap$KeyIterator.<init>(HashMap.java:875)
    at java.util.HashMap.newKeyIterator(HashMap.java:889)
    at java.util.HashMap$KeySet.iterator(HashMap.java:921)
    at java.util.HashSet.iterator(HashSet.java:154)
    at org.h2.table.TableData.checkDeadlock(TableData.java:495)
    at org.h2.table.TableData.checkDeadlock(TableData.java:513)
   << same line repeated hundred of times >>
Caused by: org.h2.jdbc.JdbcBatchUpdateException: Deadlock detected. The
current transaction was rolled back. Details: 
Session #10 (user: DIRECTOR) is waiting to lock PUBLIC.GGS_OBJECTS while
locking PUBLIC.MPS_HISTORY (exclusive), PUBLIC.MPS (exclusive).
Session #6 (user: DIRECTOR) is waiting to lock PUBLIC.MPS while locking
PUBLIC.GGS_OBJECTS (exclusive). [40001-107]

Notice also that the deadlock exception mentions session 10 and 6, but
there are no previous jdbc trace calls to session 10 (only to 6). I can
provide a bigger log later, if necessary.

> What version of the product are you using? On what operating system, file
system, and virtual machine?

It happens both with 1.0.78 and 1.1.107. The operating system is Linux 64
bits (Ubuntu 8.10), and the JVM is Sun's 1.5.0_15 64 bits.

> Do you know a workaround?

Not yet. I will try to step-debug through the code.

> How important/urgent is the problem for you?

It's not urgent, because the real problem is on our application, and it has
to be fixed anyway. But it's important, as I will have to use other means
to debug it (like switching the database - I'm using Hibernate).

> In your view, is this a defect or a feature request?

It's a defect - regardless of the applications faulty deadlock, it should
not cause a stack overflow.

Original issue reported on code.google.com by felip...@gmail.com on 11 Feb 2009 at 4:58

GoogleCodeExporter commented 9 years ago
Hi,

The stack overflow is not nice, however without test case it will be hard to 
find the
reason. But after the stack overflow there is a deadlock detected:

Session #10 (user: DIRECTOR) is waiting to lock PUBLIC.GGS_OBJECTS while
locking PUBLIC.MPS_HISTORY (exclusive), PUBLIC.MPS (exclusive).
Session #6 (user: DIRECTOR) is waiting to lock PUBLIC.MPS while locking
PUBLIC.GGS_OBJECTS (exclusive). [40001-107]

Is this the wrong deadlock?

Original comment by thomas.t...@gmail.com on 11 Feb 2009 at 7:12

GoogleCodeExporter commented 9 years ago
Yes, the right deadlock is detected, I am just concerning that the stack 
overflow
makes it harder to debug. 

Anyways, after debugging it on Eclipse, it looks that the stack overflow is 
caused by
this piece of code:

if (error == null && lockExclusive != null) {
  Table t = lockExclusive.getWaitForLock();
  if (t != null) {
    error = t.checkDeadlock(lockExclusive, clash);
    if (error != null) {
      error.add(session);
    }
}

"this" refers to a table called MPS and t is table GGS_OBJECTS; when
t.checkDeadlock() is called, "this" is GGS_OBJECTS and t is MPS, and so on...

Original comment by felip...@gmail.com on 11 Feb 2009 at 7:42

GoogleCodeExporter commented 9 years ago
Here is another update, now including the session lock information

1.call is made to OBJ.checkDeadLock(5, 6) (OBJ is table, 5 and 6 are sessions)

2.in that method, this.lockExclusive == MPS, and MPS.getWaitForLock() == 8, 
which
causes call to:

3.MPS.checkDeadlock(8, 6)

4.in that method, this.lockExclusive == OBJS, and OBJS.getWaitForLock() == 5, 
which
causes call to 1

So, it looks like checking only for session == clash is not enough to find a 
circle.

Original comment by felip...@gmail.com on 11 Feb 2009 at 8:21

GoogleCodeExporter commented 9 years ago
Ok, found a solution: change the clash parameter to be a Set.

If I change it to: 

   public ObjectArray checkDeadlock(Session session, Set clash) {
        // only one deadlock check at any given time
        synchronized (TableData.class) {
            if (clash == null) {
                // verification is started
                clash = new HashSet();
            } else if (clash.contains(session)) {
                // we found a circle
                return new ObjectArray();
            }
            clash.add(session);

Then I don't get the stack overflow anymore.

If you think this solution is fine (the only drawback would be the increased
performance, as it would be creating Set instances), I could try to add a test 
case
and patch.

Original comment by felip...@gmail.com on 11 Feb 2009 at 9:19

GoogleCodeExporter commented 9 years ago
Ok, here is a patch, test case included (if you run it without the change, you 
will
get a StackOverflow).
I also included a trace call before a lock is acquired (I think it can help 
detecting
dead locks, but feel free to remove such call).

Original comment by felip...@gmail.com on 11 Feb 2009 at 10:10

Attachments:

GoogleCodeExporter commented 9 years ago
BTW, the 'deadlick' was a typo, it wasn't intentional. But it looks like I can't
change the title once it was submitted...

Original comment by felip...@gmail.com on 12 Feb 2009 at 5:35

GoogleCodeExporter commented 9 years ago

Original comment by thomas.t...@gmail.com on 12 Feb 2009 at 6:22

GoogleCodeExporter commented 9 years ago
Hi Thomas,

Were you able to reproduce this problem? What about the proposed fix?

Sorry for pushing these questions, but I'm currently using the patched H2 and 
was
wondering if such change would eventually make it.

-- Felipe

Original comment by felip...@gmail.com on 17 Feb 2009 at 8:55

GoogleCodeExporter commented 9 years ago
Hi,

Sorry for the delay. I'm still trying to create a test case for this problem.
Once I have that I will fix it.

Original comment by thomas.t...@gmail.com on 3 Apr 2009 at 12:50

GoogleCodeExporter commented 9 years ago
Hi Thomas,

I already created a test case, it's available in the patch I provided
(http://h2database.googlecode.com/issues/attachment?aid=5336640199476198141&name
=H2-Issue61.patch).

-- Felipe

Original comment by felip...@gmail.com on 3 Apr 2009 at 2:15

GoogleCodeExporter commented 9 years ago
I'm sorry... I will test it.

Original comment by thomas.t...@gmail.com on 3 Apr 2009 at 4:00

GoogleCodeExporter commented 9 years ago
Hi,

I'm sorry for the delay! I understand the problem now, and why it was not 
working. I
just write it down here so I can look it up later if required:

Circular locks were detected (C1 locks A, C2 locks B, C3 locks C). However not 
this:
C1 and C2 are deadlocked (which is detected later but too late), C3 tries to 
lock a
table of C1 or C2. In that case, the deadlock check of C3 is an endless 
recursion.
The test case is (all exclusive locks, + successful, ? tries)

C1 C2 C3
========
A+  
B+ C+
      B?
   A?
C?

The only problem with your test case is that sometimes two deadlocks are 
detected (C1
and C3 I believe). That's normal in this test case, so I have to change 
checkDeadlock
to allow 2 exception in that case.

Anyway, I will apply your patch, it will be included in the next release.

Thanks a lot for your help and your patience!

Original comment by thomas.t...@gmail.com on 4 Apr 2009 at 7:32

GoogleCodeExporter commented 9 years ago
I found a problem with your patch. As I wrote sometimes two deadlocks are 
detected.
This shouldn't be the case. The solution is to pass both the visited (HashSet) 
and
the clash (Session). Do not recurse when already visited, but only throw when 
part of
the circle.

Original comment by thomas.t...@gmail.com on 5 Apr 2009 at 9:36

GoogleCodeExporter commented 9 years ago
Hi Thomas,

Please feel free to improve it; unfortunately, I won't be able to help you 
these days
(for lack of time :-(.

-- Felipe

Original comment by felip...@gmail.com on 5 Apr 2009 at 7:32

GoogleCodeExporter commented 9 years ago
Fixed in version 1.1.111 (2009-04-10)

Original comment by thomas.t...@gmail.com on 10 Apr 2009 at 8:23