Double allocation exception - possible causes?

GoogleCodeExporter commented 9 years ago

Hello, Thomas.  

I'm writing with questions arising from the double allocation exception.  

First, some background:

My company has built an Eclipse RCP application that uses H2 to store data
locally on desktop machines.  So, this is a standalone app rather than a
web app, and the database is only accessed by a single process on each
desktop machine where it is installed.  The database files are not on a
network drive.  The machines are running Windows XP and Vista.  The app is
used by about 20 people on a full-time basis.  The schema has about 38
tables.  Data files tend to be about 50 MB when working on a sophisticated
data set.

The app prevents multiple instances from running simultaneously.  It is
multi-threaded, though we have taken steps to ensure that we only access
the database from the main (gui) thread.  We are currently running H2 in
single-threaded mode, as we began development before you added support for
multi-threaded access, and this has been fine for our needs.  When the app
closes, it attempts to gracefully close the connection to the database. 
There is no .trace.db file.

We are using version 111 of the database according to select * from
information_schema.settings where name='CREATE_BUILD'.  We are not using
any special features to my knowledge (LOG=0, two phase commit, linked
tables, cache settings, etc.).  We package the Java Runtime, version
1.6.0_01-b06, with our app, to ensure that all clients use the same version
of Java.

Now to the specifics:

For the second time in about two weeks, we have gotten the double
allocation exception from H2.  Here is an example stack trace:

General error: java.lang.RuntimeException: double allocation in file
C:\path\to\my\data\mydata.data.db page 0 blocks 0-63 [50000-111]
HY000/50000 (Help)
org.h2.jdbc.JdbcSQLException: General error: java.lang.RuntimeException:
double allocation in file C:\path\to\my\data\mydata.data.db page 0 blocks
0-63 [50000-111]
    at org.h2.message.Message.getSQLException(Message.java:107)
    at org.h2.message.Message.convert(Message.java:279)
    at org.h2.engine.Database.openDatabase(Database.java:250)
    at org.h2.engine.Database.<init>(Database.java:218)
    at org.h2.engine.Engine.openSession(Engine.java:57)
    at org.h2.engine.Engine.openSession(Engine.java:139)
    at org.h2.engine.Engine.getSession(Engine.java:119)
    at
org.h2.engine.SessionFactoryEmbedded.createSession(SessionFactoryEmbedded.java:1
7)
    at
org.h2.engine.SessionRemote.connectEmbeddedOrServer(SessionRemote.java:248)
    at org.h2.engine.SessionRemote.createSession(SessionRemote.java:226)
    at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:106)
    at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:90)
    at org.h2.Driver.connect(Driver.java:58)
    at org.h2.server.web.WebServer.getConnection(WebServer.java:629)
    at org.h2.server.web.WebThread$1LoginTask.run(WebThread.java:1175)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: double allocation in file
C:\path\to\my\data\mydata.data.db page 0 blocks 0-63
    at org.h2.message.Message.throwInternalError(Message.java:187)
    at org.h2.store.DiskFile.setPageOwner(DiskFile.java:786)
    at org.h2.store.DiskFile.setBlockOwner(DiskFile.java:669)
    at org.h2.store.DiskFile.init(DiskFile.java:417)
    at org.h2.engine.Database.open(Database.java:555)
    at org.h2.engine.Database.openDatabase(Database.java:223)
    ... 13 more 

The first time it happened we were able to recover by simply appending
;RECOVER=1 to the database URL.  But this most recent time that approach
didn't work.  We also tried deleting the index.db file and that didn't work
either.

So, we ran the Recover tool to generate the SQL script.  This raised a
number of other issues that I'll get to in a moment.  But the first
question, and the one that we care the most about, is this:

1. Other than as a result of accidentally attempting to access the database
from multiple threads, is there any other known way to cause this type of
database corruption?

Now, as for the Recover tool, the following issues were raised:

2. We tried running the output through the RunScript tool on a blank
database, and it failed with a syntax error.  There were some lines that
appeared to be intended as comments, but were not correctly commented out.
 After removing these, we ran RunScript again and it got past the syntax
errors.

3. Next, we were getting failures because of various unique key and primary
key violations.  It turns out that the script generated by the Recover tool
had several records of duplicate data in it, with those records either
being completely identical, or similar enough to where their unique keys
were duplicated.  Whether this was an artifact of the already corrupt
database, or if these records were somehow incorrectly "recovered" by the
Recover tool is something we can't determine.  We modified the generated
script to filter out the offending duplicate records in order to get
everything to go.

4. Once all of the data was back in, we were able to open the database
normally, but found that nearly all of the modifications that had been made
to the data in the hours before the crash took place were gone.  These were
changes that had been previously committed.  Whether the app had been
closed and reopened is unknown, but we do know that many transactions had
been committed.  So, we're at a loss as to why the recovered data was so
out-of-date.

While we thought you should know about issues 2, 3, and 4, the real concern
is that we avoid this situation again in the future.  Any insight you could
provide would be much appreciated.

I would be happy to email you our corrupt database files for examination if
you wish.

Thank you,

Steve Fram

Original issue reported on code.google.com by sbf...@gmail.com on 2 Sep 2009 at 4:59

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Hi,

Yes, this looks like a corruption problem. You have already provided a lot
information, thanks. The version you are using should be fine, I'm not aware of 
any
bugs related to such problems that are fixed in later versions. I'm afraid I 
can not
really solve this problem right now because I don't have a reproducible test 
case,
but maybe we can find out more details, and maybe I can give some advice how to 
avoid
such problems.

> 1. Other than as a result of accidentally attempting to access the database
> from multiple threads, is there any other known way to cause this type of
> database corruption?

I got some reports that 'power off' while writing can cause such problems. I 
will try
to reproduce this, I may be even able to automate a test case, but this will 
take
some time. Most likely I will do that using the new 'page store' mechanism. 

> 2. We tried running the output through the RunScript tool on a blank
database, and it failed with a syntax error.

What were those errors exactly? I like to fix this problem.

> 4. ... modifications that had been made... were gone.

This may be a problem of the Recover tool. I will check that if you can send me 
the
database files (all files, including .data.db, .index.db and all .log.db files).

You can find out if the database is corrupted when running SCRIPT TO 
'test.sql'. This
can also be used to create backups (the advantage over file backups and BACKUP 
TO is
that this will detect corruptions).

> The app prevents multiple instances from running simultaneously.

Do you use the default 'database file locking' mechanism in H2? How does the 
database
URL look like?

Original comment by thomas.t...@gmail.com on 4 Sep 2009 at 2:22

GoogleCodeExporter commented 9 years ago

Hello again,

Thank you for your detailed response.  My apologies for the delay in replying 
back.  

I will send you the corrupt database files by email, using the address: 
dbsupport
-at- h2database -dot- com

Below is to provide more information to you and the forum, and to answer your 
questions:

1. reports that 'power off' while writing can cause such problems.

After extensive monitoring of the users of our application, we have concluded 
that
this problem occurs most often, and possibly only, after users terminate the
application from Task Manager when they believe it to be non-responsive.  
Whether
this constitutes 100% of the cases of corruption is not certain, but I believe 
it
likely, especially in light of those reports about power off while writing.

My team has some work to do to ensure that the application never appears
non-responsive in order to reduce this tendency by our users, but it would be 
great
if you were able to write a test case to reproduce the core problem of 
corruption
during power off.  If I'm able to find the time to do so, I'll also try.

2. Syntax errors in Recover tool output.

Using the attached database files with the Recover tool, and then using 
RunScript on
the output, the first such error is "Column count does not match".  This is 
because a
line that appears to be intended as a comment (starting with a "--") is actually
getting interpreted as an SQL statement.  This should be reproducible with the
attached files.

3. Still curious about the duplicate key errors in the output generated by the
Recover tool.  Perhaps this is purely an artifact of the data corruption?

4. By "SCRIPT TO", I assume this is equivalent to the following (taken from the
browser interface):
java -cp h2*.jar org.h2.tools.Script -url "jdbc:h2:~/test" -user "sa" -script
"~/backup.sql"

Running this against the corrupt database yields the same double allocation 
error, so
it never generates the output.  If you mean something else by "SCRIPT TO" I'd 
be very
interested to try that.

5. Regarding the app preventing multiple instances from running simultaneously, 
we
use what is turned on by default with this URL:
jdbc:h2:file:C:\path\to\database

or, from the browser interface when trying to recover:
jdbc:h2:file:C:\path\to\database;RECOVER=1

When using the browser interface, I have a batch file based on the one from the
distribution, modified as follows to allow access to the lucene libraries:

@cd C:\path\to\h2\bin
@java  -Xmx1024m -classpath
"h2.jar;%H2DRIVERS%;%CLASSPATH%;C:\path\to\lucene\lucene-core-2.2.0.jar"
org.h2.tools.Console
@if errorlevel 1 pause

I hope this helps to further investigate.  Thank you again.

Steve Fram

Original comment by sbf...@gmail.com on 16 Sep 2009 at 7:40

GoogleCodeExporter commented 9 years ago

If you have time, could you try with a newer version, using the new page store
storage mechanism? For example version 1.2.123. This mechanism should now be 
stable.
I have already run some 'power failure' tests and could not find problems so 
far, but
I will continue to test it.

Original comment by thomas.t...@gmail.com on 8 Nov 2009 at 1:35

GoogleCodeExporter commented 9 years ago

The newest version should be more stable.
If you still experience problems, please open a new bug.

Original comment by thomas.t...@gmail.com on 15 Jan 2010 at 8:13

Changed state: Fixed

lbehnke / h2database

Double allocation exception - possible causes? #115