kashipai / h2database

H2 Database for reference.
0 stars 0 forks source link

MvStore huge disk usage, MvStore db file should be 240 mb but produce 8,4 gigabytes bdd #542

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

With H2 1.3.175 (2014-01-18) I found a bug with mvstore mode

using this uri 
jdbc:h2:/home/user/mvstore/bigdb;MV_STORE=TRUE;MVCC=TRUE

Here how to reproduce:

drop table if exists mytable;
create table mytable as select 
REPEAT(hash('SHA256',STRINGTOUTF8((rand()+X)::varchar),1),2) field from 
SYSTEM_RANGE(1, 400000);
-- DB size is now 119 mo
create table bigtable as select * from MYTABLE union  select * from MYTABLE;
-- DB size is now superior than 8,4 gigabytes

I'm under OpenSuse, with a SSD.

Original issue reported on code.google.com by nico.de...@gmail.com on 22 Jan 2014 at 4:40

GoogleCodeExporter commented 9 years ago
My experience is the same.. we fill the database programatically from our 
primary data storage, which is SVN.

The PageStore DB was 27.9 GB big, while the MVStore uses more than 300 GB. 
Moreover, when shutting down the system and doing full compact of the database, 
the DB finally grew to more than 470 GB which resulted in occupying all the 
available disk space.

We used H2 initially our own compiled version from sources of 2014-06-12 and 
this week we switched to 1.4.179. Tests were done on CentOS 6.4 x64.

I also did local tests with another repository on my Windows 7 x64 machine and 
this time the Page store was 414 MB (stayed 414 MB after compacting), while 
MVStore used about 1GB after filling the DB and grew to 1.9 GB after compacting.

Original comment by radek.kr...@gmail.com on 24 Jun 2014 at 8:57

GoogleCodeExporter commented 9 years ago
How did you compact, did you use "shutdown compact" or "shutdown defrag"? 
Currently, with version 1.4.179, the statement "shutdown defrag" should reduce 
the database file size. If it doesn't possibly the problem is LOB data 
(repeated binaries). How large is the file if you compress the database (using 
zip)? Could you send me the (compressed) database file?

Original comment by thomas.t...@gmail.com on 24 Jun 2014 at 9:05

GoogleCodeExporter commented 9 years ago
We use "shutdown compact" command to reduce the DB size. I'm gonna do some 
further tests using "shutdown defrag" later this week.

The compressed 300 GB file was about 41 GB (which is still much moree than the 
PageStore).

With the other repository I used locally, the compacted MVStore was 352 MB (1.9 
GB originally). Just out of curiosity, I compacted also the PageStore resulting 
in 34MB archive (414MB originally). Both repositories are unfortunately 
production ones, so I cannot share those. But I'll try to reproduce the 
problems on some test data and provide some test DB later.

Original comment by radek.kr...@gmail.com on 24 Jun 2014 at 9:22

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[Sorry, I originally wrote "shutdown compact" below, now fixed; and another 
typo fixed now]

OK. The file size issue is a high priority for me. Only "shutdown defrag" will 
*really* reduce the file size currently (as of 1.4.179), and "shutdown compact" 
will not help much (only about 16 MB at a time). Could you tell me:

- how large is the database file before running "shutdown defrag",
    both real size and compressed size (with zip compression)
- how long did it take to run "shutdown defrag"
- how large is the database file after running "shutdown defrag", 
  both real size and compressed size (with zip compression)

This data alone will help quite a lot to set priorities.

Original comment by thomas.t...@gmail.com on 24 Jun 2014 at 9:32

GoogleCodeExporter commented 9 years ago
Another issue I need to work on is to not let the database file get that big of 
course. But performance and effectiveness of closing a database, "shutdown 
compact" and "shutdown defrag" are also important.

Original comment by thomas.t...@gmail.com on 24 Jun 2014 at 9:34

GoogleCodeExporter commented 9 years ago
OK. Did some more tests and measurements on my local machine(Windows 7 x64, 
Core i7, 8GB RAM, SDD) with the smaller repository. It seems that the defrag 
works fine, just that the space is not reused properly during runtime.

MVStore before defrag is 1.11 GB (199 MB zipped).
MVStore after defrag is 339 MB (49 MB zipped).

Shutdown of our whole platform took about 15s including the DB defrag, so my 
guess would be that the defrag did not take more than 10 seconds.

Original comment by radek.kr...@gmail.com on 24 Jun 2014 at 1:27

GoogleCodeExporter commented 9 years ago
> It seems that the defrag works fine, just that the space is not reused 
properly during runtime.

Yes. For version 1.4.179, I mainly worked on "shutdown defrag". I think it's 
now relatively fast (I hope to make it twice as fast, but that's not urgent I 
guess).

More important right now is that disk space is re-used earlier while running 
the process. I do like to keep the 45 seconds delay to re-used disk space, 
(even thought, that might need to be changed at some point as well), to ensure 
a database can't get corrupt after a power failure. But I think the problem is 
that chunks with usage of less than about 10% should be re-used a lot earlier. 
There seem to be far too many chunks with 1-5% usage during normal operation.

> MVStore after defrag is 339 MB (49 MB zipped).

That might mean binaries are stored multiple times. Do you store a lot of 
(similar) binaries?

Original comment by thomas.t...@gmail.com on 24 Jun 2014 at 1:41

GoogleCodeExporter commented 9 years ago
My guess is that the high compression ratio is caused by the number of 
redundant varchars in our DB, e.g. our biggest table in production repository 
has some 13 mio records and contains 8 columns that store URIs of objects which 
are typically quite similar and 70-80 characters long.

btw. These numbers "339 MB (49 MB zipped)" are similar to PageStore on the same 
repository.

At the moment I'm trying to defrag the production database on a test system, so 
will let you know these results later.

Original comment by radek.kr...@gmail.com on 25 Jun 2014 at 7:51

GoogleCodeExporter commented 9 years ago
Here are the measurements from my test system with production data (CentOS 6.4 
x64, Core i7 920, 12GB RAM, SW RAID). These are even more interesting.

MVStore before defrag is 302.4 GB (39.7 GB zipped).
MVStore after defrag is 20.6 GB (2.3 GB zipped).

DB defrag took 257 minutes. The HDDs on the server are quite slow and there's 
not much RAM left for OS caches, so this defrag performance can be expected.

Original comment by radek.kr...@gmail.com on 25 Jun 2014 at 12:12

GoogleCodeExporter commented 9 years ago
New measurements with version 1.4.180.

MVStore before defrag is 128.24 GB (16.47 GB zipped).
MVStore after defrag is 17.43 GB (1.44 GB zipped).

DB defrag of this database took 37 minutes.

Original comment by radek.kr...@gmail.com on 23 Jul 2014 at 1:25

GoogleCodeExporter commented 9 years ago
I think this is fixed now (as of today). Could you test with the very latest 
trunk version of H2? This is http://www.h2database.com/automated/h2-latest.jar 
- or build H2 yourself from the latest source code.

Original comment by thomas.t...@gmail.com on 23 Jul 2014 at 6:59

GoogleCodeExporter commented 9 years ago
Please note the latest (snapshot) version is not yet stable. I'm working on 
that.

Original comment by thomas.t...@gmail.com on 24 Jul 2014 at 10:18

GoogleCodeExporter commented 9 years ago
It seems the issue is not yet fixed: I tried with today's "h2-latest.jar", 
running a 60MB database create script:

java -cp .;h2-latest.jar org.h2.tools.RunScript -url 
jdbc:h2:./oldFormat;MV_STORE=FALSE;MVCC=FALSE -script setup-database.sql
... resulted in a 41MB file oldFormat.h2.db

java -cp .;h2-latest.jar org.h2.tools.RunScript -url jdbc:h2:./newFormat 
-script setup-database.sql
... resulted in a 317MB file newFormat.mv.db

Should I upload the setup-database.sql file (6MB 7zipped)?

Original comment by u.g.diet...@gmail.com on 21 Sep 2014 at 6:00

GoogleCodeExporter commented 9 years ago
> Should I upload the setup-database.sql file (6MB 7zipped)?

Yes, that would be nice. If uploading fails, send it to me via email.

Original comment by thomas.t...@gmail.com on 21 Sep 2014 at 7:17

GoogleCodeExporter commented 9 years ago
Attached the database setup file for the example above, run on Windows 7 / Java 
1.7.0_67.

Original comment by u.g.diet...@gmail.com on 21 Sep 2014 at 10:24

Attachments:

GoogleCodeExporter commented 9 years ago
Any news here? We run into the same problem.

Inserting 1 mio entries resulted in a db file with a size of 2.6GB.
Exporting that table into a CSV (using CSVWriter) resulted in a file with about 
40MB.

Any ideas how to solve that issue? Is this related?

Original comment by js.cedar...@gmail.com on 2 Dec 2014 at 3:16

GoogleCodeExporter commented 9 years ago
I'm still working on this issue. One problem is indexes. If you create an index 
before adding data, then more disk space is used. Could you check if creating 
the indexes later on saves space? Of course it shouldn't matter much when 
indexes are created, and I hope in the near future I will be able to solve this 
problem. But right now, this is a possible workaround.

Original comment by thomas.t...@gmail.com on 2 Dec 2014 at 3:27

GoogleCodeExporter commented 9 years ago
We played around. One thing that works for us: We disabled the MVStore...

Original comment by js.cedar...@gmail.com on 3 Dec 2014 at 11:06

GoogleCodeExporter commented 9 years ago
As spatial index has been solved. We are waiting the next release of H2 to give 
new feedback about MVStore.

Original comment by nico.de...@gmail.com on 3 Dec 2014 at 4:08

GoogleCodeExporter commented 9 years ago
I consider this problem fixed now with version 1.4.184, even thought there is 
still room for improvement.

Original comment by thomas.t...@gmail.com on 14 Jan 2015 at 6:51

GoogleCodeExporter commented 9 years ago
Re-checked with h2-1.4.185 and the database setup file from above:
* old format (with ;MV_STORE=FALSE;MVCC=FALSE) size is 40MB (as before)
* new format size is 86MB (much better, yet still not quite satisfactory)
* new format, create indexes *after* inserting the data, size is 45MB (that is 
good)
* new format, create indexes, then insert data, then drop indexes and re-create 
them, size is 115MB (that's ugly)
* new format, create indexes, then insert data, then drop indexes and re-create 
them, the call SHUTDOWN COMPACT: size is 141MB (that's worse)

The bottom line is, I'll stick to the old format.

Original comment by u.g.diet...@gmail.com on 9 Feb 2015 at 7:47

GoogleCodeExporter commented 9 years ago
Hi

The problem might be the high retention time. How fast is your test case? The 
default retention time is 45 seconds, that means space is only re-used if it 
was unused for 45 seconds. The retention time can be changed, see 
http://h2database.com/html/grammar.html#set_retention_time

If this is not the problem, could you post a test case please?

Original comment by thomas.t...@gmail.com on 12 Feb 2015 at 6:52