lbehnke / h2database

Automatically exported from code.google.com/p/h2database
0 stars 0 forks source link

A select with an IN clause is running forever. #175

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Before submitting a bug, please check the FAQ:
http://www.h2database.com/html/faq.html

What steps will reproduce the problem?
(simple SQL scripts or simple standalone applications are preferred)
1.start with included database
2. Run this query:
select count(*) FROM pcst_zone WHERE PublicID IN ( SELECT T1.PublicID FROM
pcst_zone T1, pcst_zone T2  WHERE T1.Code=T2.Code AND T1.Country=T2.Country
AND T1.ZoneType=T2.ZoneType AND T1.luwid > T2.luwid)

3. The production query we run is:
DELETE FROM pcst_zone WHERE PublicID IN ( SELECT T1.PublicID FROM pcst_zone
T1, pcst_zone T2  WHERE T1.Code=T2.Code AND T1.Country=T2.Country AND
T1.ZoneType=T2.ZoneType AND T1.luwid > T2.luwid)

What is the expected output? What do you see instead?
I expect the query to return, but it never does. In a Vm with a small
amount of memory it runs out of heap. In our application it just runs
forever without returning.

I started H2 with these flags:
java -cp h2*.jar -Dh2.OptimizeInList=false -Xms512m -Xmx1024m
-XX:MaxPermSize=128m  org.h2.tools.Server

In the console I ran the select from above.
In talking to developers, I was surprised to find that this is a known
problem among them going back to 1.1

What version of the product are you using? On what operating system, file
system, and virtual machine?
1.2.131
xp pro 64
java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)

Do you know a workaround?
none

How important/urgent is the problem for you?
Extremely important. Our demo people run the utility (that causes this
query to be run) frequently, and they have to use either oracle or sql
server to run it. This is a huge hassle and makes some demos very
difficult. We run H2 for this very purpose. 

In your view, is this a defect or a feature request?
Definitely a defect, either in a really bad query plan or the query
processor is in an infinite loop.

Please provide any additional information below.

A big Thank you for looking at this!
Randy

Original issue reported on code.google.com by youn...@gmail.com on 5 Mar 2010 at 11:46

Attachments:

GoogleCodeExporter commented 9 years ago
I don't think it's a bug within the database. It's just that there seems to be 
no
fast way to run the query.

One problem is that you didn't run ANALYZE. I suggest to do that. See also the
documentation.

You use a relatively large cache size (SET CACHE_SIZE 131072). This may be a 
problem
(the system runs out of memory or gets very slow) unless you start with 
-Xmx256m. I
suggest to use a smaller cache size. I used the default cache size, then then 
the
query took around 7.7 seconds.

The the IN(..) query is relatively slow: SELECT count(T1.PublicID) FROM
pcst_zone T1, pcst_zone T2  WHERE T1.Code=T2.Code AND T1.Country=T2.Country
AND T1.ZoneType=T2.ZoneType AND T1.luwid > T2.luwid; - takes about about 2.5 s 
for
me. It returns 26270 (there are many duplicates). Instead, you could use use:

select count(PublicID) from pcst_zone T1 where exists (select * from pcst_zone 
T2
WHERE T1.Code=T2.Code AND T1.Country=T2.Country
AND T1.ZoneType=T2.ZoneType AND T1.luwid > T2.luwid)

This also takes about 2.5 seconds, but there are no duplicates (14973). 
However, it
doesn't improve the final result.

The query: select count(*) FROM pcst_zone t3 WHERE t3.PublicID IN (
select PublicID from pcst_zone T1 where exists (select * from pcst_zone T2 WHERE
T1.Code=T2.Code AND T1.Country=T2.Country
AND T1.ZoneType=T2.ZoneType AND T1.luwid > T2.luwid)); -- takes about 7.3 
seconds for
me. The new query takes around 7.7 seconds:

select count(*) FROM pcst_zone t3 WHERE t3.PublicID IN (select PublicID from
pcst_zone T1 where exists (select * from pcst_zone T2 WHERE T1.Code=T2.Code AND
T1.Country=T2.Country AND T1.ZoneType=T2.ZoneType AND T1.luwid > T2.luwid));

Original comment by thomas.t...@gmail.com on 7 Mar 2010 at 4:50

GoogleCodeExporter commented 9 years ago
I would be very happy if the query would complete at all, and if it took 7 or 8
seconds would be completely acceptable. It just hangs in our app, and now it's
running out of memory no matter what I do. How are you running it? 
I am doing the following
In a cmd shell I run:
java -cp h2*.jar  org.h2.tools.Server
The console in comes up in firefox and I enter
jdbc:h2:file:/tmp/guidewire/pc
for the jdbc url.

I then enter the query and in about 7-8 seconds it runs out of memory. Notice I 
am
not specifying any memory requirements.

select count(*) FROM pcst_zone WHERE PublicID IN ( SELECT T1.PublicID FROM 
pcst_zone
T1, pcst_zone T2  WHERE T1.Code=T2.Code AND T1.Country=T2.Country AND
T1.ZoneType=T2.ZoneType AND T1.luwid > T2.luwid);
java.lang.OutOfMemoryError: Java heap space

So how are you getting it to work???

Thanks
Randy

Original comment by youn...@gmail.com on 8 Mar 2010 at 7:02

GoogleCodeExporter commented 9 years ago
When I first run the query it also ran out of memory on my system (even when I 
used
-Xmx128m). As I already wrote:

You use a relatively large cache size (SET CACHE_SIZE 131072). This may be a 
problem
(the system runs out of memory or gets very slow) unless you start with 
-Xmx256m. I
suggest to use a smaller cache size. I used the default cache size, then then 
the
query took around 7.7 seconds.

The cache size is persisted in the database file. To change the cache size, you 
have
to use SET CACHE_SIZE 16384

Original comment by thomas.t...@gmail.com on 8 Mar 2010 at 7:37

GoogleCodeExporter commented 9 years ago
Ok. Thanks for the clarificaton. I'll try this.

Original comment by youn...@gmail.com on 8 Mar 2010 at 8:15

GoogleCodeExporter commented 9 years ago
Running analyze after loading all the rows fixed the hang. Thank you!
Now if it only did an auto-update of statistics..... :)

Original comment by youn...@gmail.com on 10 Mar 2010 at 7:43