Closed ameyer3000 closed 9 months ago
You need an additional index on TEST(NAME)
CREATE INDEX TEST_NAME_IDX ON TEST(NAME);
and, unfortunately, you also may need USE INDEX
clause, because query planner incorrectly chooses a less efficient index:
EXPLAIN ANALYZE
SELECT * FROM test USE INDEX(TEST_NAME_IDX) WHERE name = 'NAME9999' AND (type = 'TYPE1' OR type = 'TYPE2');
> SELECT
> "PUBLIC"."TEST"."NAME",
> "PUBLIC"."TEST"."TYPE"
> FROM "PUBLIC"."TEST" USE INDEX ("TEST_NAME_IDX")
> /* PUBLIC.TEST_NAME_IDX: NAME = 'NAME9999' */
> /* scanCount: 11 */
> WHERE ("TYPE" IN('TYPE1', 'TYPE2'))
> AND ("NAME" = 'NAME9999')
That's certainly interesting! However for my use-case, it would be complicated/impossible to add these hints, as I need to support two different database engines at the same time - and that for many different types of queries with many other fields that may or may not use IN (list of literals)
. In my application, queries are not directly written in SQL, coming from application A
that doesn't have knowledge about schema internals like indices etc, and another application B
that has (among others) an H2-"backend" that generates SQL based on some parameters.
Any chance for a more generic fix, at least for field IN (list of literals)
? Any pointers to the source-code, where this can be tweaked? Not sure if it's easy, in particular for someone without much DB background (like myself). Adding an index for each field is OK (that much control I do have over the schema), but then adding USE INDEX
seems to be a workaround only for the case where at most one field uses IN (...)
- as soon as there are two or more such expressions, this workaround doesn't seem to apply (does it? I didn't succeed adding multiple USE INDEX
clauses).
A query can use only one index.
WHERE column IN (list)
doesn't need any quirks in H2, it only needs an index on (column)
or (column, other_columns, …)
with column
first.
But in your case WHERE
criteria specifies multiple columns. H2 cannot optimize all such queries well enough.
In your query there are two problems:
column1 = something AND column2 IN (list)
don't construct index condition with a list of value pairs for two-column index on (column1, column2)
. This feature can be introduced in H2, but it requires a good knowledge of H2's internals.column1
for column1 = something
OR index on column2
for column2 IN(list)
(indexes may have other columns, but these columns need to be specified first) and H2 needs to choose more efficient strategy. Currently H2 prefers less efficient index on (column2, column1)
for column2 IN(list)
, but with your distribution of data this strategy is very inefficient. Without index on (column2, column1)
, but with individual indexes on (column1)
and (column2)
H2 correctly chooses an index on (column1)
. It means there is a problem with index cost calcutaion. The related code is in Index.getCostRangeIndex()
.I think I found an acceptable workaround for me. After populating all my tables (with unique-constraint/PK), there will only be SELECT
s. After dropping the PK as well as creating single-field indices for all PK fields for all tables, SELECT
s don't require excessive scans even without hints like USE INDEX
. This approach is "generic enough" for my purpose.
I had a quick look at Index.getCostRangeIndex
but I guess I would have to invest a lot more time to be of real help there, unfortunately. Not sure if this issue should be closed (I have a workaround), or kept open (a more generic solution would still be nice where PK can be kept, no USE INDEX
).
Anyhow, thanks a lot for your ideas!
If I may jump aboard this discussion thread, looks like hibernate 6 changed the way it generates queries for embedded IDs from this form
select p1_0.x,p1_0.y from mytable p1_0 where p1_0.x = ‘mypieceofId1’ and p1_0.y= 1677042000000
to this one:
select p1_0.x,p1_0.y from mytable p1_0 where (p1_0.x,p1_0.y) in((‘mypieceofId1’, 1677042000000))
Unless I'm very much mistaken (if I am, pls tell me...) the second one does not seem to use any index (I tried both fields, in reveres, and individual ones). Pls take this as a personal preference, the second one is not how I would write a SQL query, but given how widespread hibernate is and given that embedded IDs are not uncommon, I am afraid this question will pop up more often. Related discussion on hibernate forum: https://discourse.hibernate.org/t/upgrade-to-hibernate-6-insert-performance-with-embedded-ids/7515
Any opinion/advice wcome :-)
select p1_0.x,p1_0.y from mytable p1_0 where (p1_0.x,p1_0.y) in((‘mypieceofId1’, 1677042000000))
This is an entirely different already fixed problem, you can compile H2 from its current sources to test the fix.
select p1_0.x,p1_0.y from mytable p1_0 where (p1_0.x,p1_0.y) in((‘mypieceofId1’, 1677042000000))
This is an entirely different already fixed problem, you can compile H2 from its current sources to test the fix.
that is great (and really appreciated)! let me try :-) !
Built and tested, it picks up the index indeed. I will have to bring in a snaphot jar into my build but definitely small work compared to try get around the new hibernate behavior @katzyn tx for the help, really appreciated!
Closing this as resolved, answered and stale. Please re-open when there are any unexplained answers left.
I am struggling with slow query execution for a WHERE-clause such as
field IN ('value1', 'value2' )
, whereas equivalent queries (using UNION or CTE, see below) are performed as expected. Self-contained SQL below, tested with the standalone console (java -jar h2-2.1.214.jar
).As the
IN ( ... )
-condition isn't really complex, I would assume that the primary key can be used. Am I missing something, or is this optimization not yet implemented (as suggested by user "Impaler" on stackoverflow ).NOTE: when changing the primary key definition to PRIMARY KEY (name, type), the problem kind of disappears for this particular example - however this still wouldn't entirely fix the general problem as I don't have full control over the schema definition and even if I had, similar queries where other fields from the PK are appearing in field IN ( ... ) would still be slow.