greenlion / warp

WarpSQL Server, an open source OLAP focused distribution of the world's most popular open source database bundled with OLAP performance related plugins such as the WARP storage engine..
http://warpsql.blog
Other
41 stars 2 forks source link

row estimation fudge triggers FirstMatch for IN subqueries when materialization should be used #69

Closed greenlion closed 3 years ago

greenlion commented 3 years ago
-- WARP
mysql>  select count(*) from employees where id in (select id from employees);
+----------+
| count(*) |
+----------+
|  1799030 |
+----------+
1 row in set (1 min 24.45 sec)

-- InnoDB
mysql>  select count(*) from employees_i where id in (select id from employees_i);
+----------+
| count(*) |
+----------+
|  1799030 |
+----------+
1 row in set (3.89 sec)

mysql> explain select count(*) from employees_i where id in (select id from employees_i);
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------+---------+----------+-------------+
| id | select_type  | table       | partitions | type   | possible_keys       | key                 | key_len | ref                 | rows    | filtered | Extra       |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------+---------+----------+-------------+
|  1 | SIMPLE       | employees_i | NULL       | ALL    | NULL                | NULL                | NULL    | NULL                | 1791715 |   100.00 | Using where |
|  1 | SIMPLE       | <subquery2> | NULL       | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 4       | warp.employees_i.id |       1 |   100.00 | NULL        |
|  2 | MATERIALIZED | employees_i | NULL       | ALL    | NULL                | NULL                | NULL    | NULL                | 1791715 |   100.00 | NULL        |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+---------------------+---------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

mysql> explain select count(*) from employees where id in (select id from employees);
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                                             |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | NULL                                                              |
|  1 | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |    50.00 | Using where; FirstMatch(employees); Using join buffer (hash join) |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql> set @@optimizer_switch='firstmatch=off';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select count(*) from employees where id in (select id from employees);
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-------------------+------+----------+-------+
| id | select_type  | table       | partitions | type   | possible_keys       | key                 | key_len | ref               | rows | filtered | Extra |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-------------------+------+----------+-------+
|  1 | SIMPLE       | employees   | NULL       | ALL    | NULL                | NULL                | NULL    | NULL              |    2 |   100.00 | NULL  |
|  1 | SIMPLE       | <subquery2> | NULL       | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 4       | warp.employees.id |    1 |   100.00 | NULL  |
|  2 | MATERIALIZED | employees   | NULL       | ALL    | NULL                | NULL                | NULL    | NULL              |    2 |   100.00 | NULL  |
+----+--------------+-------------+------------+--------+---------------------+---------------------+---------+-------------------+------+----------+-------+
3 rows in set, 1 warning (0.01 sec)

mysql> select count(*) from employees where id in (select id from employees);
+----------+
| count(*) |
+----------+
|  1799030 |
+----------+
1 row in set (5.69 sec)
greenlion commented 3 years ago

This also messed up the order or hashing tables. The parallel query system correctly reorders tables without this munge. It has been removed in parallel_query branch. This fixes quite a few related problems, such as row estimates in EXPLAIN output being wrong (was always 2 for the largest table in a query).