Open RingsC opened 1 year ago
Optimizer model and execution model need to refactor version 1.0
Maybe word in is needed in this sentence.
The call flow of do_command as following
Add 'are'
Then in dispatch_sql_command do query optimization
Remove word in and add will maybe
2: execute the queries, mysql_execute_command. 3: execute the queries, mysql_execute_command
Repeated
The call flow of do_command as following
Add 'are'
Then in dispatch_sql_command do query optimization
Remove word in and add will maybe
2: execute the queries, mysql_execute_command. 3: execute the queries, mysql_execute_command
Repeated
Thanks, bro.
this design doc is for StoneDB version 2.0. (W.I.P)
1: Overview
Different to version 1.0, in version 2.0 we employ a MPP architect to process the queries that will make the query engine become a totally new thing. Optimizer model and execution model need to refactor version 1.0. In version 2.0, the secondary engine is enabled as our column-based engine to run AP workloads. In this document, we give the explanation for query engine of StoneDB version 2.0.
In version 2.0, the secondary engine is introduced, and this will change the original codes we are using in version 1.0. the logic of query processing is also changed. The control flow and data flow are different with version 1.0. In version 1.0, query engine just processes two relative independent query processing logic due to the tow storage engines are independent each other. But, in version 2.0, the two storage engines co-operate each other, InnoDB acts as a primary storage engine, Tianmu acts as the secondary engine. The Query engine will deal with these two storage engines simultaneously.
In version 2.0, a new cost model will be built in order to cope with TP workloads and AP workloads. The different type workloads will be routed to primary storage engine or secondary storage engine on their costs.
First of all, query engine should recognize the type of these workloads, and the boundaries, etc., also should be defined in this draft.
2: Query Processing
As we have known that, the phase of query processing divided into these steps: (1) read the query string, then generate the abstract syntax tree (AST), and do some rewriting, such as transferring the asterisk to corresponding columns or transferring the views.
In MySQL 8.0, the code of query engine has been refactored. Some new classes are introduced and some one refactored, that make the query engine of MySQL 8.0 to be a new stuff.
The call flow of MySQL of handling a new connection as following. In function
do_comamnd
anddispatch_command
, MySQL will do the selection or DML operations according to the command type. The call stacks ofdo_command
are as following. It describes how MySQL handle a new connection and spawn a new thread to deal with this new connection.2.1 Logical Optimization
2.1.1: Lexical Parsing and grammatical Processing
After we receive the new connection, query engine of MySQL will process these connections according the quest type (or query command type). Before a AST created, query engine will invoke
lex
andbison
to process the query string. In order to speed up the lexical processing, MySQL do not use systemlex
to tokenize the query string but using their version oflex
. For more information, please refer tosql\sql_lex.*
.When the query string is tokenized, it will be in grammatical processing stage. In this stage, MySQL generates the AST according to grammar rules defined in
sql_yacc.y
. The new SQLs we introduced in version 2.0 have been described in issue #423. You can refer to this issue for details.2.1.2: Logical Processing
In logical processing stage, for queries, MySQL preform the logical optimization, such as join processing, const expression fold, sub-selection processing and sub-join processing, etc.
Now, in function
dispatch_command
ofsql\sql_parse.cc
, It’s the entry point of starting query processing. Firstly, It will do preparation and optimization for primary engine, but if we found the secondary engine is enable, we will also do preparation and optimization for secondary engine. Indispatch_command
From the codes above, we notice that
secondary_engine_optimization()
function is used to check whether the secondary engine can be used to optimize the queries or not. Then,dispatch_sql_command
do query optimization. In this function includes the following stages:mysql_rewrite_query
.mysql_execute_command
.lex->m_sql_cmd->execute(thd);
.As
sql\sql_yacc.yy
said, the YACC will create aPT_select_stmt
to present a select statement. Therefore,lex->m_sql_cmd->execute(thd)
will invokeSql_cmd_dml::execute
to do selection. (BTW:Sql_cmd_select
inherits fromSql_cmd_dml
)In
Sql_cmd_dml::prepare
, in fact,Sql_cmd_select::prepare_inner
do the real preparation for optimization.As we known, the select statement is represented by
Query_block
, orQuery_expression
and therefore, sql_cmd_select::prepare_inner will be intoQuery_block::prepare
orQuery_expression::prepare
(sql\sql_resolver.cc)
Here, the query engine has done the preparation works, then it will be into the next step: optimization. We can draw a conclusion on the call stack of optimization.
and
do_command
has been described in the previous section. Now, we have an overview of call stack of query processing. InSql_cmd_dml::execute
function, the MySQL validates whether query engine can use secondary engine or not by invokingvalidate_use_secondary_engine
, and the columns were defined asSECONDARY=xxx
orNOT_SECONDARY
by functionreads_not_secondary_columns
.Different with processing of queries, DMLs does not need any logical optimization. (except for complicated DML statements, such as
insert xxx select xxxx
). Traditionally, there are three types of DML, such as INSERT, DELETE, UPDATE. In MySQL 8.0, there are described insql\sql_insert.cc
,sql\sql_delete.cc
andsql\sql_update,cc
, respectively.Different with selection operations, it does not modify the records, so that they do not need do propagation of the changes. But, the DML operations change the records and these changes need to propagate to the secondary engine. And, the changes were written into InnoDB, so that the propagation works are launched at InnoDB layer.
Taking
insert
operation for an instance to give the query engine of StoneDB version 2.0.bool Sql_cmd_insert_values::execute_inner(THD *thd)
implements the main processing logic.After MySQL does checks, it calls
write_record(thd, insert_table, &info, &update)
to write the rows to InnoDB.Now, backing to logical optimization, MySQL calls
Query_expression::optimize
to do optimization. And, it will callJOIN::optimize
. In logical optimization stage, it does not involve secondary engine too much.2.2 Physical Optimization
After preparation stage (what the query engine does has been described in previous section), the query engine will step into
Query_expression::optimize
to do optimization. If fact,Query_expression::optimize
means the query engine will do logical and physical optimization. First of all, the call stack of optimization is listed as following:The explanation of JOIN::optimize is listed as following:
In physical optimization, the cost estimation is the most important thing. the cost model is the key of the physical optimization. the query engine will route the workloads to primary engine or secondary engine according to their costs. In
sql\handler.h
, MySQL give us some handlers of secondary engine execution.Do preparation for secondary engine execution.
Do optimization for secondary engine.
The cost comparison between the two join plans.
Evaluation of the cost of access paths.
And, There're some secondary engine variable memebers in
struct handlerton
, and these variables will be initialized in plugin initialization phase.Taking the following code for an instance.
Here, based on the framework of MySQL 8.0, we have given the notes on secondary engine. In physical optimization, the join order, join type choosing, etc. And, the cost-based optimization of table order and access path are implemented in
JOIN::make_join_plan()
.As we describe in #436, the query engine will try to use secondary engine if
thd->m_current_query_cost
is greate thanthd->variables.secondary_engine_cost_threshold
and the secondary engine is enable, this comparison logic can be found insql_select.cc
static bool retry_with_secondary_engine(THD *thd)
. But, the threshhold of using secondary engine is set by system variable,secondary_engine_cost_threshold
. In StoneDB version 2.0, we will route the workloads automatically. Therefore, we will calcauate the costs on primary engine and secondary engine, respectively, and according to the costs to determine which engine should be chosen.In MySQL, the call stacks of using secondary engine are as following:
Before the query engine steps into handlerton::optimize_secondary_engine, we will do cost calculation of the statement on secondary engine, and
accumulate_statement_cost_on_secondary_engine
will be added in StoneDB version 2.0. That will make the variable ofsecondary_engine_cost_threshold
is not set mannually, but sets its value automatically according to the result ofaccumulate_statement_cost_on_secondary_engine
.In
static bool OptimizeSecondaryEngine(THD *thd MY_ATTRIBUTE((unused)), LEX *lex)
, StoneDB will send the query statment to secondary engine,Tianmu
, to do optimization.2.2.1 Access path selection
2.2.2 Best Access path Generation
2.3 Parallel Query
3. Massive Parallel Processing Optimization
4. Miscellaneous