this design doc is for StoneDB version 2.0. (W.I.P)

1: Overview

Different to version 1.0, in version 2.0 we employ a MPP architect to process the queries that will make the query engine become a totally new thing. Optimizer model and execution model need to refactor version 1.0. In version 2.0, the secondary engine is enabled as our column-based engine to run AP workloads. In this document, we give the explanation for query engine of StoneDB version 2.0.

In version 2.0, the secondary engine is introduced, and this will change the original codes we are using in version 1.0. the logic of query processing is also changed. The control flow and data flow are different with version 1.0. In version 1.0, query engine just processes two relative independent query processing logic due to the tow storage engines are independent each other. But, in version 2.0, the two storage engines co-operate each other, InnoDB acts as a primary storage engine, Tianmu acts as the secondary engine. The Query engine will deal with these two storage engines simultaneously.

In version 2.0, a new cost model will be built in order to cope with TP workloads and AP workloads. The different type workloads will be routed to primary storage engine or secondary storage engine on their costs.

First of all, query engine should recognize the type of these workloads, and the boundaries, etc., also should be defined in this draft.

2: Query Processing

As we have known that, the phase of query processing divided into these steps: (1) read the query string, then generate the abstract syntax tree (AST), and do some rewriting, such as transferring the asterisk to corresponding columns or transferring the views.

In MySQL 8.0, the code of query engine has been refactored. Some new classes are introduced and some one refactored, that make the query engine of MySQL 8.0 to be a new stuff.

The call flow of MySQL of handling a new connection as following. In function do_comamnd and dispatch_command, MySQL will do the selection or DML operations according to the command type. The call stacks of do_command are as following. It describes how MySQL handle a new connection and spawn a new thread to deal with this new connection.

 #ifdef _WIN32
    int win_main(int argc, char **argv)
 #else
    int mysqld_main(int argc, char **argv)
 #endif
    Mysqld_socket_listener::check_and_spawn_admin_connection_handler_thread 
        spawn_admin_thread 
           extern "C" void *admin_socket_thread(void *arg) 
               static bool handle_admin_socket 
                  Connection_handler_manager::process_new_connection 
                    Per_thread_connection_handler::add_connection
                       handle_connection
                         prepare_new_connection_state
                            do_command

2.1 Logical Optimization

2.1.1: Lexical Parsing and grammatical Processing

After we receive the new connection, query engine of MySQL will process these connections according the quest type (or query command type). Before a AST created, query engine will invoke lex and bison to process the query string. In order to speed up the lexical processing, MySQL do not use system lex to tokenize the query string but using their version of lex. For more information, please refer to sql\sql_lex.*.

When the query string is tokenized, it will be in grammatical processing stage. In this stage, MySQL generates the AST according to grammar rules defined in sql_yacc.y. The new SQLs we introduced in version 2.0 have been described in issue #423. You can refer to this issue for details.

2.1.2: Logical Processing

Query

In logical processing stage, for queries, MySQL preform the logical optimization, such as join processing, const expression fold, sub-selection processing and sub-join processing, etc.

Now, in function dispatch_command of sql\sql_parse.cc, It’s the entry point of starting query processing. Firstly, It will do preparation and optimization for primary engine, but if we found the secondary engine is enable, we will also do preparation and optimization for secondary engine. In dispatch_command

    case COM_QUERY : {

        const LEX_CSTRING orig_query = thd->query();
        Parser_state parser_state;
        if (parser_state.init(thd, thd->query().str, thd->query().length))
             break;
        // Initially, prepare and optimize the statement for the primary
        // storage engine. If an eligible secondary storage engine is
        // found, the statement may be reprepared for the secondary
        // storage engine later.

        const auto saved_secondary_engine = thd->secondary_engine_optimization();

       thd->set_secondary_engine_optimization(Secondary_engine_optimization::PRIMARY_TENTATIVELY);
       copy_bind_parameter_values(thd, com_data->com_query.parameters,
                                                       com_data->com_query.parameter_count);

       dispatch_sql_command(thd, &parser_state);

      // Check if the statement failed and needs to be restarted in another storage engine.
      check_secondary_engine_statement(thd, &parser_state, orig_query.str,  orig_query.length); 
      thd->set_secondary_engine_optimization(saved_secondary_engine);

      .. 
      break;
    }

From the codes above, we notice that secondary_engine_optimization() function is used to check whether the secondary engine can be used to optimize the queries or not. Then, dispatch_sql_command do query optimization. In this function includes the following stages:

1: rewrite the queries, mysql_rewrite_query.
2: execute the queries, mysql_execute_command.
3: execute the statement, lex->m_sql_cmd->execute(thd);.

As sql\sql_yacc.yy said, the YACC will create a PT_select_stmt to present a select statement. Therefore, lex->m_sql_cmd->execute(thd) will invoke Sql_cmd_dml::execute to do selection. (BTW: Sql_cmd_select inherits from Sql_cmd_dml)

In Sql_cmd_dml::prepare, in fact, Sql_cmd_select::prepare_inner do the real preparation for optimization.

  Query_expression *const unit = lex->unit;
  Query_block *parameters = unit->global_parameters();
  if (!parameters->has_limit()) {
    parameters->m_use_select_limit = true;
  }
  if (unit->is_simple()) {
    Query_block *const select = unit->first_query_block();
    select->context.resolve_in_select_list = true;
    select->set_query_result(result);
    unit->set_query_result(result);
    // Unlock the table as soon as possible, so don't set SELECT_NO_UNLOCK.
    select->make_active_options(0, 0);

    if (select->prepare(thd, nullptr)) return true;

    unit->set_prepared();
  } else {
    // If we have multiple query blocks, don't unlock and re-lock
    // tables between each each of them.
    if (unit->prepare(thd, result, nullptr, SELECT_NO_UNLOCK, 0)) return true;
  }

As we known, the select statement is represented by Query_block, or Query_expressionand therefore, sql_cmd_select::prepare_inner will be into Query_block::prepare or Query_expression::prepare (sql\sql_resolver.cc)

Prepare query block for optimization.

Resolve table and column information. Resolve all expressions (item trees), ie WHERE clause, join conditions, GROUP BY clause, HAVING clause, ORDER BY clause, LIMIT clause. Prepare all subqueries recursively as part of resolving the expressions. Apply permanent transformations to the abstract syntax tree, such as semi-join transformation, derived table transformation, elimination of constant values and redundant clauses (e.g ORDER BY, GROUP BY).

When this function is called, it is assumed that the precheck() function has been called. precheck() ensures that the user has some SELECT privileges to the tables involved in the query. When resolving views it has also been established that the user has some privileges for them. To prepare a view for privilege checking, it is also needed to call check_view_privileges() after views have been merged into the query. This is not necessary for unnamed derived tables since it has already been established that we have SELECT privileges for the underlying tables by the precheck functions. (precheck() checks a query without resolved views, ie. before tables are opened, so underlying tables of views are not yet available).

When a query block is resolved, always ensure that the user has SELECT privileges to the columns referenced in the WHERE clause, the join conditions, the GROUP BY clause, the HAVING clause and the ORDER BY clause.

When resolving the outer-most query block, ensure that the user also has SELECT privileges to the columns in the selected expressions.

When setting up a derived table or view for materialization, ensure that the user has SELECT privileges to the columns in the selected expressions

Column privileges are normally checked by Item_field::fix_fields(). Exceptions are select list of derived tables/views which are checked in TABLE_LIST::setup_materialized_derived(), and natural/using join conditions that are checked in mark_common_columns().

As far as INSERT, UPDATE and DELETE statements have the same expressions as a SELECT statement, this note applies to those statements as well.

Here, the query engine has done the preparation works, then it will be into the next step: optimization. We can draw a conclusion on the call stack of optimization.

 do_command
    dispatch_command
       dispatch_sql_command
           mysql_execute_command
               Sql_cmd_dml::execute
                   Query_expression::prepare (or Query_block::prepare)
                   Query_expression::optimize(or Query_block::optimize) 
                        JOIN::optimize

and do_command has been described in the previous section. Now, we have an overview of call stack of query processing. In Sql_cmd_dml::execute function, the MySQL validates whether query engine can use secondary engine or not by invoking validate_use_secondary_engine, and the columns were defined as SECONDARY=xxx or NOT_SECONDARY by function reads_not_secondary_columns.

Different with processing of queries, DMLs does not need any logical optimization. (except for complicated DML statements, such as insert xxx select xxxx). Traditionally, there are three types of DML, such as INSERT, DELETE, UPDATE. In MySQL 8.0, there are described in sql\sql_insert.cc, sql\sql_delete.cc and sql\sql_update,cc, respectively.

Different with selection operations, it does not modify the records, so that they do not need do propagation of the changes. But, the DML operations change the records and these changes need to propagate to the secondary engine. And, the changes were written into InnoDB, so that the propagation works are launched at InnoDB layer.

Taking insert operation for an instance to give the query engine of StoneDB version 2.0. bool Sql_cmd_insert_values::execute_inner(THD *thd) implements the main processing logic.

After MySQL does checks, it calls write_record(thd, insert_table, &info, &update) to write the rows to InnoDB.

Now, backing to logical optimization, MySQL calls Query_expression::optimize to do optimization. And, it will call JOIN::optimize. In logical optimization stage, it does not involve secondary engine too much.

2.2 Physical Optimization

After preparation stage (what the query engine does has been described in previous section), the query engine will step into Query_expression::optimize to do optimization. If fact, Query_expression::optimize means the query engine will do logical and physical optimization. First of all, the call stack of optimization is listed as following:

Sql_cmd_dml::execute
    execute_inner
       Query_expression::optimize
           or
       optimize_secondary_engine 
           Query_block::optimize
              JOIN::optimize
                ...
           Query_expression::execute
              Query_expression::ExecuteIteratorQuery

The explanation of JOIN::optimize is listed as following:

Optimizes one query block into a query execution plan (QEP.) This is the entry point to the query optimization phase. This phase applies both logical (equivalent) query rewrites, cost-based join optimization, and rule-based access path selection. Once an optimal plan is found, the member function creates/initializes all structures needed for query execution. The main optimization phases are outlined below:

-# Logical transformations:

Outer to inner joins transformation.

Equality/constant propagation.

Partition pruning.

COUNT(*), MIN(), MAX() constant substitution in case of implicit grouping.

ORDER BY optimization. -# Perform cost-based optimization of table order and access path selection. See JOIN::make_join_plan() -# Post-join order optimization:

Create optimal table conditions from the where clause and the join conditions.

Inject outer-join guarding conditions.

Adjust data access methods after determining table condition (several times.)

Optimize ORDER BY/DISTINCT. -# Code generation

Set data access functions.

Try to optimize away sorting/distinct.

Setup temporary table usage for grouping and/or sorting.

In physical optimization, the cost estimation is the most important thing. the cost model is the key of the physical optimization. the query engine will route the workloads to primary engine or secondary engine according to their costs. In sql\handler.h, MySQL give us some handlers of secondary engine execution.

Do preparation for secondary engine execution.

using prepare_secondary_engine_t = bool (*)(THD *thd, LEX *lex);

Do optimization for secondary engine.

using optimize_secondary_engine_t = bool (*)(THD *thd, LEX *lex);

The cost comparison between the two join plans.

using compare_secondary_engine_cost_t = bool (*)(THD *thd, const JOIN &join,
                                             double optimizer_cost,
                                             bool *use_best_so_far,
                                             bool *cheaper,
                                             double *secondary_engine_cost);

Evaluation of the cost of access paths.
```
using secondary_engine_modify_access_path_cost_t = bool (*)(
THD *thd, const JoinHypergraph &hypergraph, AccessPath *access_path);
```
And, There're some secondary engine variable memebers in struct handlerton, and these variables will be initialized in plugin initialization phase.

// sql\handler.h
struct handlerton { 
   ...
  /**
    Pointer to a function that prepares a secondary engine for executing a
    statement.

    @see prepare_secondary_engine_t for function signature.
  */
  prepare_secondary_engine_t prepare_secondary_engine;

  /**
    Pointer to a function that optimizes the current statement for
    execution on the secondary storage engine represented by this
    handlerton.

    @see optimize_secondary_engine_t for function signature.
  */
  optimize_secondary_engine_t optimize_secondary_engine;

  /**
    Pointer to a function that estimates the cost of executing a join in a
    secondary storage engine.

    @see compare_secondary_engine_cost_t for function signature.
  */
  compare_secondary_engine_cost_t compare_secondary_engine_cost;

  /// Bitmap which contains the supported join types and other flags
  /// for a secondary storage engine when used with the hypergraph join
  /// optimizer. If it is empty, it means that the secondary engine
  /// does not support the hypergraph join optimizer.
  SecondaryEngineFlags secondary_engine_flags;

  /// Pointer to a function that evaluates the cost of executing an access path
  /// in a secondary storage engine.
  ///
  /// @see secondary_engine_modify_access_path_cost_t for function signature.
  secondary_engine_modify_access_path_cost_t
      secondary_engine_modify_access_path_cost;     
  ...
};

Taking the following code for an instance.

static int Init(MYSQL_PLUGIN p) {
  loaded_tables = new LoadedTables();

  handlerton *hton = static_cast<handlerton *>(p);
  hton->create = Create;
  hton->state = SHOW_OPTION_YES;
  hton->flags = HTON_IS_SECONDARY_ENGINE;
  hton->db_type = DB_TYPE_UNKNOWN;
  hton->prepare_secondary_engine = PrepareSecondaryEngine;
  hton->optimize_secondary_engine = OptimizeSecondaryEngine;
  hton->compare_secondary_engine_cost = CompareJoinCost;
  hton->secondary_engine_flags =
      MakeSecondaryEngineFlags(SecondaryEngineFlag::SUPPORTS_HASH_JOIN);
  hton->secondary_engine_modify_access_path_cost = ModifyAccessPathCost;
  return 0;
}

static int Deinit(MYSQL_PLUGIN) {
  delete loaded_tables;
  loaded_tables = nullptr;
  return 0;
}

Here, based on the framework of MySQL 8.0, we have given the notes on secondary engine. In physical optimization, the join order, join type choosing, etc. And, the cost-based optimization of table order and access path are implemented in JOIN::make_join_plan().

As we describe in #436, the query engine will try to use secondary engine if thd->m_current_query_cost is greate than thd->variables.secondary_engine_cost_threshold and the secondary engine is enable, this comparison logic can be found in sql_select.cc static bool retry_with_secondary_engine(THD *thd) . But, the threshhold of using secondary engine is set by system variable, secondary_engine_cost_threshold. In StoneDB version 2.0, we will route the workloads automatically. Therefore, we will calcauate the costs on primary engine and secondary engine, respectively, and according to the costs to determine which engine should be chosen.

In MySQL, the call stacks of using secondary engine are as following:

Sql_cmd_dml::execute_inner
    accumulate_statement_cost(lex); 
    optimize_secondary_engine
        handlerton::optimize_secondary_engine

Before the query engine steps into handlerton::optimize_secondary_engine, we will do cost calculation of the statement on secondary engine, and accumulate_statement_cost_on_secondary_engine will be added in StoneDB version 2.0. That will make the variable of secondary_engine_cost_threshold is not set mannually, but sets its value automatically according to the result of accumulate_statement_cost_on_secondary_engine.

In static bool OptimizeSecondaryEngine(THD *thd MY_ATTRIBUTE((unused)), LEX *lex), StoneDB will send the query statment to secondary engine, Tianmu, to do optimization.

stoneatom / stonedb

[RFC]: Query Engine for StoneDB V2.0 #493

1: Overview

2: Query Processing

2.1 Logical Optimization

2.1.1: Lexical Parsing and grammatical Processing

2.1.2: Logical Processing

2.2 Physical Optimization

2.2.1 Access path selection

2.2.2 Best Access path Generation

2.3 Parallel Query

3. Massive Parallel Processing Optimization

4. Miscellaneous