stoneatom / stonedb

StoneDB is an Open-Source MySQL HTAP and MySQL-Native DataBase for OLTP, Real-Time Analytics, a counterpart of MySQLHeatWave. (https://stonedb.io)
https://stonedb.io/
GNU General Public License v2.0
858 stars 139 forks source link

[RFC]: SQL supported on StoneDB v2.0 #423

Open hustjieke opened 1 year ago

hustjieke commented 1 year ago

SQL Supported on StoneDB V2.0

This is a supplement to https://github.com/stoneatom/stonedb/issues/436

Overview

In this issue, we give out the design of SQL statements supported by StoneDB V2.0. As mentioned in #436, StoneDB V2.0 will support more SQL statements compared to V1.0, including SQL statements used to create Tianmu tables, load data from InnoDB tables, and monitor StoneDB status. Developers can follow the guidance below to try these SQL statements.

SQL Syntaxes

This part describes the SQL syntaxes introduced in StoneDB V2.0.

Create a Tianmu Table

First, let’s see how to create a Tianmu table. The SQL syntaxes used in StoneDB V2.0 is different from that used in V1.0. In StoneDB V1.0, you simply need to specify the engine type to tianmu. Following provides the syntax:

--- SQL syntax in StoneDB V1.0
create table orders (
   col1 int, 
   col2 varchar(10)
) engine = tianmu; 

However, StoneDB V2.0 adopts a new mechanism called secondary engine. This makes the SQL syntaxes in creating tables and loading data in V2.0 different from those in V1.0.

Following provides the SQL syntaxes that you can use to create a Tianmu table in StoneDB V2.0:

CREATE TABLE orders (nameVARCHAR(100) COMMENT 'TIANMU_COLUMN=ENCODING=SORTED' );

Before using column comments to specify columns to store in Tianmu, enable Tianmu as the secondary engine so that the columns then defined as TIANMU_COLUMN will be loaded into the Tianmu engine.

In V2.0, we specify Tianmu as the secondary engine, instead of the primary engine. As shown in the example above, we set SECONDARY_ENGINE to Tianmu. The SECONDARY_ENGINE field is also used in MySQL 8.0.

ALTERTABLE orders SECONDARY_ENGINE= Tianmu;

If you do not want to convert the entire table, you can also use a column comment to convert needed columns in the table to Tianmu columns. Following provides the syntax:

ALTERTABLE orders MODIFY name VARCHAR(100) COMMENT 'TIANM_COLUMN=ENCODING=SORTED';

Variable-length encoding (VARLEN)

Dictionary encoding (SORTED)

When tables are loaded into Tianmu, variable-length encoding is applied to CHAR, VARCHAR, and TEXT-type columns by default. To use dictionary encoding, you must define the TIANMU_COLUMN=ENCODING=SORTED keyword string in a column comment before loading the table. The keyword string must be uppercase. Otherwise, it will be ignored.

You can define the keyword string in a CREATE TABLE or ALTER TABLE statement. Following are examples:

CREATE TABLE orders (name VARCHAR(100) COMMENT 'TIANMU_COLUMN=ENCODING=SORTED');
ALTER TABLE orders MODIFY name VARCHAR(100) COMMENT 'TIANMU_COLUMN=ENCODING=SORTED';

If you want to run JOIN operations involving string columns or use string functions and operators, we recommend that you use variable-length encoding. Variable-length encoding supports more expressions, filters, functions, and operators than dictionary encoding. Otherwise, select the encoding type based on the number of distinct values in the string column relative to the cardinality of the table.

   - BIGINT

   - BOOL

Load the Data from an InnoDB Table to a Tianmu Table

Exclude Table Columns

Before loading a table into Tianmu, exclude columns with unsupported data types. Otherwise, the operation will fail. Therefore, we need to add column attribute NOT SECONDARY in an ALTER TABLE or CREATE TABLE statement. Add a new options check named description_secondary_definition.

description_secondary_definition: DESCRIPTION data_type NOT SECONDARY

- Adding it in [ALTER TABLE Statement](https://dev.mysql.com/doc/refman/8.0/en/alter-table.html)

alter_option: { ... ... | MODIFY [COLUMN] col_name column_definition [FIRST | AFTER col_name] | MODIFY description_secondary_definition ... ... }

description_secondary_definition: DESCRIPTION data_type NOT SECONDARY


For example:

mysql> ALTER TABLE orders MODIFY description BLOB NOT SECONDARY; mysql> CREATE TABLE orders (id INT, description BLOB NOT SECONDARY);

> NOTE:
>If a query accesses a column defined with the `NOT SECONDARY` attribute, the query is executed on InnoDB by default.

### <h3 id="2.2.2">Define the Secondary Engine</h3>
Add keyword `SECONDARY_ENGINE` to `table_option`. `SECONDARY_ENGINE_SYM` has already implemented in [sql/lex.h](https://github.com/mysql/mysql-server/blob/8.0/sql/lex.h#L627) and [sql/sql_yacc.yy](https://github.com/mysql/mysql-server/blob/8.0/sql/sql_yacc.yy#L6669)
The table option `table_option` can be used in both ALTER TABLE and CREATE TABLE statements:

table_option: { AUTOEXTEND_SIZE [=] value | AUTO_INCREMENT [=] value ... ... | SECONDARY_ENGINE = engine_name | SECONDARY_ENGINE_ATTRIBUTE [=] 'string' ... ... }

For example:

mysql> ALTER TABLE orders SECONDARY_ENGINE = Tianmu;

mysql> CREATE TABLE orders (id INT) SECONDARY_ENGINE = Tianmu;

### <h3 id="2.2.3">Loade Tables</h3>

To load a table into Tianmu, specify the `SECONDARY_LOAD` option in an ALTER TABLE statement.
The keyword `SECONDARY_LOAD` has been added in **sql/sql_yacc.yy**:
    | SECONDARY_LOAD_SYM
      {
        $$= NEW_PTN PT_alter_table_secondary_load;
      }
    | SECONDARY_UNLOAD_SYM
Classes and functions defined in **sql/sql_alter.h**:

/* Represents ALTER TABLE SECONDARY_LOAD/SECONDARY_UNLOAD statements. / class Sql_cmd_secondary_load_unload final : public Sql_cmd_common_alter_table { public: // Inherit the constructors from the parent class. using Sql_cmd_common_alter_table::Sql_cmd_common_alter_table;

bool execute(THD *thd) override;

private: bool mysql_secondary_load_or_unload(THD thd, TABLE_LIST table_list); };

For example:

mysql> ALTER TABLE orders SECONDARY_LOAD;



## <h2 id="2.3">Run Queries</h2>
From the users’ perspective, the method to run queries in StoneDB, no matter V1.0 or V2.0, is the same as that in MySQL. As long as you know how to use MySQL, you master how to use StoneDB. However, the query processing procedure is slightly different between MySQL and StoneDB V2.0. 
MySQL performs the following steps when processing a query:
1. Parse the query to generate an abstract syntax tree (AST).
2. Pass the AST to the optimizer for logical and physical optimization. 
3. Create an execution plan based on the query plan.
4. Handle the query based on the execution plan.

The difference lies between step 3 and step 4. After completing optimization, StoneDB V2.0 will generate an execution plan for each storage engine, compare the costs of the execution plans, and then route the query to the storage engine that offers the lower cost. For more information, see issue #xxx.

**Additional Information** 
- SELECT statements will be routed to Tianmu. However, if the optimizer identifies that InnoDB is faster in processing certain SELECT statements, the SELECT statements will not be routed to Tianmu.

- If you want to use Tianmu to process a query on certain tables, you must define these tables as Tianmu tables. Otherwise, the query WILL NOT be routed to Tianmu. What’s more, the tables MUST be loaded into Tianmu.

- Query statements that have functions or operators unsupported by Tianmu cannot be executed by using Tianmu.

- The automatic transaction feature must be enabled. Otherwise, workloads will NOT be routed to Tianmu.

**[Query examples will be listed here later.]**

## <h2 id="2.4">Monitor System Status</h2>

To monitor system status, certain system variables must be configured, which are used to monitor:
- Whether Tianmu is enabled. The Tianmu engine can be enabled at two levels: global and session-level. 
- Status of Tianmu nodes
- Resource usage information of Tianmu, such as memory usage
- Status of load operations of Tianmu
- Buffer status of Tianmu
- ...
hustjieke commented 1 year ago

ACK.