
Support incremental commit of INSERT...SELECT #18038

Open · kolbe opened this issue 4 years ago

kolbe commented 4 years ago

Feature Request

Is your feature request related to a problem? Please describe:

TiDB does not handle large transactions well.

INSERT...SELECT is a common and generally inexpensive mechanism used by MySQL applications to insert a large amount of data from one table (or query result) into another table. This does not work well in TiDB when the statement produces a transaction with very many rows.

Describe the feature you'd like:

A solution to support large INSERT...SELECT transactions in TiDB could be to automatically commit the running transaction periodically so that the transaction does not grow to be too large.

I'd like to see support for this in two forms:

1) A per-statement flag that could be added to the SQL itself, like this:

`INSERT INTO tbl1 SELECT * from tbl2 /*+ TIDB_INCREMENTAL_COMMIT(10000) */`

2) A system variable that can be set either as a SESSION or GLOBAL variable (with a corresponding configuration variable to set the GLOBAL value on startup):

`SET GLOBAL tidb_incremental_commit=10000;`

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

Users migrating from MySQL might run into big problems in TiDB because they're using this idiom. Adding support for this could make it easier for those users to migrate their applications to TiDB.

If this feature is implemented, we should consider whether it applies to all statements or only to specific types of statements (LOAD DATA, INSERT...SELECT, etc.).

zz-jason commented 4 years ago

> A solution to support large INSERT...SELECT transactions in TiDB could be to automatically commit the running transaction periodically so that the transaction does not grow to be too large.

"commit the running transaction periodically" would break the atomic property of the original transaction. Is that acceptable by the user? Seems it's the same with https://github.com/pingcap/tidb/issues/18028. Although it's issue summary is about "performance", but the original problem that https://github.com/pingcap/tidb/issues/18028 wants to solve is similar to this issue.

Is https://github.com/pingcap/tidb/issues/18028#issuecomment-657503430 a workaround for users?

zz-jason commented 4 years ago

For now:

ghost commented 4 years ago

> A per-statement flag that could be added to the SQL itself, like this: `INSERT INTO tbl1 SELECT * from tbl2 /*+ TIDB_INCREMENTAL_COMMIT(10000) */`

As an alternative, may I suggest the SET_VAR hint. It introduces per-statement-scoped variables without needing additional syntax.

Edit: See https://github.com/pingcap/tidb/issues/18748 for SET_VAR hint.
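
For illustration, a minimal sketch of what that could look like, assuming tidb_dml_batch_size were made SET_VAR-hintable (hypothetical usage, not syntax that exists today):

-- Hypothetical: per-statement batch size via SET_VAR, no new syntax needed
INSERT /*+ SET_VAR(tidb_dml_batch_size=10000) */ INTO tbl1 SELECT * FROM tbl2;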

scsldb commented 4 years ago

We need to investigate the user scenarios in depth and make a decision once that research is complete.

ghost commented 4 years ago

Note that LOAD DATA already has this behavior, although no longer by default. See: https://github.com/pingcap/tidb/pull/18807

The setting that enables incremental commit is tidb_dml_batch_size.
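
For example, a sketch of a batched LOAD DATA (the file path and batch size here are illustrative):

SET tidb_dml_batch_size = 20000; -- commit every 20000 rows instead of one large transaction
LOAD DATA LOCAL INFILE '/tmp/t1.csv' INTO TABLE t1
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';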

ghost commented 4 years ago

I hit this today while trying to create examples for partitioning on moderate-sized tables. Because TiDB does not support adding partitioning to, or removing partitioning from, existing tables, this requires an export and then a reload:

mysql> ALTER TABLE t1 PARTITION BY RANGE ( YEAR(`start_date`) ) (
    ->   PARTITION `p2010` VALUES LESS THAN (2011),
    ->   PARTITION `p2011` VALUES LESS THAN (2012),
    ->   PARTITION `p2012` VALUES LESS THAN (2013),
    ->   PARTITION `p2013` VALUES LESS THAN (2014),
    ->   PARTITION `p2014` VALUES LESS THAN (2015),
    ->   PARTITION `p2015` VALUES LESS THAN (2016),
    ->   PARTITION `p2016` VALUES LESS THAN (2017),
    ->   PARTITION `p2017` VALUES LESS THAN (2018),
    ->   PARTITION `p2018` VALUES LESS THAN (2019),
    ->   PARTITION `p2019` VALUES LESS THAN (2020),
    ->   PARTITION `pmax` VALUES LESS THAN (MAXVALUE)
    -> );
ERROR 1105 (HY000): alter table partition is unsupported

mysql> ALTER TABLE trips_partitioned REMOVE PARTITIONING;
ERROR 8200 (HY000): Unsupported remove partitioning

It is not user-friendly to show examples that require a local dumper as an intermediate step. It would (obviously) be optimal if TiDB supported all DDL variants, but anticipating that this could take some time, incremental commit is important because it provides a viable workaround.
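
With incremental commit, the reload could stay inside the database. A sketch of what that workaround could look like (abbreviated schema; uses the batch-commit settings discussed later in this thread):

-- Create the replacement table with the desired partitioning up front,
-- since ALTER cannot add it later:
CREATE TABLE t1_part (
  start_date DATE NOT NULL
  -- ... remaining columns as in the original table ...
) PARTITION BY RANGE ( YEAR(`start_date`) ) (
  PARTITION `p2010` VALUES LESS THAN (2011),
  PARTITION `pmax` VALUES LESS THAN (MAXVALUE)
);
SET tidb_dml_batch_size = 20000;
SET tidb_batch_insert = 1;
INSERT INTO t1_part SELECT * FROM t1; -- committed in 20000-row batches
RENAME TABLE t1 TO t1_old, t1_part TO t1;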

ghost commented 4 years ago

Here is another use case:

This feature combines very well with a time-travel query (https://github.com/pingcap/tidb/issues/18672), so you could effectively run INSERT INTO tmplocation SELECT * FROM maintable AS OF SYSTEM TIME '2020-10-01 10:00:00'.

Since there is already a PR for the SET_VAR hint, and LOAD DATA already supports incremental commit with tidb_dml_batch_size, my suggestion is that this could be implemented by extending tidb_dml_batch_size to be settable with SET_VAR and to apply to INSERT .. SELECT.
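
A hypothetical sketch of the combined usage (both the SET_VAR support and the AS OF syntax are proposals here, not shipped features):

-- Hypothetical: batch-committed copy of a historical snapshot
INSERT /*+ SET_VAR(tidb_dml_batch_size=10000) */ INTO tmplocation
SELECT * FROM maintable AS OF SYSTEM TIME '2020-10-01 10:00:00';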

@zz-jason @jackysp what do you think?

jackysp commented 4 years ago

tidb_dml_batch_size can already apply to INSERT .. SELECT: as long as enable-batch-dml (https://github.com/pingcap/tidb/blob/master/config/config.toml.example#L65) is set to true and tidb_batch_insert = 1, you can commit transactions in batches. However, I am not sure whether the SET_VAR hint can allow writing when using tidb_snapshot. @nullnotnil

ghost commented 4 years ago

@jackysp Thanks! I can confirm it works (kind of):

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
 pk VARBINARY(36) NOT NULL PRIMARY KEY,
 b BIGINT NOT NULL,
 c BIGINT NOT NULL,
 pad VARBINARY(2048),
 INDEX (b),
 INDEX (c)
);
INSERT INTO t1 SELECT uuid(), FLOOR(RAND()*5), FLOOR(RAND()*1000000), HEX(RANDOM_BYTES(1000)) FROM dual;
INSERT INTO t1 SELECT uuid(), FLOOR(RAND()*5), FLOOR(RAND()*1000000), HEX(RANDOM_BYTES(1000)) FROM t1 a JOIN t1 b JOIN t1 c;
INSERT INTO t1 SELECT uuid(), FLOOR(RAND()*5), FLOOR(RAND()*1000000), HEX(RANDOM_BYTES(1000)) FROM t1 a JOIN t1 b JOIN t1 c;

-- This should insert 1000 rows (10^3)
INSERT INTO t1 SELECT uuid(), FLOOR(RAND()*5), FLOOR(RAND()*1000000), HEX(RANDOM_BYTES(1000)) FROM t1 a JOIN t1 b JOIN t1 c;

-- 100K rows (fails by default)
INSERT INTO t1 SELECT uuid(), FLOOR(RAND()*5), FLOOR(RAND()*1000000), HEX(RANDOM_BYTES(1000)) FROM t1 a JOIN t1 b LIMIT 100000;

SET tidb_dml_batch_size = 20000;
SET tidb_batch_insert = 1;
-- 100K rows (now works)
INSERT INTO t1 SELECT uuid(), FLOOR(RAND()*5), FLOOR(RAND()*1000000), HEX(RANDOM_BYTES(1000)) FROM t1 a JOIN t1 b LIMIT 100000;

I don't think set_var + tidb_snapshot can work together here, since tidb_snapshot implies read-only and set_var scope is statement (not query block). We will need to add full support for time-travel query.

I also noted that if I try to simulate a very large table (inserting 1010^3 rows) I could trigger out-of-memory controls. ~But this also happened when running a SELECT stmt and piping to /dev/null, so I will create a separate issue~ Edit: It was the client OOM'ing and not the server. If I add the --quick option the memory is stable. I am going to fork this to a separate bug report.

kennytm commented 3 years ago

(note that the hint should be placed after the INSERT keyword, not at the end of the statement.)

INSERT /*+ SET_VAR(...) */ INTO tbl1 SELECT * from tbl2;

ghost commented 3 years ago

I have a proposal for how to address this issue. I'm missing some of the context about why certain feature flags exist, so I'd appreciate input:

  1. The config option enable-batch-dml becomes enabled by default.
  2. The sysvar tidb_batch_insert is deprecated; its behavior is implied by a non-zero value of tidb_dml_batch_size.
  3. The sysvar tidb_dml_batch_size is changed to be SET_VAR hintable.
  4. ~https://github.com/pingcap/tidb/issues/20597 is fixed.~ (Done)

Items 1 and 2 make INSERT .. SELECT behave the same way as LOAD DATA, which (inconsistently) does not depend on these two settings.

Mini256 commented 7 months ago

Has this been resolved by non-transactional DML?
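
For context, non-transactional DML splits a single DML statement into multiple batches that commit separately, which matches this request at the cost of statement-level atomicity. A sketch of the BATCH syntax (shard column and version support should be checked against the current docs):

-- Each batch commits as its own transaction:
BATCH ON test.tbl2.id LIMIT 10000
INSERT INTO tbl1 SELECT * FROM tbl2;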