Closed yufan022 closed 1 year ago
Hi, thanks for you recommendation. This column will also enable many features so it need carefully design, we'll update our consideration in this issue and welcome to discuss with us 😄
Hi, thanks for you recommendation. This column will also enable many features so it need carefully design, we'll update our consideration in this issue and welcome to discuss with us 😄
Great. Is there a plan to support this feature? We will try to support it internally first. Could you give us some help and suggestions?
Hi, currently we doesn't have enough manpower for this feature, so we're really happy to see if we can work together to implement it.
Here are our suggestions:
genUpdateSQLs
and genDeleteSQLs
in below lines.)
this will avoid the added column from conficting with other feature.If you are willing to contribute this feature to DM repo, we maintainers can check bugs in implementation, maintain the logic with other features when code is growing, and give concrete and detailed explanation of internal stages. We are really look forward to your contribution.
I'll list some scenarios for this feature:
so we plan that adding a configration to let source-id (mysql instance id), database name, table name can be appended to records when migrating. The requirement is
PTAL if it's OK @yufan022
For shard merging replication, if upstream tables using AUTO INCREMENT primary key, it's highly likely to cause a conflict after merging, because we can't tell records (with auto-increment PK) of one shard from another.
The common practice in this scenario is that create the downstream table structure in advance, cancel the primary key constraint and use the extended column with the original primary key as the union primary key.
Scenario example:
# upstream
CREATE TABLE `order_xxx` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
...
PRIMARY KEY (`id`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
# downstream
CREATE TABLE `collection_order` (
`id` bigint(20) unsigned NOT NULL,
...
`s_table` varchar(64) NOT NULL,
primary key(`id`,`s_table`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
in order to support flexible usage such as query on specified "database name" or "database name + table name", each appended information must have its exclusive column. This also meets 1NF of database.
Extracting source / schema / table into specific columns of downstream tables will make querying or adding indexes more flexible, and the meaning of each column is clearer. Although this will increase unnecessary storage space and index, we think it is worth it. Users can set to extracting several columns, up to 3 columns at present.
user may need only a part of table name, for example, "us_west" out of "us_west_auth_log", so the configration should have a way to extract needed information
In the current design, because we are not sure how many complex scenes there are, we use regular expressions to adapt more scenarios.
Task configuration example:
routes:
order-route:
schema-pattern: "core"
table-pattern: "order_*"
target-schema: "core"
target-table: "collection_order"
extract-table:
table-regexp: "order_([0-9a-zA-Z]+)" # result: order_a -> a
target-column: "symbol"
extract-schema:
schema-regexp: "([0-9a-zA-Z]+)" # result: core -> core
target-column: "schema"
extract-source:
source-regexp: "mysql-replica-(.*)" # result: mysql-replica-01 -> 01
target-column: "source_s"
trade-route: # extraction is not enabled and should be compatible
schema-pattern: "core"
table-pattern: "trade_*"
target-schema: "core"
target-table: "collection_trade"
@lance6716 Thanks for the help, if there is anything unreasonable, welcome to discuss.
Feature Request
Is your feature request related to a problem? Please describe:
Hi, we are trying to use DM to merge the shared tables of AWS Aurora into the tidb cluster managed by tidb operator.
But our scene is a little special, our shared tables doesn't have shared key, we use table name to splite the table.
So if we merged source table
order_btcusdt; order_ethusdt;...
to target tableorder_collect
, We can't tell which orders are from btcusdt, and which are from ethusdt. Because there doesn't have shared column.Can DM extract the table name as an extended column of target table? Or is there a better solution?
thanks.
Describe the feature you'd like:
Merge shared tables without partition keys, extract the table name to the target table column
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy: