Open maxenglander opened 1 week ago
@maxenglander do you yet have any clear idea on how this would be consumed and exposed in the connector? I'm wondering if this might be enough:
[type:DDL statement:"CREATE TABLE `customer` (\n `customer_id` bigint NOT NULL AUTO_INCREMENT,\n `email` varbinary(128) DEFAULT NULL,\n PRIMARY KEY (`customer_id`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci" type:DDL statement:"CREATE TABLE `corder` (\n `order_id` bigint NOT NULL AUTO_INCREMENT,\n `customer_id` bigint DEFAULT NULL,\n `sku` varbinary(128) DEFAULT NULL,\n `price` bigint DEFAULT NULL,\n PRIMARY KEY (`order_id`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci" type:DDL statement:"CREATE TABLE `product` (\n `sku` varbinary(128) NOT NULL,\n `description` varbinary(128) DEFAULT NULL,\n `price` bigint DEFAULT NULL,\n PRIMARY KEY (`sku`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci"]
[type:BEGIN keyspace:"commerce" shard:"0" type:FIELD field_event:{table_name:"commerce.customer" fields:{name:"customer_id" type:INT64 table:"customer" org_table:"customer" database:"vt_commerce" org_name:"customer_id" column_length:20 charset:63 flags:49667 column_type:"bigint"} fields:{name:"email" type:VARBINARY table:"customer" org_table:"customer" database:"vt_commerce" org_name:"email" column_length:128 charset:63 flags:128 column_type:"varbinary(128)"} keyspace:"commerce" shard:"0" enum_set_string_values:true} keyspace:"commerce" shard:"0"]
[type:VGTID vgtid:{shard_gtids:{keyspace:"commerce" shard:"0" gtid:"MySQL56/2f92cad2-9b91-11ef-8d3d-334bde7bcc8c:1-45"}} keyspace:"commerce" shard:"0"]
[type:ROW row_event:{table_name:"commerce.customer" row_changes:{after:{lengths:1 lengths:16 values:"1alice@domain.com"}} keyspace:"commerce" shard:"0"} keyspace:"commerce" shard:"0" type:ROW row_event:{table_name:"commerce.customer" row_changes:{after:{lengths:1 lengths:14 values:"2bob@domain.com"}} keyspace:"commerce" shard:"0"} keyspace:"commerce" shard:"0" type:ROW row_event:{table_name:"commerce.customer" row_changes:{after:{lengths:1 lengths:18 values:"3charlie@domain.com"}} keyspace:"commerce" shard:"0"} keyspace:"commerce" shard:"0" type:ROW row_event:{table_name:"commerce.customer" row_changes:{after:{lengths:1 lengths:14 values:"4dan@domain.com"}} keyspace:"commerce" shard:"0"} keyspace:"commerce" shard:"0" type:ROW row_event:{table_name:"commerce.customer" row_changes:{after:{lengths:1 lengths:14 values:"5eve@domain.com"}} keyspace:"commerce" shard:"0"} keyspace:"commerce" shard:"0" type:VGTID vgtid:{shard_gtids:{keyspace:"commerce" shard:"0" gtid:"MySQL56/2f92cad2-9b91-11ef-8d3d-334bde7bcc8c:1-45" table_p_ks:{table_name:"customer" lastpk:{fields:{name:"customer_id" type:INT64 charset:63 flags:49667} rows:{lengths:1 values:"5"}}}}} keyspace:"commerce" shard:"0" type:COMMIT keyspace:"commerce" shard:"0"]
[type:BEGIN keyspace:"commerce" shard:"0" type:VGTID vgtid:{shard_gtids:{keyspace:"commerce" shard:"0" gtid:"MySQL56/2f92cad2-9b91-11ef-8d3d-334bde7bcc8c:1-45"}} keyspace:"commerce" shard:"0" type:COMMIT keyspace:"commerce" shard:"0"]
[type:COPY_COMPLETED keyspace:"commerce" shard:"0" type:COPY_COMPLETED]
That's the initial output when starting a vstream in copy mode for the commerce
keyspace in the local examples — using the example vstream client, with streamCustomer := false
.
@mattlord i wouldn't say i have a very clear idea, but i don't see why this wouldn't work just fine.
Is it up to the client to decide how to consume this event? If the tables were already created, applying the DDL event would cause an error. We are not sending IF NOT EXISTS
in the statement.
I would agree that it's up to the client to decide how to use this event.
This is related to a PR I am working on for debezium vitess connector that gets this info via issuing describe table queries at startup. If we can send create table statements over grpc by default that would simplify the code a decent bit, so I'm in favor of this. However, Debezium has different notions of snapshots eg sometimes it doesn't want all the data it just wants the schema (ie only the create table statements, followed by real-time data). Can we add a way for clients to initiate a "schema-only" snapshot? This would be a requirement from Debezium side to make this useful (otherwise we tightly couple data & schema info, and would need to fall back to issuing describe table statements directly to decouple them, in which case the code is not simplified).
The way Debezium-mysql-connector works is it issues these describe table statements and then parses the DDL statements to determine the schema of the tables (including default values). The PR I mentioned before would reuse that parser to get the same info. One of the key benefits of using the debezium-vitess-connector over the mysql-connector is that that it doesn't require a binlog parser and vitess parses the binlog before sending the info as grpc event. By sending create table statements we push complexity onto debezium to parse them. An alternative approach is we expand the FIELD event or add another event type that can describe the complete schema of a table (type, nullability, default value) in a structured grpc event format (not just the SQL string). Right now field events have the types, but lack this rich information. What are ya'll's thoughts on something like this? The FIELD event contains almost enough info to construct a complete picture of the schema, we just need this additional information (nullability, default value). This also would keep the API consistent and mean less parsing/complexity for clients (otherwise each may have to maintain its own DDL parser to use the statements).
One other thing: I noticed you all recently started this debezium-connector-planetscale fork. I saw some useful contribution in VitessType class (support BIT type). Do you all plan to contribute back to debezium-connector-vitess? We definitely welcome contributions and would be grateful for them.
hey @twthorn i'm the current maintainer of the debezium for planetscale fork. i'll reach out to you through another channel and we can chat about contributions!
Feature Description
The output of the VStream copy phase does not include certain table/column metadata. For example it does not include column
DEFAULT
information.It would be great if there were a way to access that information in the VStream copy phase output, along with any other column/table metadata that could be useful.
From discussions with @mattlord he mentioned that it would make sense to have the VStream emit
CREATE TABLE
statements as part of the VStream copy phase, similar to howALTER
statements are included in the output of the streaming phase.Use Case(s)
For example, people using the Debezium MySQL connector expect to see column default information in the connector output. The Vitess connector doesn't include that information, however, which can make it more difficult for people to migrate from MySQL to Vitess.