Open GuptaManan100 opened 1 year ago
General tracking issue: https://github.com/vitessio/vitess/issues/11975
We heavily use FKs, so I am still wrapping my head around this proposal. Here are my thoughts in no particular order:
Obviously handling this at vtgate will increase latency vs being enforced in MySQL. For the majority of our use cases, I don't think we would be willing to pay that price, as we're just enforcing shard-local keys.
For cross-shard purposes, today we vreplicate PK tables into destination shards for critical FK relationships, but I do see where this would be a win for larger tables where it would be size/cost prohibitive to copy those around, and we would opt in for some of those
We will set FOREIGN_KEY_CHECKS=0 on all vttablet connections
This would be a complete non-starter for us. I would much rather see a separate connection pool for this
We don't use this at all today, so no preferences about Vitess support
We’ll add a new mode for foreign_key_mode in VTGate called Vitess Managed
Following my above points, I would prefer to choose the mode at a per table level, instead of at the entire vtgate/vttablet level. I understand that increases the complexity, but if I had to choose, I would stay with MySQL enforced FKs over Vitess enforced
We will set FOREIGN_KEY_CHECKS=0 on all vttablet connections
This would be a complete non-starter for us. I would much rather see a separate connection pool for this
@derekperkins this is driven by the idea that `vitess` would own validating and cascading foreign key writes. Enabling `FOREIGN_KEY_CHECKS` on MySQL means that on top of `vitess` already validating the relationship, and already e.g. cascading a `DELETE`, so would MySQL. Which means even more lookups on child/parent tables, which is wasteful.
Enabling FOREIGN_KEY_CHECKS on MySQL means that on top of vitess already validating the relationship, and already e.g. cascading a DELETE, so would MySQL. Which means even more lookups on child/parent tables, which is wasteful
I totally understand that point, and am fine with this choice if I have opted into Vitess FK mode. The crux of it for me is that I would only opt into Vitess FK mode for a small subset of tables at most, and thus wouldn't be ok with FKs being disabled at the MySQL level for all connections.
At a practical level, I would think this could be supported, given my preference for a per table opt in. When the DML is parsed at the vtgate level, it checks to see if the table is Vitess managed or not. If Vitess managed, handle it as described in the RFC, and at the vttablet level, use the FK disabled pool. If not Vitess managed, use the normal pool with FKs enabled.
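A minimal sketch of the per-table routing suggested above, in Python for brevity. The pool names and the `vitess_managed` set are hypothetical illustrations, not existing Vitess flags or APIs.

```python
# Hypothetical pool names, used only for illustration.
FK_ENABLED_POOL = "fk_checks_enabled"
FK_DISABLED_POOL = "fk_checks_disabled"


def choose_pool(table: str, vitess_managed: set[str]) -> str:
    """Pick the vttablet connection pool for a DML on `table`.

    Tables opted into Vitess-managed foreign keys use the pool with
    FOREIGN_KEY_CHECKS=0; every other table keeps MySQL enforcement.
    """
    if table in vitess_managed:
        return FK_DISABLED_POOL
    return FK_ENABLED_POOL
```

How a DML touching both kinds of table should be routed is left open here, since the comment above does not address it.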
@derekperkins Vitess knows about the schema and vschema so it knows when the foreign key constraint is applied at the shard level and when it is going cross-shard. So, the query planner can take care of it. External information would not be needed to do the optimization for the case you highlighted above.
We want to reduce the operational burden here.
Me, @harshit-gangal and @shlomi-noach had a discussion today and we realised that it might be better to keep the information about how to deal with foreign keys as a key-space level configuration instead of a flag on vtgates. There are 2 reasons for this -
A couple of updates. We have reworked the phases of the project and we'll store the foreign key mode in the VSchema instead of storing it in the keyspace record.
Proposed `endtoend` testing for FOREIGN KEY support: https://github.com/vitessio/vitess/pull/13799 (right now the test fails because support is still in progress and incomplete)
Today, me and @harshit-gangal ran into 2 problems, one of which we have been able to solve. The other we have deferred...
`UPDATE` with `SET NULL` constraints -
In this situation, MySQL sets NULL on all the children columns. One thing that is peculiar though, is that if the parent columns aren't actually updated, then we don't set NULLs!
So, say a user has a foreign key constraint from t1(c1) to t2(c2) of the `ON UPDATE SET NULL` variant, and they run `update t1 set c1 = 1 where id = 100`. If the value of c1 is already 1 in the table for id 100, then we don't actually propagate the NULLs!
We have handled this case in https://github.com/vitessio/vitess/pull/13823 by changing the update query in the child table slightly.
Now, after running `select c1 from t1 where id = 100`, the constructed query for the child update looks like - `update t2 set c2 = NULL where (c2) IN ((<output from select>)) AND (c2) NOT IN ((1))`
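The query construction described above can be sketched as follows. This is an illustration in Python, not the code from the PR: the function name, the `%s` placeholder style, and the choice to filter on the child column (`c2` in the example) are assumptions.

```python
def set_null_child_update(child_table, child_col, selected_parent_values, new_value):
    """Build the child UPDATE for an ON UPDATE SET NULL constraint.

    selected_parent_values: values returned by the SELECT on the parent.
    new_value: the value the parent column is being set to.
    Returns (sql, params), or None when the SELECT returned no rows.
    Table and column names are assumed to come from trusted schema data.
    """
    if not selected_parent_values:
        return None
    in_list = ", ".join(["(%s)"] * len(selected_parent_values))
    sql = (
        f"update {child_table} set {child_col} = NULL "
        f"where ({child_col}) IN ({in_list}) "
        # Children already equal to the new value belong to parents that
        # do not actually change, so they must keep their value.
        f"AND ({child_col}) NOT IN ((%s))"
    )
    return sql, [*selected_parent_values, new_value]
```

For the example in the comment, `set_null_child_update("t2", "c2", [1], 1)` yields an UPDATE whose `NOT IN ((1))` clause excludes every candidate row, so no NULLs are propagated.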
`UPDATE` with `CASCADE` constraints -
When we update the child table, then the query fails if the parent table doesn't already have the value that is being set in the update.
So, we need to actually run the children's updates with `FOREIGN_KEY_CHECKS=0`, but this means that we have to do the verification of correctness of the foreign keys on the vtgate level. As part of https://github.com/vitessio/vitess/pull/13823, we haven't added foreign key verification. As a result, only updates that set the column to a value that already exists in the parent table (so that the child update doesn't fail) work.
VTGate should ignore foreign key constraints where one (or both) of the related tables is an internal Vitess table: https://github.com/vitessio/vitess/issues/13894
Pending Task:
- `NOWAIT` for tables involving unique keys and not all https://github.com/vitessio/vitess/pull/14772
Addon:
Next Set of Support:
Introduction
This is an RFC for adding Foreign Key Support in Vitess.
Use Case
Scope of the project
Out of Scope
Schema
The following is the schema we will use for providing examples as we discuss the design of the foreign key support.
MySQL Schema
```mysql
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show tables;
+---------------------------+
| Tables_in_foreign_key_rfc |
+---------------------------+
| area                      |
| contact                   |
| customer                  |
| orders                    |
| product                   |
+---------------------------+
5 rows in set (0.00 sec)
```
```mysql
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table area;
+-------+--------------+
| Table | Create Table |
+-------+--------------+
| area  | CREATE TABLE `area` (
  `id` int NOT NULL,
  `name` varchar(30) DEFAULT NULL,
  `zipcode` int DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+--------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table contact;
+---------+--------------+
| Table   | Create Table |
+---------+--------------+
| contact | CREATE TABLE `contact` (
  `id` int NOT NULL,
  `contactnum` varchar(10) DEFAULT NULL,
  `customer_id` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `customer_id` (`customer_id`),
  CONSTRAINT `contact_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+---------+--------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table customer;
+----------+--------------+
| Table    | Create Table |
+----------+--------------+
| customer | CREATE TABLE `customer` (
  `id` int NOT NULL,
  `name` varchar(30) DEFAULT NULL,
  `area_id` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `area_id` (`area_id`),
  CONSTRAINT `customer_ibfk_1` FOREIGN KEY (`area_id`) REFERENCES `area` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+----------+--------------+
1 row in set (0.01 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table orders;
+--------+--------------+
| Table  | Create Table |
+--------+--------------+
| orders | CREATE TABLE `orders` (
  `id` int DEFAULT NULL,
  `product_id` int DEFAULT NULL,
  `customer_id` int DEFAULT NULL,
  KEY `product_id` (`product_id`),
  KEY `customer_id` (`customer_id`),
  CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `product` (`id`),
  CONSTRAINT `orders_ibfk_2` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+--------+--------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table product;
+---------+--------------+
| Table   | Create Table |
+---------+--------------+
| product | CREATE TABLE `product` (
  `id` int NOT NULL,
  `name` varchar(30) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+---------+--------------+
1 row in set (0.00 sec)
```
The data we insert for the examples that follow are -
Design
This section dives into more specific details of what we intend to do to support foreign keys.
Basic Design
- We will set `FOREIGN_KEY_CHECKS=0` on all vttablet connections so that DML queries don't fail because of these constraints on MySQL. The constraint verification will be done in vtgates.
- […] `SET DEFAULT`).
- […] `ON DUPLICATE KEY UPDATE` support).
- `foreign_key_mode` is a flag that already exists in VTGate and it controls whether VTGates allow passing the foreign key constraints to MySQL or erroring out. We’ll deprecate that flag and put `ForeignKeyMode` as a VSchema configuration.
Planning `INSERT`s
ON DUPLICATE KEY UPDATE
Example
For example, if the user was to execute `insert into orders (id, product_id, customer_id) values (4, :a, :b);`, the set of steps that Vitess would take -
- `START TRANSACTION`
- `SELECT 1 FROM product WHERE ID = :a FOR SHARE`, `SELECT 1 FROM customer WHERE ID = :b FOR SHARE`
- `COMMIT`
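The validation step in the example above can be sketched like this. It is a hedged illustration in Python, not Vitess planner code: `ParentRef`, `ORDERS_PARENTS`, `insert_validation_queries` and the `%s` placeholder style are all hypothetical names and choices. Skipping NULL values mirrors MySQL, which does not check a foreign key whose child column is NULL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentRef:
    """One foreign key seen from the child table (hypothetical helper)."""
    child_column: str
    parent_table: str
    parent_column: str


# Foreign keys of the `orders` table from the example schema.
ORDERS_PARENTS = [
    ParentRef("product_id", "product", "id"),
    ParentRef("customer_id", "customer", "id"),
]


def insert_validation_queries(row, parents):
    """Return (sql, params) pairs that lock and verify each parent row."""
    queries = []
    for ref in parents:
        value = row.get(ref.child_column)
        if value is None:
            continue  # a NULL child column needs no parent row
        sql = (
            f"SELECT 1 FROM {ref.parent_table} "
            f"WHERE {ref.parent_column} = %s FOR SHARE"
        )
        queries.append((sql, [value]))
    return queries
```

Each returned query would run inside the transaction before the `INSERT`; an empty result for any of them means the insert must be rejected.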
Planning `UPDATE`s and `DELETE`s
- […] `UPDATE` and `DELETE` depends on the referential actions that they are configured with. Let's dive into each one that MySQL allows
`RESTRICT`/`NO ACTION` (default)
- […] `SELECT` validation query, the third and the most interesting difference is that while planning INSERTs we have the full list of column values being inserted at plan time, but for updates and deletes, we'll only know the column values once we run the query!
- […] `UPDATE`/`DELETE` to a `SELECT` that returns the rows that are being updated/deleted.
- […] `SELECT` query and use these results to generate the validation queries.
- […] `DELETE FROM customer` which tries to bulk delete a lot of rows, then vtgate will end up reading a huge list of rows and the `SELECT` validation query it executes might be extremely large too. This could lead to OOMs. We can add a `LIMIT` clause to the equivalent `SELECT` statement of the `DELETE` and reject such mass updates/deletes if we get more results than the `LIMIT` allows.
Example
For example, if the user was to execute `DELETE FROM customer WHERE area_id = 2;`, then Vitess would need to take the following steps -
- `START TRANSACTION`
- […] `DELETE` query into a `SELECT` with the same `WHERE` clause. So Vitess would execute `SELECT id FROM customer WHERE area_id = 2;`. We would get back the following result -
- […] `customer` table has 2 foreign key constraints where it is the parent, we'll need a validation query for both of them - `SELECT 1 FROM contact WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHARE` and `SELECT 1 FROM orders WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHARE`.
- `COMMIT`.
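A rough Python sketch of the `RESTRICT` flow for this example, including the row-count guard discussed earlier. The helper names, the `TooManyRowsError` type, and fetching `limit + 1` rows to detect an oversized delete are assumptions made for illustration; values are assumed to be integer keys and are inlined to match the queries shown above.

```python
class TooManyRowsError(Exception):
    """Raised when a DML touches more parent rows than the allowed limit."""


def parent_select(table, key_column, where, limit):
    """Rewrite a DELETE/UPDATE into the SELECT that finds the affected rows.

    One extra row is requested so that exceeding `limit` can be detected.
    """
    return f"SELECT {key_column} FROM {table} WHERE {where} LIMIT {limit + 1}"


def restrict_validation_queries(parent_values, children, limit):
    """Build one validation query per child constraint.

    children: (child_table, child_column) pairs referencing the parent.
    Any row returned by a validation query means the DML must fail.
    """
    if len(parent_values) > limit:
        raise TooManyRowsError(f"more than {limit} parent rows affected")
    if not parent_values:
        return []  # nothing is deleted, so nothing needs validating
    tuples = ", ".join(f"({int(v)})" for v in parent_values)
    return [
        f"SELECT 1 FROM {table} WHERE ({column}) in ({tuples}) Limit 1 FOR SHARE"
        for table, column in children
    ]
```

With `parent_values=[1, 3]` and the `contact` and `orders` children, this reproduces the two validation queries from the example.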
`CASCADE`
- […] `RESTRICT` case, we didn't do any writes until we knew it was going to succeed, so we never had to rollback writes. In this case however, we might need to rollback writes if a cascaded delete fails down the line (because that could have a RESTRICT constraint on it).
- […] `UPDATE`/`DELETE` to a `SELECT` that returns the rows that are being updated/deleted.
- […] `SELECT` query and use these results to find the rows that need to have DELETE/UPDATEs cascaded to.
- […] `UPDATE`/`DELETE` for the children rows in the same transaction. Do this until no further cascades are required.
- `ROLLBACK`.
- […] `LIMIT` clause to the SELECTs, but in this case it won't be enough, since each row deletion would lead to another SELECT query. We would have to impose an overall limit on the vtgate to prevent OOMs.
`SET NULL`
- `SET NULL` is very similar to `CASCADE`. After finding the children rows of the foreign key constraint, we would need to SET the children column to NULLs, so DELETE queries on the parents would trigger an UPDATE on the children rows.
`SET DEFAULT`
- […] `SET NULL`. Only difference being that instead of setting NULL, we'll set the default value after finding it from our schema tracking data.
Planning `REPLACE`
- `REPLACE` statements are only supported in unsharded mode.
- […] `REPLACE` we'll plan the DELETE and INSERT and execute a `SELECT` query to decide if a `DELETE` is necessary.
Important Considerations
- […] `INSERT`/`UPDATE`/`DELETE` only touch one row (including CASCADEs), then the cross-shard transaction will only be writing in one shard. All the queries executed in other shards will only be `SELECT... FOR SHARE` statements. So, in case of point updates, we don't have any risk of partial commits/inconsistent state. The write being successful will just be contingent on the `COMMIT` succeeding in the shard having the write. For DMLs that touch more than 1 row, this guarantee can't be provided and the cross-shard transaction will be best effort. It can leave the database in an inconsistent state in case of partial failure during commit phase.
- […] `FOREIGN_KEY_CHECKS` to 0).
Data structure to store FK constraints in VSchema
Schema tracking will give us a list of foreign key constraints as part of the `SHOW CREATE TABLE` output. We want to store this output in the `VSchema` struct in a form that gives us the best performance while planning.
We'll need to answer queries of the following sorts -
- […] `INSERT`s)
- […] `DELETE`s and `UPDATE`s)
The `VSchema` struct stores a map of `KeyspaceSchema` for each keyspace. Within a `KeyspaceSchema` we have a map of `Table`. We'll store the foreign key constraints inside this `Table` struct.
We'll add 2 more fields to the `Table` struct -
Essentially, we'll store the list of foreign key constraints where the table is a parent and a list where it is a child.
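To make the two-list idea concrete, here is a small Python sketch. It is not the Go struct from the RFC: the field names `fks_as_parent` and `fks_as_child`, the shape of `ForeignKeyConstraint`, and the `register` and `build_example_schema` helpers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ForeignKeyConstraint:
    """Hypothetical record of one constraint parsed from SHOW CREATE TABLE."""
    child_table: str
    child_columns: tuple
    parent_table: str
    parent_columns: tuple


@dataclass
class Table:
    name: str
    # Constraints in which this table is the parent (checked on DELETE/UPDATE).
    fks_as_parent: list = field(default_factory=list)
    # Constraints in which this table is the child (checked on INSERT/UPDATE).
    fks_as_child: list = field(default_factory=list)


def register(tables, fk):
    """Index a constraint on both of the tables it connects."""
    tables[fk.child_table].fks_as_child.append(fk)
    tables[fk.parent_table].fks_as_parent.append(fk)


def build_example_schema():
    """Load the constraints of the example schema used in this RFC."""
    names = ["area", "contact", "customer", "orders", "product"]
    tables = {name: Table(name) for name in names}
    register(tables, ForeignKeyConstraint("contact", ("customer_id",), "customer", ("id",)))
    register(tables, ForeignKeyConstraint("customer", ("area_id",), "area", ("id",)))
    register(tables, ForeignKeyConstraint("orders", ("product_id",), "product", ("id",)))
    register(tables, ForeignKeyConstraint("orders", ("customer_id",), "customer", ("id",)))
    return tables
```

Storing each constraint on both tables trades a little memory for direct lookups at plan time: planning a `DELETE` on `customer` reads `fks_as_parent`, while planning an `INSERT` into `orders` reads `fks_as_child`.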
The `ForeignKeyConstraint` struct would look something like this -
Performance Improvements
- […] `INSERT`, `UPDATE/DELETE` (with Restrict) checks for us by using `FOREIGN_KEY_CHECKS=1` on the connection for unsharded and single-sharded cases.
Phases
- `INSERT`, `UPDATE` and `DELETE` statements for unsharded.
- `RESTRICT`/`NO ACTION`, `CASCADE`, `SET NULL` mode for foreign key constraints will be supported.
- `ON DUPLICATE KEY UPDATE` in INSERTs for unsharded.
- `REPLACE` for unsharded.
- `INSERT/UPDATE/DELETE ... (SELECT)` (SELECT subquery in DMLs) for unsharded.
Prerequisites
- `INSERT` planning in Gen4. https://github.com/vitessio/vitess/pull/12934
Tasks