vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.18k stars 2.06k forks source link

RFC: Foreign Key Support in Vitess #12967

Open GuptaManan100 opened 1 year ago

GuptaManan100 commented 1 year ago

Introduction

This is an RFC for adding Foreign Key Support in Vitess.

Use Case

Scope of the project

Out of Scope

Schema

The following is the schema we will use for providing examples as we discuss the design of the foreign key support.

MySQL Schema ```mysql mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show tables; +---------------------------+ | Tables_in_foreign_key_rfc | +---------------------------+ | area | | contact | | customer | | orders | | product | +---------------------------+ 5 rows in set (0.00 sec) ``` ```mysql mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table area; +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | area | CREATE TABLE `area` ( `id` int NOT NULL, `name` varchar(30) DEFAULT NULL, `zipcode` int DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci | +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table contact; +---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | contact | CREATE TABLE `contact` ( `id` int NOT NULL, `contactnum` varchar(10) DEFAULT NULL, `customer_id` int DEFAULT NULL, PRIMARY KEY (`id`), KEY `customer_id` (`customer_id`), CONSTRAINT `contact_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci | +---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table customer; +----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | customer | CREATE TABLE `customer` ( `id` int NOT NULL, `name` varchar(30) DEFAULT NULL, `area_id` int DEFAULT NULL, PRIMARY KEY (`id`), KEY `area_id` (`area_id`), CONSTRAINT `customer_ibfk_1` FOREIGN KEY (`area_id`) REFERENCES `area` (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci | +----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec) mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table orders; +--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | orders | CREATE TABLE `orders` ( `id` int DEFAULT NULL, `product_id` int DEFAULT NULL, `customer_id` int DEFAULT NULL, KEY `product_id` (`product_id`), KEY `customer_id` (`customer_id`), CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `product` (`id`), CONSTRAINT `orders_ibfk_2` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci | +--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table product; +---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | product | CREATE TABLE `product` ( `id` int NOT NULL, `name` varchar(30) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci | +---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) ```
erDiagram
    Area {
        id int PK
        name varchar
        zipcode int
    }

    Product {
        id int PK
        name varchar
    }

    Order {
        id int PK
        produce_id int FK
        customer_id int FK
    }

    Customer {
        id int PK
        name varchar
        area_id int FK
    }

    Contact {
        id int PK
        contact varchar
        customer_id int FK
    }

    Product ||--}o Order :""
    Customer ||--}o Order :""
    Customer ||--}o Contact :""
    Area ||--}o Customer :""

The data we insert for the examples that follow are -

delete from area;
delete from product;
delete from customer;
delete from orders;
delete from contact;
insert into product values ('1', 'Fan'), ('2', 'Cooler'), ('3', 'AC');
insert into area values(1, 'US', '521763'), (2, 'India', '432143');
insert into customer values(1, 'Manan', 2), (2, 'Andres', 1), (3, 'Harshit', 2);
insert into contact values (1, '897876876', 3), (2, '234123453', 1), (3, '789975234', 2), (4, '343214314', 1);
insert into orders values (1, 1, 2), (2, 1, 1), (3, 1, 1), (4, 2, 1), (4, 2, 3);

Design

This section dives into more specific details of what we intend to do to support foreign keys.

Basic Design

Planning INSERTs

Planning UPDATEs and DELETEs

RESTRICT/NO ACTION (default)

CASCADE

Planning REPLACE

Important Considerations

flowchart TD
    subgraph This is acceptable
    Table1
    col1[[col1]]
    col2[[col2]]
    Table1 --- col1
    Table1 --- col2

    Table2
    col3[[col3]]
    col4[[col4]]
    Table2 --- col3
    Table2 --- col4

    col1 -.-> col3
    col4 -.-> col2
    end

    subgraph This won't work
    Table3
    col5[[col1]]
    col6[[col2]]
    Table3 --- col5
    Table3 --- col6

    Table4
    col8[[col3]]
    col7[[col4]]
    Table4 --- col8
    Table4 --- col7

    col5 -.-> col8
    col8 -.-> col5
    end

Data structure to store FK constraints in VSchema

Schema tracking will give us a list of foreign key constraints as part of the SHOW CREATE TABLE output. We want to store this output in the VSchema struct in a form that gives us the best performance while planning.

We'll need to answer queries of the following sorts -

  1. For a given table, find all the foreign key constraints where it is a child. (Needed for planning INSERTs)
  2. For a given table, find all the foreign key constraints where it is a parent. (Needed for planning DELETE's andUPDATE`s)

The VSchema struct stores a map of KeyspaceSchema for each keyspace. Within a KeyspaceSchema we have a map of Table. We'll store the foreign key constraints inside this Table struct.

We'll add 2 more fields to the Table struct -

type Table struct {
    ...
    ParentFKs []*ForeignKeyConstraint
    ChildFKs []*ForeingKeyConstraint
}

Essentially, we'll store the list of foreign key constraints where the table is a parent and a list where it is a child.

The ForeignKeyConstraint struct would look something like this -

type ForeignKeyConstraint struct {
    ParentTable TableName
    ParentColumns []Columns
    ChildTable TableName
    ChildColumns []Columns
    OnDeleteAction int // This will be an enum
    OnUpdateAction int
}

Performance Improvements

  1. We can get MySQL to run the INSERT, UPDATE/DELETE (with Restrict) checks for us by using FOREIGN_KEY_CHECKS=1 on the connection for unsharded and single-sharded cases.
  2. For CASCADE and SET NULL we want vtgate to split them up into separate queries so that we have binlog entries for them and vreplication operations like Reshard and MoveTables work.

Phases

  1. Minimal Viable Product
    1. Support for basic INSERT, UPDATE and DELETE statements for unsharded.
    2. RESTRICT/NO ACTION, CASCADE, SET NULL mode for foreign key constraints will be supported.
    3. Support for ON DUPLICATE KEY UPDATE in INSERTs for unsharded.
    4. Support for REPLACE for unsharded.
  2. Phase two
    1. INSERT/UPDATE/DELETE ... (SELECT) (SELECT subquery in DMLs) for unsharded.
    2. Support for single-shard foreign key constraints.
  3. Phase three
    1. Cross-shard support.

Prerequisites

Tasks

shlomi-noach commented 1 year ago

General tracking issue: https://github.com/vitessio/vitess/issues/11975

derekperkins commented 1 year ago

We heavily use FKs, so I am still wrapping my head around this proposal. Here are my thoughts in no particular order:

Latency

Obviously handling this at vtgate will increase latency vs being enforced in MySQL. For the majority of our use cases, I don't think we would be willing to pay that price, as we're just enforcing shard-local keys.

For cross-shard purposes, today we vreplicate PK tables into destination shards for critical FK relationships, but I do see where this would be a win for larger tables where it would be size/cost prohibitive to copy those around, and we would opt in for some of those

FOREIGN_KEY_CHECKS

We will set FOREIGN_KEY_CHECKS=0 on all vttablet connections

This would be a complete non-starter for us. I would much rather see a separate connection pool for this

Cascade

We don't use this at all today, so no preferences about Vitess support

foreign_key_mode

We’ll add a new mode for foreign_key_mode in VTGate called Vitess Managed

Following my above points, I would prefer to choose the mode at a per table level, instead of at the entire vtgate/vttablet level. I understand that increases the complexity, but if I had to choose, I would stay with MySQL enforced FKs over Vitess enforced

shlomi-noach commented 1 year ago

We will set FOREIGN_KEY_CHECKS=0 on all vttablet connections

This would be a complete non-starter for us. I would much rather see a separate connection pool for this

@derekperkins this is driven by the idea that vitess would own validating and cascading foreign key writes. Enabling FOREIGN_KEY_CHECKS on MySQL means that on top of vitess already validating the relationship, and already e.g. cascading a DELETE, so would MySQL. Which means even more lookups on child/parent tables, which is wasteful.

derekperkins commented 1 year ago

Enabling FOREIGN_KEY_CHECKS on MySQL means that on top of vitess already validating the relationship, and already e.g. cascading a DELETE, so would MySQL. Which means even more lookups on child/parent tables, which is wasteful

I totally understand that point, and am fine with this choice if I have opted into Vitess FK mode. The crux of it for me is that I would only opt into Vitess FK mode for a small subset of tables at most, and thus wouldn't be ok with FKs being disabled at the MySQL level for all connections.

At a practical level, I would think this could be supported, given my preference for a per table opt in. When the DML is parsed at the vtgate level, it checks to see if the table is Vitess managed or not. If Vitess managed, handle it as described in the RFC, and at the vttablet level, use the FK disabled pool. If not Vitess managed, use the normal pool with FKs enabled.

harshit-gangal commented 1 year ago

@derekperkins Vitess knows about the schema and vschema so it knows when the foreign key constraint is applied at the shard level and when it is going cross-shard. So, the query planner can take care of it. External information would not be needed to do the optimization for the case you highlighted above.

We want to reduce the operational burden here.

GuptaManan100 commented 1 year ago

Me, @harshit-gangal and @shlomi-noach had a discussion today and we realised that it might be better to keep the information about how to deal with foreign keys as a key-space level configuration instead of a flag on vtgates. There are 2 reasons for this -

  1. It doesn't make sense for some vtgates to manage foreign keys while some don't. Similarly, the configuration has to be consistent with vttablets. We avoid all of this misconfiguration if we put it into the topo-server
  2. It gives the user more control on which keyspaces they want Vitess to manage the foreign keys and on which keyspaces they don't want Vitess to do anything. This will also address @derekperkins's point to some extant. I think it would be too much to do control Vitess/MySQL managed foreign keys on a table level, but doing it on a keyspace level looks like a good idea to me
GuptaManan100 commented 1 year ago

A couple of updates. We have reworked the phases of the project and we'll store the foreign key mode in the VSchema instead of storing it in the keyspace record.

shlomi-noach commented 10 months ago

Proposed endtoend testing for FOREIGN KEY support: https://github.com/vitessio/vitess/pull/13799 (right now the test fails because support is still in progress and incomplete)

GuptaManan100 commented 10 months ago

Today, me and @harshit-gangal ran into 2 problems, one of which we have been able to solve. The other we have deferred...

  1. UPDATE with SET NULL constraints - In this situation, MySQL sets NULL on all the children columns. One thing that is peculiar though, is that if the parent columns aren't actually updated, then we don't set NULLs! So, if a user has a foreign key constraint from t1(c1) to t2(c2) of the ON UPDATE SET NULL variant, and they run update t1 set c1 = 1 where id = 100. If the value of c1 is already 1 in the table for id 100, then we don't actually propogate the NULLs! We have handled this case in https://github.com/vitessio/vitess/pull/13823 by changing the update query in the child table slightly. Now, after running select c1 from t1 where id = 100, the constructed query for the child update looks like - update t2 set c2 = NULL where (c1) IN ((<output from select>)) AND (c1) NOT IN ((1))

  2. UPDATE with CASCADE constraints - When we update the child table, then the query fails if the parent table doesn't already have the value that is being set in the update. So, we need to actually run the children's updates with FOREIGN_KEY_CHECKS=0, but this means that we have to do the verification of correctness of the foreign keys on the vtgate level. As part of https://github.com/vitessio/vitess/pull/13823, we haven't added foreign key verification, as a result, only updates that set the column to a value such that it already exists in the table and the child update doesn't fail works.

shlomi-noach commented 10 months ago

VTGate should ignore foreign key constraints where one (or both) of the related tables is an internal Vitess table: https://github.com/vitessio/vitess/issues/13894

harshit-gangal commented 7 months ago

Pending Task:

Addon:

Next Set of Support: