vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.73k stars 2.1k forks source link

Bug Report: Incorrect results for multi-table delete query with foreign keys #15191

Closed GuptaManan100 closed 9 months ago

GuptaManan100 commented 9 months ago

Overview of the Issue

This bug has been caught by the fuzzer introduced in https://github.com/vitessio/vitess/pull/13980.

Consider the following tables in an unsharded keyspace that is running Vitess in a foreign key managed mode -

/*
 *                fk_t10
 *                   │
 * On Delete Cascade │
 * On Update Cascade │
 *                   │
 *                   ▼
 *                fk_t11──────────────────┐
 *                   │                    │
 *                   │                    │ On Delete Restrict
 * On Delete Cascade │                    │ On Update Restrict
 * On Update Cascade │                    │
 *                   │                    │
 *                   ▼                    ▼
 *                fk_t12               fk_t13
 */

create table fk_t10
(
    id bigint,
    col varchar(10),
    primary key (id),
    unique index(col)
) Engine = InnoDB;

create table fk_t11
(
    id bigint,
    col varchar(10),
    primary key (id),
    index(col),
    foreign key (col) references fk_t10(col) on delete cascade on update cascade
) Engine = InnoDB;

create table fk_t12
(
    id bigint,
    col varchar(10),
    primary key (id),
    index(col),
    foreign key (col) references fk_t11(col) on delete cascade on update cascade
) Engine = InnoDB;

create table fk_t13
(
    id bigint,
    col varchar(10),
    primary key (id),
    index(col),
    foreign key (col) references fk_t11(col) on delete restrict on update restrict
) Engine = InnoDB;

If we insert the following data in the tables -

insert /*+ SET_VAR(foreign_key_checks=0) */ into fk_t10 (id, col) values (1, '5'), (2, NULL), (3, NULL), (4, '4'), (6, '1'), (7, '2')
insert /*+ SET_VAR(foreign_key_checks=0) */ into fk_t11 (id, col) values (4, '1'), (5, '3'), (7, '22'), (8, '5'), (9, NULL), (10, '3')
insert /*+ SET_VAR(foreign_key_checks=0) */ into fk_t12 (id, col) values (2, NULL), (3, NULL), (4, '1'), (6, '6'), (8, NULL), (10, '1')
insert /*+ SET_VAR(foreign_key_checks=0) */ into fk_t13 (id, col) values (2, '1'), (5, '5'), (7, '5')

If we now try and run the multi delete query -

delete fk_t11 from fk_t11 join fk_t12 using (id) where fk_t11.id = 4

MySQL fails this query with the error Cannot delete or update a parent row: a foreign key constraint fails (ks.fk_t13, CONSTRAINTfk_t13_ibfk_1FOREIGN KEY (col) REFERENCESfk_t11(col) ON DELETE RESTRICT ON UPDATE RESTRICT) (errno 1451) (sqlstate 23000)

But Vitess lets the query succeed and ends up only deleting 2 rows from fk_t12 corresponding to the rows with id 4 and 10.

Reproduction Steps

Given in the description

Binary Version

main, release-19.0

Operating System and Environment details

-

Log Fragments

No response

GuptaManan100 commented 9 months ago

I looked at this failure, and I know what's happening. The plan that Vitess produces for this query is this -

{
  "OperatorType": "FkCascade",
  "Inputs": [
    {
   "InputName": "Selection",
      "OperatorType": "Route",
      "Variant": "Unsharded",
      "Keyspace": {
        "Name": "uks",
        "Sharded": false
      },
      "FieldQuery": "select fk_t11.col from fk_t11, fk_t12 where 1 != 1",
      "Query": "select fk_t11.col from fk_t11, fk_t12 where fk_t11.id = 4 and fk_t11.id = fk_t12.id for update",
      "Table": "fk_t11, fk_t12"
    },
    {
    "InputName": "CascadeChild-1",
      "OperatorType": "Delete",
      "Variant": "Unsharded",
      "Keyspace": {
        "Name": "uks",
        "Sharded": false
      },
      "TargetTabletType": "PRIMARY",
      "BvName": "fkc_vals",
      "Cols": [
        0
      ],
      "Query": "delete from fk_t12 where (col) in ::fkc_vals",
      "Table": "fk_t12"
    },
    {
    "InputName": "Parent",
      "OperatorType": "Delete",
      "Variant": "Unsharded",
      "Keyspace": {
        "Name": "uks",
        "Sharded": false
      },
      "TargetTabletType": "PRIMARY",
      "Query": "delete fk_t11 from fk_t11, fk_t12 where fk_t11.id = 4 and fk_t11.id = fk_t12.id",
      "Table": "fk_t11"
    }
  ]
}

From the outset, the plan looks fine, but when we try to see the exact order of execution, the problem becomes apparent.

The order of queries Vitess runs is this -

  1. select fk_t11.col from fk_t11, fk_t12 where fk_t11.id = 4 and fk_t11.id = fk_t12.id for update
  2. delete from fk_t12 where (col) in ::fkc_vals
  3. delete fk_t11 from fk_t11, fk_t12 where fk_t11.id = 4 and fk_t11.id = fk_t12.id

The problem is that once the 2nd query runs, (which is the cascade), the rows with id = 4 are gone from the fk_t12 table!! So now when we run the last query, there is no delete, since no row qualifies and therefore no failure!

This essentially shows up as though the delete query only ended up deleting 2 rows from fk_t12 and nothing else!