yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.99k stars 1.07k forks source link

[YSQL] Deferred FK constraints don't work properly #13009

Closed bllewell closed 1 year ago

bllewell commented 2 years ago

Jira Link: DB-2726

issue-13009.zip

Description

This use case:

the optional (or mandatory) one-to-one relationship

is a text-book scenario that calls for a deferrable FK constraint. An implementation either in pure SQL or in PL/pgSQL fails in YB. Each works fine in vanilla PG.

Environment

I spotted the bug using YB-2.13.0.1 on my laptop (running macOS Big Sur Version 11.6.6.) I'm using a freshly-created single node cluster.

Related Github Issues

#4700 [YSQL] Support DEFERRABLE constraintsopen

#1709 [YSQL] DEFERRABLE constraint should be supported open

#3995 [YSQL] Support deferrable for foreign key constraintsclosed 6-Jul-2020 with reference to this Gira master issue that keeps track of all deferrable constraint issues:

DB-2213 [YSQL] Support DEFERRABLE constraints

The optional (or mandatory) one-to-one relationship

The optional and mandatory flavors meet different use cases.

The canonical implementation has a pair of FK constraints, one in each direction, between the two tables. This ensures, declaratively, that the relationship is indeed one-to-one.

The scheme implies a bootstrap proposition: you can't insert a new mains row unless its extras partner is in place; and you can't insert a new extras row unless its mains partner is in place.

In the mandatory flavor of the use case, the only way to sidestep this dilemma, is to use a single transaction and to take advantage of a deferred FK constraint on the mains table. Even in the optional flavor, proper practice demands using a single transaction to implement the business function « create a new row which has an "extra facts" part ».

Proper practice, again, requires encapsulating a multi-statement business transaction in a PL/pgSQL subprogram with an exception section that will translate low-level exceptions (like _uniqueviolation) into business-purpose oriented vocabulary (like "this nickname is already taken"). However, you cannot handle an exception that arises at commit time in PL/pgSQL. The "set constraints all immediate" SQL statement comes to the rescue here. (Because the bootstrap dilemma arises only on insert, there is no possibility of a race condition—so it's safe to anticipate commit in this way.)

Usual practice also requires that:

These three SQL statements are all that's needed to ensure that the business rules are enforced, for the mandatory one-to-one case using generated surrogate PKs—and it does so declaratively.

create table mains(
  k           int  primary key
                   generated always as identity,  -- Generated PK
  main_fact   text unique
                   not null,
  extras_k    int  unique
                           );                     -- Optional 1:1

create table  extras(
  k           int  primary key
                   generated always as identity,  -- Generated PK
  extra_fact  text unique
                   not null,
  mains_k     int  unique
                   not null,

  constraint extras_fk
    foreign key(mains_k)
    references mains(k)
    match full on delete cascade
    not deferrable);

alter table mains add
  constraint mains_fk
    foreign key(extras_k)
    references extras(k)
    match full on delete restrict
    initially deferred;

(This script shows just one of the four variants that are explained in the Code organization section below.)

What the YSQL doc says

Notice that the CREATE TABLE doc in the stable version that's current at the time of creating this comment (v2.12) shows the full syntax specification for deferrability thus:

[ DEFERRABLE | NOT DEFERRABLE ] 
[ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]

and it says this in the Semantics subsection:

Constraints can be deferred using the DEFERRABLE clause. Currently, only foreign key constraints can be deferred in YugabyteDB... In the case of deferrable constraints, the checking of the constraint can be postponed until the end of the transaction... Constraints marked as INITIALLY DEFERRED will be checked at the end of the transaction.

This is at odds with what this issue shows. Deferred FK constraints show buggy behavior.

Testcase

It turns out that four distinct degrees of freedom determine the outcome. Each has two choices.

This code tests all sixteen of the combinations. The outcomes are easily summarized thus:

The YB faults occur only with the generated PK implementation. This differs from the implementation where the PKs are set explicitly, in that in the generated case, the inserts into each of the mains and extras tables use the returning clause to discover the generated PK values. The faulty cases are summarized thus:

               |  pure SQL                            |  PG/pgSQL        |
--------------------------------------------------------------------------
 optional 1:1  |  OK                                  |  faulty (crash)  |
---------------|----------------------------------------------------------
mandatory 1:1  |  faulty (nonsensical error message)  |  faulty (crash)  |
--------------------------------------------------------------------------

The error messages are reproduced below.

Code organization

The attached issue-13009.zip contains this file hierarchy:

optional-and-mandatory-one-to-one
├── 10-set-up-optional-with-explicit-pk.sql
├── 11-optional-with-explicit-pk-good
│   ├── 1-pure-sql.sql
│   ├── 2-plpgsql.sql
│   └── create-plpgsql-code.sql
├── 20-set-up-mandatory-with-explicit-pk.sql
├── 21-mandatory-with-explicit-pk-good
│   ├── 1-pure-sql.sql
│   ├── 2-plpgsql.sql
│   └── create-plpgsql-code.sql
├── 30-set-up-optional-with-generated-pk.sql
├── 31-optional-with-generated-pk-bad-only-in-plpgsql
│   ├── 1-pure-sql.sql
│   ├── 2-plpgsql.sql
│   └── create-plpgsql-code.sql
├── 40-set-up-mandatory-with-generated-pk.sql
├── 41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql
│   ├── 1-pure-sql.sql
│   ├── 2-plpgsql.sql
│   └── create-plpgsql-code.sql
├── do-mains-inner-join-extras.sql
└── do-mains-outer-join-extras.sql

Each of the combinations of:

with

has its own appropriately named directory. And each has its own correspondingly-named script to create the tables. Use your favorite GUI diff tool to compare them pairwise. They differ only on the lines marked in the optional-generated script shown above at the three lines marked with trailing comments.

The directory for each of the four table creation variants has a set of three files with the same names. Use the GUI diff tool, again, to compare each same-named pair across the directories.

1-pure-sql.sql

This invokes the appropriate set-up script and inserts a few rows within a transaction bounded by start transaction and commit. The method is appropriate to the variant that the directory name expresses.

Here, the\gset metacommand is used to store values from the returning clause as psql "substitution directives"—then to be accessed using the colon prefix notation.

create-plpgsql-code.sql

This creates a _doinserts() procedure that, for each variant, does exactly what its pure SQL partner does. But the coding is less obscure because ordinary PG/plSQL variables are used instead of psql colon-prefixed substitution directives.

It also creates a _deleteoutcome() function that, as just a sanity test, confirms that decorating the mains FK with "on delete restrict" and decorating the extras FK with "on delete cascade" meets the business requirements.

2-plpgsql.sql

This invokes the appropriate set-up script, invokes create-plpgsql-code.sql, and then invokes _doinserts() a few times and then _deleteoutcome() a few times.

Vanilla PG results

Run the eight scripts (1-pure-sql.sql and 2-plpgsql.sql on each of the four directories using vanilla PG to produce the following reference output.

11-optional-with-explicit-pk-good
    For 1-pure-sql.sql
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 k | main_fact | extra_fact 
---+-----------+------------
 0 | main-0    | 
 1 | main-1    | extra-1
 2 | main-2    | extra-2
 3 | main-3    | extra-3

¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
    For 2-plpgsql.sql
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 k | main_fact | extra_fact 
---+-----------+------------
 1 | main-1    | 
 2 | main-2    | extra-2
 3 | main-3    | extra-3

 "foreign_key_violation" handled as expected

 k | main_fact | extra_fact 
---+-----------+------------
 3 | main-3    | extra-3
21-mandatory-with-explicit-pk-good
    For 1-pure-sql.sql
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 k | main_fact | extra_fact 
---+-----------+------------
 1 | main-1    | extra-1
 2 | main-2    | extra-2
 3 | main-3    | extra-3

¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
    For 2-plpgsql.sql
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 k | main_fact | extra_fact 
---+-----------+------------
 1 | main-1    | extra-1
 2 | main-2    | extra-2
 3 | main-3    | extra-3

 "foreign_key_violation" handled as expected

 k | main_fact | extra_fact 
---+-----------+------------
 3 | main-3    | extra-3
31-optional-with-generated-pk-bad-only-in-plpgsql
    For 1-pure-sql.sql
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 k | main_fact | extra_fact 
---+-----------+------------
 1 | main-0    | 
 2 | main-1    | extra-1
 3 | main-2    | extra-2
 4 | main-3    | extra-3

¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
    For 2-plpgsql.sql
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 insert OK

 insert OK

 insert OK

 insert OK

 foreign_key_violation on insert

 k | main_fact | extra_fact 
---+-----------+------------
 1 | main-0    | 
 2 | main-1    | extra-1
 3 | main-2    | extra-2
 4 | main-3    | extra-3

 delete OK

 delete OK

 foreign_key_violation on delete

 k | main_fact | extra_fact 
---+-----------+------------
 1 | main-0    | 
 4 | main-3    | extra-3
41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql
    For 1-pure-sql.sql
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 k | main_fact | extra_fact 
---+-----------+------------
 1 | main-1    | extra-1
 2 | main-2    | extra-2
 3 | main-3    | extra-3

¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
    For 2-plpgsql.sql
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 insert OK

 insert OK

 insert OK

 foreign_key_violation on insert

 k | main_fact | extra_fact 
---+-----------+------------
 1 | main-1    | extra-1
 2 | main-2    | extra-2
 3 | main-3    | extra-3

 delete OK

 delete OK

 foreign_key_violation on delete

 k | main_fact | extra_fact 
---+-----------+------------
 3 | main-3    | extra-3

Yugabyte DB results

Now run the eight scripts (1-pure-sql.sql and 2-plpgsql.sql on each of the four directories using YugabyteDB. As mentioned, five of the scripts produce identical results in YB as in PG—so they aren't reproduced here. Only the results of the three faulty scripts are shown.

31-optional-with-generated-pk-bad-only-in-plpgsql
    For 2-plpgsql.sql
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
ERROR:      XX000: could not find block containing chunk 0x11206f028
LOCATION:   AllocSetFree, aset.c:1026
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
  connection to server was lost

Step through 2-plpgsql.sql by hand. The error occurs here:

select insert_outcome('main-1', 'extra-1', 'good')  /*** Fails here on YB with fatal crash ***/;

This is the first invocation of _insertoutcome() that uses two select statements with the returning clause and then update to the mains table using the returned values.

41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql
    For 1-pure-sql.sql
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 k | main_fact | extra_fact 
---+-----------+------------
 1 | main-1    | extra-1
 2 | main-2    | extra-2
 3 | main-3    | extra-3

So far so good. But when you step through 1-pure-sql.sql by hand, you'll see that the inner join query to show the results is invoked before the commit that ends the transaction. When you issue commit, you'll see this error:

ERROR:   23503: insert or update on table "mains" violates foreign key constraint "mains_fk"
DETAIL:  Key (extras_k)=(0) is not present in table "extras".
SCHEMA NAME:  u1
TABLE NAME:  mains
CONSTRAINT NAME:  mains_fk
LOCATION:  ri_ReportViolation, ri_triggers.c:2887

This is clearly at odds with what the inner join query showed just before you issued commit.

    For 2-plpgsql.sql
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
ERROR:      XX000: could not find block containing chunk 0x11206f028
LOCATION:   AllocSetFree, aset.c:1026
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
  connection to server was lost

Step through 2-plpgsql.sql by hand. The error occurs here:

select insert_outcome('main-1', 'extra-1', 'good')  /*** Fails here on YB with fatal crash ***/;

just as it did with the corresponding script on the 31-optional-with-generated-pk-bad-only-in-plpgsql directory. Notice that the wording of the crash report is the same.

bllewell commented 2 years ago

Here's a much shorter, self-contained, testcase. It produces what looks like the same crash as does the issue-13009.zip testcase. But, different from that, it's the use of "set constraints all immediate" that provokes the crash.

First, create a flavor of the "mandatory one-to-many" setup. Here one among the many details rows must have the special status that it's the parent for the masters row that has these many details:

create table masters(
  id          int   primary key,
  details_id  int   not null,
  v           text  not null);

create table details(
  id          int primary key,
  masters_id  int,
  v           text not null,
  constraint details_fk foreign key(masters_id) references masters(id));

alter table masters add
  constraint masters_fk foreign key(details_id) references details(id)
  initially deferred;

The _mastersfk constraint must be "initially deferred" to allow the bootstrap: you must insert a new masters row and its minimal one details row in a single transaction. This do block inserts two masters rows with two details rows for each. Notice that the PK and FK values are set explicitly. (This makes the demo code as short as possible.)

do $body$
begin
  insert into masters values(10, 1, 'master one');
  insert into details values(1, 10, 'detail one');
  insert into details values(2, 10, 'detail two');

  insert into masters values(20, 3, 'master two');
  insert into details values(3, 20, 'detail three');
  insert into details values(4, 20, 'detail four');
end;
$body$;

Check the outcome thus:

select
  m.v                           as masters,
  array_agg(d.v order by d.id)  as details
from
  masters m
  left outer join
  details d
  on d.masters_id = m.id
group by m.id
order by m.id;

The query uses _arrayagg() to make it easier to see at a glance which details rows, and how many of them, belong to each masters row. Save the query as qry.sql for re-use.

This is the result:

  masters   |            details             
------------+--------------------------------
 master one | {"detail one","detail two"}
 master two | {"detail three","detail four"}

Now delete the "special" details row "detail one". It's special because its both a child of "masters one" and its parent. This means that the remaining child of "masters one", "detail two", must become its new parent.

do $body$
begin
  delete from details
  where v = 'detail one';

  update details set id = (select details_id from masters where v = 'master one')
  where v = 'detail two';
end;
$body$;

The logic that this block uses isn't general. It relies on the human insight that the to-be-deleted "detail one" is indeed the parent of "master one". But it indicates how you could generalize the code to detect when the to-be-deleted details row is the parent of its own masters parent.

It finishes without error—thanks to the "initially deferred" status of _mastersfk. Check the outcome with \ir qry.sql. this is the new result, as intended:

  masters   |            details             
------------+--------------------------------
 master one | {"detail two"}
 master two | {"detail three","detail four"}

Now attempt to violate the "mandatory one-to-many" rule:

do $body$
begin
  delete from details
  where masters_id = (select id from masters where v = 'master one');

  set constraints all immediate;
exception when foreign_key_violation then
  declare
    msg text not null := '';
  begin
    get stacked diagnostics msg = message_text;
    raise info '%', msg;
  end;
end;
$body$;

Notice the use of "set constraints all immediate;". Without this, the error happens at "commit" and is therefore unhandleable. But with this approach, the "set constraints" causes the error. This allows it to be handled. However, using YB-2.15.0.1, this approach causes a fatal crash.

Try it first using vanilla PG. The _foreign_keyviolation error is handled. And "raise info" outputs this:

update or delete on table "details" violates foreign key constraint "masters_fk" on table "masters"

Now try it using YB-2.15.0.1. This is the outcome:

ysqlsh:demo.sql:75: INFO:  update or delete on table "details" violates foreign key constraint "masters_fk" on table "masters"
ysqlsh:demo.sql:75: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
ysqlsh:demo.sql:75: connection to server was lost

This seems to be the same error that was reported using the issue-13009.zip testcase.

Commenting out "set constraints all immediate;" (and re-establishing the context for executing the block) allows it to execute without a crash. But it brings the severe penalty that the _foreign_keyviolation error cannot be handled—in other words, you can't report the outcome to the client as an ordinary return status using a procedure's inout argument.

bllewell commented 2 years ago

@m-iancu, could you please as somebody to look at this—at least to see if it's obvious what the problem is? Notice the x-ref, above, to Issue #3995 which was closed by @d-uspenskiy—claiming that deferred FKs are working! Notice, too, the x-ref to Issue #4700—which also claims that deferred FKs are working!

m-iancu commented 1 year ago

Could not repro this FATAL issue locally. I tried multiple versions from 2.13 to 2.19, also testing on Mac as well. However, the main example did find a real bug which is detailed in https://github.com/yugabyte/yugabyte-db/issues/17657. This causes the results for 31-optional-with-generated-pk-bad-only-in-plpgsql/2-plpgsql.sql and 41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql to differ from Postgres.

Will track that issue separately, but keeping this open until we confirm the full example works correctly.

Note, I ran the example using the following commands:

./bin/ysqlsh
\cd ../../issue-13009 
\cd 11-optional-with-explicit-pk-good
\i 1-pure-sql.sql
\i 2-plpgsql.sql
\cd ../21-mandatory-with-explicit-pk-good
\i 1-pure-sql.sql
\i 2-plpgsql.sql
\cd ../31-optional-with-generated-pk-bad-only-in-plpgsql
\i 1-pure-sql.sql
\i 2-plpgsql.sql
\cd ../41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql
\i 1-pure-sql.sql
\i 2-plpgsql.sql

@bllewell Let me know if that's the right way or I missed something.

foucher commented 1 year ago

I spent some time figuring out the current status on the master branch. I'm unable to reproduce the FATAL issue locally using the repro in the comment, or using the scripts is the original report. I believe the remaining problems are all covered by #17657. Details below.

Regarding running the scripts, I think the proper sequence has a few more setup steps:

./bin/ysqlsh
\cd ../../issue-13009 
\i 10-set-up-optional-with-explicit-pk.sql
\cd 11-optional-with-explicit-pk-good
\i create-plpgsql-code.sql
\i 1-pure-sql.sql
\i 2-plpgsql.sql
\cd ../21-mandatory-with-explicit-pk-good
\i create-plpgsql-code.sql
\i 1-pure-sql.sql
\i 2-plpgsql.sql
\cd ../31-optional-with-generated-pk-bad-only-in-plpgsql
\i create-plpgsql-code.sql
\i 1-pure-sql.sql
\i 2-plpgsql.sql
\cd ../41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql
\i create-plpgsql-code.sql
\i 1-pure-sql.sql
\i 2-plpgsql.sql

I'm comparing the output to that of PG 14, the version that happens to be on my computer. I think it's good enough because it all looks like issue #17657.

The file 31-optional-with-generated-pk-bad-only-in-plpgsql/2-plpgsql.sql has 3 extra of the following:

foreign_key_violation on insert

It's also missing a similar message for a delete, but maybe because of the prior errors. This one is probably #17657. According to the original report, this had a fatal crash previously.

There is an error in 41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql/1-pure-sql.sql:

ERROR:  insert or update on table "mains" violates foreign key constraint "mains_fk"
DETAIL:  Key (extras_k)=(0) is not present in table "extras".

This is reported by Bryn in the script file, in a comment, verbatim with what I'm seeing. I think this is also #17657.

The file 41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql/2-plpgsql.sql also has 3 extra

foreign_key_violation on insert

It's also missing a similar message for a delete, but maybe because of the prior errors. This is also probably #17657. According to the original report, this had a fatal crash previously.

foucher commented 1 year ago

Though this issue is older than #17657, and I usually like to keep the oldest issue for a bug report, this issue is very complicated (though interesting) and covers a now-obsolete fatal error in addition to the remaining issue (checking overwritten deferred constraints). Therefore, let's consider this one a duplicate of the more concise #17657 that covers the remaining issue.