Closed bllewell closed 1 year ago
Here's a much shorter, self-contained, testcase. It produces what looks like the same crash as does the issue-13009.zip testcase. But, different from that, it's the use of "set constraints all immediate" that provokes the crash.
First, create a flavor of the "mandatory one-to-many" setup. Here one among the many details rows must have the special status that it's the parent for the masters row that has these many details:
create table masters(
id int primary key,
details_id int not null,
v text not null);
create table details(
id int primary key,
masters_id int,
v text not null,
constraint details_fk foreign key(masters_id) references masters(id));
alter table masters add
constraint masters_fk foreign key(details_id) references details(id)
initially deferred;
The _mastersfk constraint must be "initially deferred" to allow the bootstrap: you must insert a new masters row and its minimal one details row in a single transaction. This do block inserts two masters rows with two details rows for each. Notice that the PK and FK values are set explicitly. (This makes the demo code as short as possible.)
do $body$
begin
insert into masters values(10, 1, 'master one');
insert into details values(1, 10, 'detail one');
insert into details values(2, 10, 'detail two');
insert into masters values(20, 3, 'master two');
insert into details values(3, 20, 'detail three');
insert into details values(4, 20, 'detail four');
end;
$body$;
Check the outcome thus:
select
m.v as masters,
array_agg(d.v order by d.id) as details
from
masters m
left outer join
details d
on d.masters_id = m.id
group by m.id
order by m.id;
The query uses _arrayagg() to make it easier to see at a glance which details rows, and how many of them, belong to each masters row. Save the query as qry.sql for re-use.
This is the result:
masters | details
------------+--------------------------------
master one | {"detail one","detail two"}
master two | {"detail three","detail four"}
Now delete the "special" details row "detail one". It's special because its both a child of "masters one" and its parent. This means that the remaining child of "masters one", "detail two", must become its new parent.
do $body$
begin
delete from details
where v = 'detail one';
update details set id = (select details_id from masters where v = 'master one')
where v = 'detail two';
end;
$body$;
The logic that this block uses isn't general. It relies on the human insight that the to-be-deleted "detail one" is indeed the parent of "master one". But it indicates how you could generalize the code to detect when the to-be-deleted details row is the parent of its own masters parent.
It finishes without error—thanks to the "initially deferred" status of _mastersfk. Check the outcome with \ir qry.sql. this is the new result, as intended:
masters | details
------------+--------------------------------
master one | {"detail two"}
master two | {"detail three","detail four"}
Now attempt to violate the "mandatory one-to-many" rule:
do $body$
begin
delete from details
where masters_id = (select id from masters where v = 'master one');
set constraints all immediate;
exception when foreign_key_violation then
declare
msg text not null := '';
begin
get stacked diagnostics msg = message_text;
raise info '%', msg;
end;
end;
$body$;
Notice the use of "set constraints all immediate;". Without this, the error happens at "commit" and is therefore unhandleable. But with this approach, the "set constraints" causes the error. This allows it to be handled. However, using YB-2.15.0.1, this approach causes a fatal crash.
Try it first using vanilla PG. The _foreign_keyviolation error is handled. And "raise info" outputs this:
update or delete on table "details" violates foreign key constraint "masters_fk" on table "masters"
Now try it using YB-2.15.0.1. This is the outcome:
ysqlsh:demo.sql:75: INFO: update or delete on table "details" violates foreign key constraint "masters_fk" on table "masters"
ysqlsh:demo.sql:75: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
ysqlsh:demo.sql:75: connection to server was lost
This seems to be the same error that was reported using the issue-13009.zip testcase.
Commenting out "set constraints all immediate;" (and re-establishing the context for executing the block) allows it to execute without a crash. But it brings the severe penalty that the _foreign_keyviolation error cannot be handled—in other words, you can't report the outcome to the client as an ordinary return status using a procedure's inout argument.
@m-iancu, could you please as somebody to look at this—at least to see if it's obvious what the problem is? Notice the x-ref, above, to Issue #3995 which was closed by @d-uspenskiy—claiming that deferred FKs are working! Notice, too, the x-ref to Issue #4700—which also claims that deferred FKs are working!
Could not repro this FATAL issue locally. I tried multiple versions from 2.13 to 2.19, also testing on Mac as well.
However, the main example did find a real bug which is detailed in https://github.com/yugabyte/yugabyte-db/issues/17657.
This causes the results for 31-optional-with-generated-pk-bad-only-in-plpgsql/2-plpgsql.sql
and 41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql
to differ from Postgres.
Will track that issue separately, but keeping this open until we confirm the full example works correctly.
Note, I ran the example using the following commands:
./bin/ysqlsh
\cd ../../issue-13009
\cd 11-optional-with-explicit-pk-good
\i 1-pure-sql.sql
\i 2-plpgsql.sql
\cd ../21-mandatory-with-explicit-pk-good
\i 1-pure-sql.sql
\i 2-plpgsql.sql
\cd ../31-optional-with-generated-pk-bad-only-in-plpgsql
\i 1-pure-sql.sql
\i 2-plpgsql.sql
\cd ../41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql
\i 1-pure-sql.sql
\i 2-plpgsql.sql
@bllewell Let me know if that's the right way or I missed something.
I spent some time figuring out the current status on the master branch. I'm unable to reproduce the FATAL issue locally using the repro in the comment, or using the scripts is the original report. I believe the remaining problems are all covered by #17657. Details below.
Regarding running the scripts, I think the proper sequence has a few more setup steps:
./bin/ysqlsh
\cd ../../issue-13009
\i 10-set-up-optional-with-explicit-pk.sql
\cd 11-optional-with-explicit-pk-good
\i create-plpgsql-code.sql
\i 1-pure-sql.sql
\i 2-plpgsql.sql
\cd ../21-mandatory-with-explicit-pk-good
\i create-plpgsql-code.sql
\i 1-pure-sql.sql
\i 2-plpgsql.sql
\cd ../31-optional-with-generated-pk-bad-only-in-plpgsql
\i create-plpgsql-code.sql
\i 1-pure-sql.sql
\i 2-plpgsql.sql
\cd ../41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql
\i create-plpgsql-code.sql
\i 1-pure-sql.sql
\i 2-plpgsql.sql
I'm comparing the output to that of PG 14, the version that happens to be on my computer. I think it's good enough because it all looks like issue #17657.
The file 31-optional-with-generated-pk-bad-only-in-plpgsql/2-plpgsql.sql
has 3 extra of the following:
foreign_key_violation on insert
It's also missing a similar message for a delete, but maybe because of the prior errors. This one is probably #17657. According to the original report, this had a fatal crash previously.
There is an error in 41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql/1-pure-sql.sql
:
ERROR: insert or update on table "mains" violates foreign key constraint "mains_fk"
DETAIL: Key (extras_k)=(0) is not present in table "extras".
This is reported by Bryn in the script file, in a comment, verbatim with what I'm seeing. I think this is also #17657.
The file 41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql/2-plpgsql.sql
also has 3 extra
foreign_key_violation on insert
It's also missing a similar message for a delete, but maybe because of the prior errors. This is also probably #17657. According to the original report, this had a fatal crash previously.
Though this issue is older than #17657, and I usually like to keep the oldest issue for a bug report, this issue is very complicated (though interesting) and covers a now-obsolete fatal error in addition to the remaining issue (checking overwritten deferred constraints). Therefore, let's consider this one a duplicate of the more concise #17657 that covers the remaining issue.
Jira Link: DB-2726
issue-13009.zip
Description
This use case:
is a text-book scenario that calls for a deferrable FK constraint. An implementation either in pure SQL or in PL/pgSQL fails in YB. Each works fine in vanilla PG.
Environment
I spotted the bug using YB-2.13.0.1 on my laptop (running macOS Big Sur Version 11.6.6.) I'm using a freshly-created single node cluster.
Related Github Issues
#4700 [YSQL] Support DEFERRABLE constraints — open
#1709 [YSQL] DEFERRABLE constraint should be supported open
#3995 [YSQL] Support deferrable for foreign key constraints — closed 6-Jul-2020 with reference to this Gira master issue that keeps track of all deferrable constraint issues:
DB-2213 [YSQL] Support DEFERRABLE constraints
The optional (or mandatory) one-to-one relationship
The optional and mandatory flavors meet different use cases.
The canonical implementation has a pair of FK constraints, one in each direction, between the two tables. This ensures, declaratively, that the relationship is indeed one-to-one.
The scheme implies a bootstrap proposition: you can't insert a new mains row unless its extras partner is in place; and you can't insert a new extras row unless its mains partner is in place.
In the mandatory flavor of the use case, the only way to sidestep this dilemma, is to use a single transaction and to take advantage of a deferred FK constraint on the mains table. Even in the optional flavor, proper practice demands using a single transaction to implement the business function « create a new row which has an "extra facts" part ».
Proper practice, again, requires encapsulating a multi-statement business transaction in a PL/pgSQL subprogram with an exception section that will translate low-level exceptions (like _uniqueviolation) into business-purpose oriented vocabulary (like "this nickname is already taken"). However, you cannot handle an exception that arises at commit time in PL/pgSQL. The "set constraints all immediate" SQL statement comes to the rescue here. (Because the bootstrap dilemma arises only on insert, there is no possibility of a race condition—so it's safe to anticipate commit in this way.)
Usual practice also requires that:
These three SQL statements are all that's needed to ensure that the business rules are enforced, for the mandatory one-to-one case using generated surrogate PKs—and it does so declaratively.
(This script shows just one of the four variants that are explained in the Code organization section below.)
What the YSQL doc says
Notice that the CREATE TABLE doc in the stable version that's current at the time of creating this comment (v2.12) shows the full syntax specification for deferrability thus:
and it says this in the Semantics subsection:
This is at odds with what this issue shows. Deferred FK constraints show buggy behavior.
Testcase
It turns out that four distinct degrees of freedom determine the outcome. Each has two choices.
This code tests all sixteen of the combinations. The outcomes are easily summarized thus:
The YB faults occur only with the generated PK implementation. This differs from the implementation where the PKs are set explicitly, in that in the generated case, the inserts into each of the mains and extras tables use the returning clause to discover the generated PK values. The faulty cases are summarized thus:
The error messages are reproduced below.
Code organization
The attached issue-13009.zip contains this file hierarchy:
Each of the combinations of:
with
has its own appropriately named directory. And each has its own correspondingly-named script to create the tables. Use your favorite GUI diff tool to compare them pairwise. They differ only on the lines marked in the optional-generated script shown above at the three lines marked with trailing comments.
The directory for each of the four table creation variants has a set of three files with the same names. Use the GUI diff tool, again, to compare each same-named pair across the directories.
1-pure-sql.sql
This invokes the appropriate set-up script and inserts a few rows within a transaction bounded by start transaction and commit. The method is appropriate to the variant that the directory name expresses.
Here, the
\gset
metacommand is used to store values from the returning clause as psql "substitution directives"—then to be accessed using the colon prefix notation.create-plpgsql-code.sql
This creates a _doinserts() procedure that, for each variant, does exactly what its pure SQL partner does. But the coding is less obscure because ordinary PG/plSQL variables are used instead of psql colon-prefixed substitution directives.
It also creates a _deleteoutcome() function that, as just a sanity test, confirms that decorating the mains FK with "on delete restrict" and decorating the extras FK with "on delete cascade" meets the business requirements.
2-plpgsql.sql
This invokes the appropriate set-up script, invokes create-plpgsql-code.sql, and then invokes _doinserts() a few times and then _deleteoutcome() a few times.
Vanilla PG results
Run the eight scripts (1-pure-sql.sql and 2-plpgsql.sql on each of the four directories using vanilla PG to produce the following reference output.
11-optional-with-explicit-pk-good
21-mandatory-with-explicit-pk-good
31-optional-with-generated-pk-bad-only-in-plpgsql
41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql
Yugabyte DB results
Now run the eight scripts (1-pure-sql.sql and 2-plpgsql.sql on each of the four directories using YugabyteDB. As mentioned, five of the scripts produce identical results in YB as in PG—so they aren't reproduced here. Only the results of the three faulty scripts are shown.
31-optional-with-generated-pk-bad-only-in-plpgsql
Step through 2-plpgsql.sql by hand. The error occurs here:
This is the first invocation of _insertoutcome() that uses two select statements with the returning clause and then update to the mains table using the returned values.
41-mandatory-with-generated-pk-bad-in-both-sql-and-plpgsql
So far so good. But when you step through 1-pure-sql.sql by hand, you'll see that the inner join query to show the results is invoked before the commit that ends the transaction. When you issue commit, you'll see this error:
This is clearly at odds with what the inner join query showed just before you issued commit.
Step through 2-plpgsql.sql by hand. The error occurs here:
just as it did with the corresponding script on the 31-optional-with-generated-pk-bad-only-in-plpgsql directory. Notice that the wording of the crash report is the same.