Unusable outcome when two sessions do inserts that jointly will violate a constraint.

bllewell commented 5 years ago

Jira Link: DB-2546 The test uses a table created thus:

\set AUTOCOMMIT on
create table t(
  a integer
    constraint t_a_nn not null
    constraint t_a_unq unique,
  b integer
    constraint t_b_nn not null
    constraint t_b_unq unique);

Now, in Session 1, do this:

\set AUTOCOMMIT on
insert into t(a, b) values(17, 42);

and then in Session 2 do this:

begin;
insert into t(a, b) values(18, 42);

The insert causes the error _duplicate key value violates unique constraint "t_bunq". So issue rollback now. Notice that the error text includes the name of the violated constraint. This is useful because application code can parse the text and generate a useful response for the human user like "The username that you chose is already taken. Please make another choice."

So far, the behavior is identical in PostgreSQL Version 11.2 and YugaByte DB Version 1.3.0.

Now come the two-session tests.

First, PostgreSQL Version 11.2.

In Session 1:

delete from t;
begin;
set transaction isolation level repeatable read;
insert into t(a, b) values(17, 42);

but don't commit yet. Then, in Session 2:

begin;
set transaction isolation level repeatable read;
insert into t(a, b) values(18, 42);

The insert hangs until Session 1 issues commit. When it does, Session 2 gets this error: _duplicate key value violates unique constraint "t_bunq". So we still get a usefully parsable error message, even in the two session case. Alternatively, if Session 1 issues rollback, possibly because of some later error in the same transaction, then Session 2's insert simply goes ahead with no error.

Now, YugaByte DB Version 1.3.0.

In Session 1:

delete from t;
begin;
set transaction isolation level repeatable read;
insert into t(a, b) values(17, 42);

but don't commit yet. Then, in Session 2:

begin;
set transaction isolation level repeatable read;
insert into t(a, b) values(18, 42);

The insert fails immediately with the error " Operation failed. Try again.: Conflicts with higher priority transaction". So now the application code has nothing to parse to determine which constraint has been violated.

Notice that this will make the application programming complicated. There must be two exception handlers, one (set) for the _duplicate key value violates unique constraint "t_bunq" error (and similar for any other constraints that might be violated). And one for the " Operation failed. Try again.: Conflicts with higher priority transaction" error.

It's even worse than this. Using PostgreSQL, you may, or may not, get the one (class of) error in Session 2 according to whether Session 1 eventually issues commit or (is forced to) issue rollback. And when you get the error, there's no retry possibility and so no retry code to write. But in YugaByte DB, you get the error prematurely in Session 2 so that you might succeed on retry (in an infinite loop with a sleep and a timeout). This makes for tortuous application code and poor usability for the end user: either an unhelpful error message, or an unpredictable delay that might resolve quietly or might resolve with the aforementioned unhelpful error message.

Can we make YugaByte DB behave the same as PstgreSQL?

ndeodhar commented 5 years ago

Note that there is subtle difference in the two examples: In the first example, the INSERT is already committed and so we get a clean error like duplicate key value violates unique constraint. In the second example, there is another transaction (lets call it transaction A) that has written the same row and we detect a potential conflict while writing the same row in transaction B. However, it's possible that transaction A gets aborted, in which case if transaction B retries, it will succeed in writing the row. So Try Again error is not incorrect here.

But agree with the general sentiment that the error message can be more useful by providing details like what constraint it violates and hinting that it may succeed if tried again.

bllewell commented 5 years ago

Thanks, Neha. The "subtle" difference was very much part of the design of my experiment. To recap:

— when PostgreSQL detects that a concurrent session has an uncommitted txn that might be incompatible with the present session's DML attempt, the present session patiently waits until the other session issues commit or rollback. Then the present session knows where it stands and can either go ahead without error or raise the error that it would have raised had the other session done what it did yesterday.

— when YugaByte DB detects that a concurrent session has an uncommitted txn that might be incompatible with the present session's DML attempt, the present session immediately raises a "potential conflict, please retry" error.

The PostgreSQL model allows much cleaner, and shorter, application code than does the YugaByte DB model. And it allows more helpful messages to be composed for the human end-user.

So here is the key question:

— Can the YugaByte DB model be changed to be the same as the PostgreSQL model? This would also ease the intellectual task for programmers who are experienced with PostgreSQL and are new to YugaByte DB.

— Or is the presently observed YugaByte DB model non-negotiably determined by the difference between how its persistence "lower half" works compared to how PostgreSQL's "lower half" works?

pkj415 commented 3 years ago

@bllewell this is a golden example!

The issue of a txn not waiting on another one to commit/rollback will be resolved by the pessimistic locking work - #5680
About the error message - agree that we should fix it. Once we have pessimistic locking, we can do some plumbing to differentiate between a constraint violation and other errors.

I am adding this issue to tracking in #5683

pkj415 commented 1 year ago

Circling back here, I looked at this issue again and paged in the context.

We have implemented Wait-on-Conflict concurrency control, which can be turned on using enable_wait_queues=true (we earlier used to incorrectly call this mode pessimistic locking). With this mode, the issue where the second txn doesn't wait for the first one to complete, becomes void.

The second issue, where we throw a 'Operation failed. Try again.: Conflicts with higher priority transaction' instead of _'duplicate key value violates unique constraint "t_bunq"', is still valid. To repro it, use Bryn's second example where the transactions are concurrent (ensure to keep yb_enable_read_committed_isolation=false if #12494 has still not been fixed).

The fix needs to be in Pg: we should intercept the kConflict and direct it to run Pg's code which throws the duplicate key error.

sushantrmishra commented 2 months ago

Tried the steps from description in latest master (2.23):

localhost:5433 yugabyte@yugabyte=#* insert into t(a, b) values(18, 42);

ERROR:  could not serialize access due to concurrent update (query layer retry isn't possible because data was
already sent, if this is the read committed isolation (or) the first statement in repeatable read/ serializable isolation transaction, consider increasing the tserver gflag ysql_output_buffer_size)
DETAIL:  Value write after transaction start: { physical: 1723065734654511 } >= { physical: 1723065628861617 }: kConflict
Time: 105798.207 ms (01:45.798)

yugabyte / yugabyte-db

Unusable outcome when two sessions do inserts that jointly will violate a constraint. #1985