yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[DocDB] Early Abort RC transactions whenever required #21661

Open basavaraj29 opened 3 months ago

basavaraj29 commented 3 months ago

Jira Link: DB-10554

Description

Consider the following scenario, where we have a table t(k int primary key, v int) containing the records (1,1) and (2,2):

begin;
update t set v=v+1 where k=1;
insert into t values (2,2);                    -- this fails, and the subtxn is rolled back

All commands after this point fail with the following error until an explicit commit/rollback is issued:

current transaction is aborted, commands ignored until end of transaction block

However, the transaction itself isn't aborted at that point. Since we know it will eventually be aborted, we could abort it early. That way, other txns waiting to acquire locks on k=1 would be unblocked sooner, and txn-related cleanup could also be performed early.

More details: we already early-abort transactions in a couple of cases:

  void Flushed(
      const internal::InFlightOps& ops, const ReadHybridTime& used_read_time,
      const Status& status) EXCLUDES(mutex_) override {
      ...
      if (CanAbortTransaction(status, metadata_, subtransaction_)) {
        // abort 
      }

We look only at the passed status when making this decision in client/transaction.cc. But for a couple of errors, it looks like we just populate the error in the op's response and pass Status::OK() as the result of the flush. For instance:

I0322 17:09:09.565492 1829154816 transaction.cc:477] vlog5: 0b77f8b8-e80d-4452-b312-55eafa7f4f9d [Session 3]: Flushed: [{ yb_op: 0x00000001216d2f48 -> PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 4626055456 stmt_type: PGSQL_INSERT table_id: "000033c0000030008000000000004000" schema_version: 0 hash_code: 4624 ybctid_column_value { value { binary_value: "G\022\020H\200\000\000\001!!" } } column_values { column_id: 11 expr { value { int32_value: 1 } } } column_refs { } ysql_catalog_version: 1 partition_key: "\022\020", response: status: PGSQL_STATUS_DUPLICATE_KEY_ERROR error_message: "Duplicate key found in primary key or unique index" tablet: 0x000000012095be60 -> { tablet_id: fec74395e7bc43f3bf3b83b332cdd21e partition: { partition_key_start:  partition_key_end: 8000 hash_buckets: [] } partition_list_version: 0 split_depth: 0 } sequence_number: 0 }], used_read_time: { read: <invalid> local_limit: <invalid> global_limit: <invalid> in_txn_limit: <invalid> serial_no: 0 }, status: OK

The op has the error code PGSQL_STATUS_DUPLICATE_KEY_ERROR, but the status is still OK, so the early-abort check never fires, which leads to the above problem.
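One possible direction, sketched below, is to have the abort decision consider the per-op response status in addition to the flush status. This is only an illustration of the idea, not the actual YugabyteDB code: FlushStatus, FlushedOp, OpResponseStatus, AnyOpFailed and ShouldEarlyAbort are all hypothetical stand-ins, and whether every op-level error should really trigger an early abort is a policy question this issue leaves open.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real client types; names are assumptions.
enum class OpResponseStatus { kOk, kDuplicateKeyError, kRuntimeError };

// Simplified stand-in for the Status passed to Flushed().
struct FlushStatus {
  bool ok;
  std::string message;
};

// Simplified stand-in for an in-flight op carrying its own response status.
struct FlushedOp {
  OpResponseStatus response_status;
};

// Returns true if any op reports an error in its response, even when the
// overall flush status is OK (the situation shown in the log above).
inline bool AnyOpFailed(const std::vector<FlushedOp>& ops) {
  for (const auto& op : ops) {
    if (op.response_status != OpResponseStatus::kOk) {
      return true;
    }
  }
  return false;
}

// Sketch of an abort decision that looks at both the flush status and the
// per-op responses. Assumption: any op-level error is treated as abort-worthy;
// a real fix would likely also check transaction metadata and subtransaction
// state, as the existing CanAbortTransaction call does.
inline bool ShouldEarlyAbort(const FlushStatus& status,
                             const std::vector<FlushedOp>& ops) {
  return !status.ok || AnyOpFailed(ops);
}
```

With this shape, a flush that returns OK but carries an op with a duplicate key error would still be flagged for early abort, while a flush where every op succeeded would not.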

cc @pkj415

Issue Type

kind/bug


pkj415 commented 3 months ago

After some more thought, this is not GA blocking for RC. Let's remove the epic?