yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[DocDB] [Read Committed]: Transactions fail with 'Commit of transaction with running requests' #18609

Open shishir2001-yb opened 1 year ago

shishir2001-yb commented 1 year ago

Jira Link: DB-7537

Description

Tried on version: 2.19.1.0-b397. Some of the transactions (Read Committed isolation level) got aborted with the below error.

2023-08-08 11:31:19,04: Commit of transaction with running requests

Steps to repro:
1. Create a cluster with enable_wait_queues, enable_deadlock_detection and yb_enable_read_committed_isolation enabled.
2. Insert 10k rows.
3. Start 4 parallel threads, each with a different connection:
     i.   BEGIN TRANSACTION;
     ii.  SET LOCAL retry_backoff_multiplier=1.2; SET LOCAL retry_max_backoff='250ms'; SET LOCAL retry_min_backoff='1ms';
     iii. SELECT * FROM table WHERE ID LIKE '{thread_id+1}%' OR ID LIKE '{thread_id+2}%' {X}; where X can be any one of the following: "FOR UPDATE", "FOR NO KEY UPDATE", "", "FOR KEY SHARE", "FOR SHARE"
     iv.  UPDATE table SET name='UPDATED_' WHERE ID LIKE '{4-thread_id+1}%' OR ID LIKE '{4-thread_id+2}%';
     v.   COMMIT;
4. Sleep for 30 mins and let the threads run.
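
For anyone scripting the workload, here is a rough sketch of how the per-thread statements in step 3 could be generated. It only builds the SQL strings and does not connect to a cluster. The function name build_transaction, the default table name test_table, and thread ids running from 0 to 3 are assumptions made for illustration; the report does not state any of them.

```python
import random

# Lock clauses listed in step iii; the empty string means a plain SELECT.
LOCK_CLAUSES = ["FOR UPDATE", "FOR NO KEY UPDATE", "", "FOR KEY SHARE", "FOR SHARE"]


def build_transaction(thread_id, lock_clause, table="test_table"):
    """Return the SQL statements one worker sends for a single transaction.

    `table` is a hypothetical name; the report only says "table".
    """
    if lock_clause not in LOCK_CLAUSES:
        raise ValueError(f"unexpected lock clause: {lock_clause!r}")

    # Step iii: read (and optionally lock) rows for this thread's prefixes.
    select = (
        f"SELECT * FROM {table} "
        f"WHERE ID LIKE '{thread_id + 1}%' OR ID LIKE '{thread_id + 2}%'"
    )
    if lock_clause:
        select += f" {lock_clause}"

    # Step iv: update rows for the mirrored prefixes.
    update = (
        f"UPDATE {table} SET name = 'UPDATED_' "
        f"WHERE ID LIKE '{4 - thread_id + 1}%' OR ID LIKE '{4 - thread_id + 2}%'"
    )

    return [
        "BEGIN TRANSACTION;",
        "SET LOCAL retry_backoff_multiplier = 1.2;",
        "SET LOCAL retry_max_backoff = '250ms';",
        "SET LOCAL retry_min_backoff = '1ms';",
        select + ";",
        update + ";",
        "COMMIT;",
    ]


if __name__ == "__main__":
    # Assumed thread ids 0..3, one randomly chosen lock clause per transaction.
    for tid in range(4):
        for stmt in build_transaction(tid, random.choice(LOCK_CLAUSES)):
            print(tid, stmt)
```

With these assumed thread ids, thread 0 reads prefixes 1 and 2 and updates prefixes 5 and 6, while thread 3 reads prefixes 4 and 5 and updates prefixes 2 and 3, so the read and write sets of different threads overlap.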

Universe Logs

G-flags used:

"yb_enable_read_committed_isolation": "true",
"enable_wait_queues": "true",
"enable_deadlock_detection": "true",
"enable_automatic_tablet_splitting": "true",
"tablet_split_high_phase_shard_count_per_node": 200,
# high_phase_size 2MB
"tablet_split_high_phase_size_threshold_bytes": 2097152,
# low_phase_size 100KB
"tablet_split_low_phase_size_threshold_bytes": 102400,
"tablet_split_low_phase_shard_count_per_node": 16,
"enable_stream_compression": "true",
"stream_compression_algo": "1"
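
In case it helps with reproducing locally, below is a small sketch that turns the flags above into gflags-style `--name=value` lines. The helper name to_flagfile is made up for illustration, and the report does not say which process (master or tserver) each flag was set on, so that mapping is left out.

```python
# Flags exactly as listed in the report.
GFLAGS = {
    "yb_enable_read_committed_isolation": "true",
    "enable_wait_queues": "true",
    "enable_deadlock_detection": "true",
    "enable_automatic_tablet_splitting": "true",
    "tablet_split_high_phase_shard_count_per_node": 200,
    "tablet_split_high_phase_size_threshold_bytes": 2 * 1024 * 1024,  # 2MB
    "tablet_split_low_phase_size_threshold_bytes": 100 * 1024,  # 100KB
    "tablet_split_low_phase_shard_count_per_node": 16,
    "enable_stream_compression": "true",
    "stream_compression_algo": "1",
}


def to_flagfile(flags):
    """Render a flag dict as one `--name=value` line per flag (hypothetical helper)."""
    return "\n".join(f"--{name}={value}" for name, value in flags.items())


if __name__ == "__main__":
    print(to_flagfile(GFLAGS))
```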

Previously reported error logs (these aren't relevant and shouldn't have anything to do with the failure, so they can be ignored): about 30 seconds before this error, the following was logged in the TServer error file.

E0808 11:30:54.185434 77266 transaction_loader.cc:284] T 71b8240f6ff841e89ca8d91431d855dd P 6e605af616214b8d87ad97936a29e754: Failed to decode intent while loading transaction 7deb5b63-8746-464e-94e2-914521da6dd2, 787DEB5B638746464E94E2914521DA6DD2 => 0A107DEB5B638746464E94E2914521DA6DD210031A20396237306361336438666563343163326131653365646662353939653866326421FFFFFFFFFFFFFFFF290040DAF4207B266031AC8122B2670206003801:
Corruption (yb/dockv/intent.cc:123): Expecting hybrid time with ValueType kHybridTime, found <unknown KeyEntryType : 56>


basavaraj29 commented 1 year ago

The below is no longer relevant. I think this corresponds to FOR KEY SHARE alone. I recently brought this up and we were discussing it.

here's a simple repro.

TEST_F(PgTabletSplitTest, TestForKeyShare) {
  auto conn = ASSERT_RESULT(Connect());
  // Start with a single tablet so that the split below covers the locked row.
  ASSERT_OK(conn.Execute("CREATE TABLE t(k INT, v INT) SPLIT INTO 1 TABLETS;"));
  ASSERT_OK(conn.Execute(
      "INSERT INTO t SELECT i, 1 FROM (SELECT generate_series(1, 10000) i) t2;"));
  ASSERT_OK(cluster_->FlushTablets());

  // Take the FOR KEY SHARE lock repeatedly inside one open transaction.
  ASSERT_OK(conn.StartTransaction(IsolationLevel::SNAPSHOT_ISOLATION));
  for (int i = 0; i < 10; ++i) {
    ASSERT_OK(conn.FetchFormat("SELECT * FROM t WHERE k=1 FOR KEY SHARE;", i));
  }

  // Split the tablet while the transaction is still open.
  auto table_id = ASSERT_RESULT(GetTableIDFromTableName("t"));
  ASSERT_OK(SplitSingleTablet(table_id));
  ASSERT_OK(WaitForSplitCompletion(table_id));

  // Wait out several split-tablet cleanup intervals, then try to commit.
  SleepFor(FLAGS_cleanup_split_tablets_interval_sec * 5s * kTimeMultiplier);
  ASSERT_OK(conn.CommitTransaction());
}

basavaraj29 commented 1 year ago

should be okay once https://github.com/yugabyte/yugabyte-db/issues/18615 is resolved.

basavaraj29 commented 1 year ago

Upon looking at this again, the failure has nothing to do with the below error. The below error can be safely ignored, as it will be fixed in https://github.com/yugabyte/yugabyte-db/issues/18615

E0808 11:30:54.185434 77266 transaction_loader.cc:284] T 71b8240f6ff841e89ca8d91431d855dd P 6e605af616214b8d87ad97936a29e754: Failed to decode intent while loading transaction 7deb5b63-8746-464e-94e2-914521da6dd2, 787DEB5B638746464E94E2914521DA6DD2 => 0A107DEB5B638746464E94E2914521DA6DD210031A20396237306361336438666563343163326131653365646662353939653866326421FFFFFFFFFFFFFFFF290040DAF4207B266031AC8122B2670206003801:
Corruption (yb/dockv/intent.cc:123): Expecting hybrid time with ValueType kHybridTime, found <unknown KeyEntryType : 56>

Would need to debug why the commit errors with the message "Commit of transaction with running requests".

basavaraj29 commented 1 year ago

I see this might be relevant - https://github.com/yugabyte/yugabyte-db/issues/7984

rthallamko3 commented 1 year ago

On the surface, it looks like the issue happens only if a tablet split occurs in the middle of the transaction.

basavaraj29 commented 1 year ago

Update from Shishir: commit statements don't error with the message "Commit of transaction with running requests" when tablet splitting is disabled. Seems like https://github.com/yugabyte/yugabyte-db/issues/7984 isn't specific to the transaction sealing context and could be a broader issue.