pkj415 closed this issue 1 year ago.
Steps to repro:
1. Create a cluster with enable_wait_queues, enable_deadlock_detection and yb_enable_read_committed_isolation enabled
2. Insert 10k rows
3. Start 4 parallel threads, each with its own connection, each running the following in a loop:
i. BEGIN TRANSACTION;
ii. SELECT * FROM table WHERE ID LIKE '{thread_id+1}%' or ID LIKE '{thread_id+2}%' {X}; where X can be any one of: "FOR UPDATE", "FOR NO KEY UPDATE", "" (no locking clause), "FOR KEY SHARE", "FOR SHARE"
iii. UPDATE table SET name='UPDATED_' WHERE ID LIKE '{4-thread_id+1}%' or ID LIKE '{4-thread_id+2}%'
iv. COMMIT;
4. Let the threads run for 30 minutes.
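The per-thread predicates in the steps above can be sketched as follows, reading `{4-thread_id+1}` as the integer expression `5 - thread_id` and assuming thread IDs 0 to 3. The table and column names (`test_table`, `id`, `name`) are hypothetical stand-ins for the placeholders, `id` is assumed to be a text column so that `LIKE` applies directly, and the sketch only builds SQL text; it does not connect to a cluster. Listing which threads UPDATE rows whose prefixes another thread's SELECT covers shows where lock contention comes from (this only matters when X is a locking clause).

```python
from typing import List, Tuple

# Locking clauses from the repro steps; "" means a plain SELECT with no row locks.
LOCK_CLAUSES = ["FOR UPDATE", "FOR NO KEY UPDATE", "", "FOR KEY SHARE", "FOR SHARE"]


def prefixes(thread_id: int) -> Tuple[Tuple[str, str], Tuple[str, str]]:
    """Return (ID prefixes read by the SELECT, ID prefixes written by the UPDATE)."""
    select_p = (str(thread_id + 1), str(thread_id + 2))
    update_p = (str(4 - thread_id + 1), str(4 - thread_id + 2))
    return select_p, update_p


def transaction_sql(thread_id: int, lock_clause: str) -> List[str]:
    """Build one iteration of a thread's transaction (hypothetical table/column names)."""
    (s1, s2), (u1, u2) = prefixes(thread_id)
    select = (f"SELECT * FROM test_table WHERE id LIKE '{s1}%' OR id LIKE '{s2}%' "
              f"{lock_clause}").rstrip() + ";"
    update = (f"UPDATE test_table SET name = 'UPDATED_' "
              f"WHERE id LIKE '{u1}%' OR id LIKE '{u2}%';")
    return ["BEGIN TRANSACTION;", select, update, "COMMIT;"]


def contending_pairs(num_threads: int = 4) -> List[Tuple[int, int]]:
    """(writer, holder) pairs where the writer's UPDATE targets rows the holder's SELECT locked."""
    pairs = []
    for writer in range(num_threads):
        _, upd = prefixes(writer)
        for holder in range(num_threads):
            sel, _ = prefixes(holder)
            if writer != holder and set(upd) & set(sel):
                pairs.append((writer, holder))
    return pairs


if __name__ == "__main__":
    for tid in range(4):
        print(transaction_sql(tid, "FOR UPDATE"))
    print(contending_pairs())
```

Under these assumptions several pairs appear in both directions, e.g. (1, 3) and (3, 1): thread 1 locks the `2%`/`3%` rows and wants `4%`/`5%`, while thread 3 does the opposite, which is the kind of cycle that deadlock detection has to break.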
I tried some experiments locally, and this behaviour (i.e., retries of an UPDATE statement spaced 6 secs to ~1-1.5 min apart, with the query-layer retry count reaching double digits, aka starvation) seems expected. I tried a rudimentary ysql_bench workload that mimics this with ~10k-20k rows (row count as per a Slack conversation with @shishir2001-yb). It seems that a SELECT FOR can take ~30-40 seconds, since each row is locked separately, and the UPDATE can take ~6 seconds. So, if a transaction is unlucky enough to have to wait for 2-3 other transactions, it might wait for more than a minute.
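The timing argument above can be made concrete with a rough back-of-the-envelope sketch. It assumes, hypothetically, that the IDs are the integers 1 to 10,000 stored as text (so `LIKE '2%'` matches 2, 20-29, 200-299 and 2000-2999), and it uses a deliberately pessimistic serial model in which each blocking transaction runs its full SELECT and UPDATE before the waiter proceeds. The 30-40 s and ~6 s inputs are the observations quoted above, not measurements produced by this code, and the function names are illustrative.

```python
from typing import Iterable


def rows_matching(prefixes: Iterable[str], max_id: int = 10_000) -> int:
    """Count IDs 1..max_id whose decimal text starts with any of the given prefixes."""
    pre = tuple(prefixes)  # str.startswith accepts a tuple of candidate prefixes
    return sum(1 for i in range(1, max_id + 1) if str(i).startswith(pre))


def per_row_lock_ms(select_s: float, rows: int) -> float:
    """Average time per row lock if the SELECT's duration is spread evenly over its rows."""
    return select_s * 1000 / rows


def worst_case_wait_s(blockers: int, select_s: float, update_s: float) -> float:
    """Serial model: every blocker runs its whole SELECT and UPDATE before the waiter proceeds."""
    return blockers * (select_s + update_s)


if __name__ == "__main__":
    rows = rows_matching(("2", "3"))
    print(rows, per_row_lock_ms(30, rows), per_row_lock_ms(40, rows))
    print(worst_case_wait_s(2, 30, 6), worst_case_wait_s(2, 40, 6))
```

With these assumptions each SELECT touches about 2,222 rows, i.e. roughly 13.5-18 ms per row lock for a 30-40 s SELECT, and waiting behind just two blockers already gives 72-92 s, consistent with the "more than a minute" estimate.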
Moreover, while the SELECT/UPDATE is being retried by the query layer, other transactions might squeeze in and take the locks, repeatedly starving a transaction.
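The starvation effect can be illustrated with a toy, deterministic model; it is not YugabyteDB's actual retry or wait-queue logic. Other transactions hold the contended lock for 36 s each (the ~30 s SELECT plus ~6 s UPDATE quoted above) with only a 1 s gap between consecutive holders, and the retrying statement re-checks the lock every 6 s. The schedule, backoff, and attempt cap are hypothetical values chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical schedule: ten holders, each holding the lock for 36 s, with a 1 s gap between them.
BUSY: List[Tuple[float, float]] = [(k * 37, k * 37 + 36) for k in range(10)]


def is_locked(t: float, busy: List[Tuple[float, float]]) -> bool:
    """True if another transaction holds the lock at time t (half-open [start, end) intervals)."""
    return any(start <= t < end for start, end in busy)


def attempts_until_success(busy: List[Tuple[float, float]], first_try: float,
                           backoff: float, max_attempts: int) -> Optional[int]:
    """Return the 1-based attempt that finds the lock free, or None if every attempt conflicts."""
    t = first_try
    for attempt in range(1, max_attempts + 1):
        if not is_locked(t, busy):
            return attempt
        t += backoff  # another holder may grab the lock while we back off
    return None


if __name__ == "__main__":
    print(attempts_until_success(BUSY, first_try=5, backoff=6, max_attempts=20))
    print(attempts_until_success(BUSY, first_try=5, backoff=6, max_attempts=50))
```

In this model the retrier only lands in a free 1 s gap on its 37th attempt (at t = 221 s), so with a hypothetical cap of 20 attempts it gives up, which mirrors a statement repeatedly hitting kConflict until it times out even though the lock is briefly free many times.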
Jira Link: DB-7755
Description
As seen in this run (https://jenkins.dev.yugabyte.com/view/Test%20Jobs/job/itest-system-developer/8273/), an UPDATE statement repeatedly hit kConflict errors until it timed out.