yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[YSQL] [Read Committed]: WaitOn: Existing waiter already found - 352bceaf-4af1-48de-b517-01886e2fd533. This should not happen. #18394

Closed: shishir2001-yb closed this issue 3 months ago

shishir2001-yb commented 11 months ago

Jira Link: DB-7386

Description

Tried on version 2.19.1.0-b327: while running 4 parallel transactions (Read Committed isolation level) that execute UPDATE and SELECT * on a set of rows, I see the following error in the TServer logs:

E0720 12:21:14.264410 64380 wait_queue.cc:810] T e91c9f581cc0469cbd6e7cc284682528 P fc14e45c359447dba948ccded7123fe2 - WaitOn: Existing waiter already found - 352bceaf-4af1-48de-b517-01886e2fd533. This should not happen.
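It can help to pull the fields out of lines like the one above, so that occurrences can be correlated across TServer logs. The sketch below does this in Python under three assumptions:

- the line uses the standard glog prefix: severity letter, MMDD date, time, thread id, and `file:line]`;
- `T <id> P <id> -` is the tablet/peer prefix;
- the trailing UUID is the id of the duplicate waiter's transaction.

The function name `parse_waiter_error` is hypothetical.

```python
import re
from typing import Optional

# Standard glog prefix: severity, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<file>[^:]+):(?P<line>\d+)\] (?P<msg>.*)$"
)
# Assumed tablet log prefix: "T <tablet id> P <peer id> - <message>"
TABLET_RE = re.compile(
    r"^T (?P<tablet>[0-9a-f]{32}) P (?P<peer>[0-9a-f]{32}) - (?P<body>.*)$"
)
# Hyphenated UUID, assumed here to be the waiter's transaction id
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def parse_waiter_error(line: str) -> Optional[dict]:
    """Split one glog line into its fields; return None if it is not a glog line."""
    m = GLOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["line"] = int(rec["line"])
    t = TABLET_RE.match(rec["msg"])
    if t:
        rec.update(t.groupdict())
    u = UUID_RE.search(rec["msg"])
    rec["waiter_id"] = u.group(0) if u else None
    return rec
```

Grouping the parsed records by `waiter_id` and `tablet` would show whether the same transaction hits this path repeatedly or on several tablets.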

Universe and Test logs

G-flags used:

"enable_wait_queues": "true",
"enable_deadlock_detection": "true",
"enable_automatic_tablet_splitting": "true",
"tablet_split_high_phase_shard_count_per_node": 200,
# high_phase_size 2MB
"tablet_split_high_phase_size_threshold_bytes": 2097152,
# low_phase_size 100KB
"tablet_split_low_phase_size_threshold_bytes": 102400,
"tablet_split_low_phase_shard_count_per_node": 16,
"enable_stream_compression": "true",
"stream_compression_algo": "1"
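For reference, here are the same flags expressed as `yb-tserver` command-line arguments. This is only a sketch, and several parts are assumptions:

- `to_flag_args` is a hypothetical helper that assumes the usual gflags `--name=value` syntax.
- The byte thresholds are written out as arithmetic so the 2MB and 100KB comments can be checked.
- `yb_enable_read_committed_isolation` is taken from the repro steps in this thread, not from the list above.

```python
from typing import Dict, List, Union

# Flags as reported in this issue; yb_enable_read_committed_isolation comes
# from repro step 1 rather than the G-flags list.
TSERVER_GFLAGS: Dict[str, Union[str, int]] = {
    "enable_wait_queues": "true",
    "enable_deadlock_detection": "true",
    "yb_enable_read_committed_isolation": "true",
    "enable_automatic_tablet_splitting": "true",
    "tablet_split_high_phase_shard_count_per_node": 200,
    "tablet_split_high_phase_size_threshold_bytes": 2 * 1024 * 1024,  # 2MB
    "tablet_split_low_phase_size_threshold_bytes": 100 * 1024,  # 100KB
    "tablet_split_low_phase_shard_count_per_node": 16,
    "enable_stream_compression": "true",
    "stream_compression_algo": "1",
}


def to_flag_args(flags: Dict[str, Union[str, int]]) -> List[str]:
    """Render a flag mapping as '--name=value' arguments, preserving order."""
    return [f"--{name}={value}" for name, value in flags.items()]
```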


robertsami commented 11 months ago

From @shishir2001-yb:

Steps to repro:

  1. Create a cluster with enable_wait_queues, enable_deadlock_detection and yb_enable_read_committed_isolation enabled.
  2. Insert 10k rows.
  3. Start 4 parallel threads, each running:
     i. BEGIN TRANSACTION;
     ii. SET LOCAL retry_backoff_multiplier=1.2; SET LOCAL retry_max_backoff='250ms'; SET LOCAL retry_min_backoff='1ms';
     iii. SELECT * FROM table WHERE ID LIKE '{thread_id+1}%' OR ID LIKE '{thread_id+2}%';
     iv. UPDATE table SET name='UPDATED' WHERE ID LIKE '{4-thread_id+1}%' OR ID LIKE '{4-thread_id+2}%';
     v. COMMIT;
  4. Sleep for 5 minutes.
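The sketch below gives a rough picture of where the threads contend. It computes which rows each thread's UPDATE touches and how many of those rows another thread also writes. It assumes two things:

- `thread_id` takes the values 0 to 3.
- The ids are the text values '1' to '10000' inserted in step 2, so `LIKE 'p%'` is a plain prefix match.

All names in the sketch are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

NUM_ROWS = 10_000  # step 2: ids '1'..'10000' stored as text
NUM_THREADS = 4    # assumed thread ids 0..3


def prefixes(thread_id: int) -> Tuple[List[str], List[str]]:
    """(SELECT prefixes, UPDATE prefixes) for one thread, following step 3."""
    select = [str(thread_id + 1), str(thread_id + 2)]
    update = [str(4 - thread_id + 1), str(4 - thread_id + 2)]
    return select, update


def matching_ids(pfx: List[str]) -> Set[str]:
    """Ids matched by "id LIKE 'p1%' OR id LIKE 'p2%'" for the given prefixes."""
    ids = (str(i) for i in range(1, NUM_ROWS + 1))
    return {i for i in ids if any(i.startswith(p) for p in pfx)}


def write_overlaps() -> Dict[Tuple[int, int], int]:
    """Number of rows written by both threads, for every pair of threads."""
    writes = {t: matching_ids(prefixes(t)[1]) for t in range(NUM_THREADS)}
    return {
        (a, b): len(writes[a] & writes[b])
        for a, b in combinations(range(NUM_THREADS), 2)
    }
```

Under these assumptions, each pair of neighbouring threads (0/1, 1/2, 2/3) shares 1,111 write targets. For threads 0 and 1, for example, these are every id starting with '5'. Non-adjacent pairs do not overlap.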
robertsami commented 10 months ago

Attempted a repro with the following setup and workload:

ddl.sql:

create table thread_id (id serial primary key, v int);

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TYPE complex AS (re float8, im float8);
CREATE DOMAIN postal_code AS TEXT CHECK(VALUE ~ '^\d{5}$' OR VALUE ~ '^\d{5}-\d{4}$');
CREATE TYPE e_details AS ENUM ('Email', 'Sms', 'Phone');

CREATE TABLE test_read_committed_table(id text, uuid_col uuid DEFAULT uuid_generate_v4 (), name text, c complex, info json, contact JSONB, arr smallint[], cash money, i inet, m macaddr, i2 serial, i3 bigserial, val smallint, details e_details, age int, collated_data text collate "POSIX", date DATE, n NUMERIC (3, 2), r real, c1 CHAR(1), created_at timestamptz, uuid0 uuid DEFAULT uuid_nil(), uuid1 uuid DEFAULT uuid_generate_v1(), p1 POINT, t1 TIME, ts1 TIMESTAMP, i4 INTERVAL, p2 path, p3 polygon, b box, c2 circle, l line, l1 lseg, a2 text[][], zip postal_code, PRIMARY KEY((id, name) , uuid_col));

INSERT INTO test_read_committed_table(id, name, c, info, contact, arr, cash, i, m, i2, i3, val, details, collated_data, date, n, r, c1,created_at, p1, t1, ts1, i4,p2,p3,b,c2,l,l1,a2,zip) select generate_series(1, 10000)::text, substr(md5(random()::text), 0, 5), (1,2)::complex, '{"customer": "John Doe", "items": {"product": "Laptop","qty": 6}}','{"phones":[ {"type": "mobile", "phone": "001001"} , {"type": "fix", "phone": "002002"}]}', '{1, 2, 3}', '$99.99','1.1.1.1','00:00:00:00:00:00',50055,50055,2869,'Phone',md5(random()::text),current_timestamp,5.36,2147483647,'L',now(),point(3, 4),LOCALTIME(0), '2000-06-22 19:10:25-07', '1 year','(1,3), (4,12)','(1,3), (4,12), (2,4)',' (8,9), (1,3)','10, 4, 10','{1,2,3}','(0, 4),(2,8)','{{"b1", "c"}, {"m1", "l"}}',76321;

workload.sql:

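do $$
declare threadid int;
declare res record;
begin
-- allocate a per-session id from the serial column and commit it immediately
with s as (insert into thread_id (v) values (0) returning id) select id from s into threadid;
commit;
-- the '{@threadid+N}%' patterns below are plain string literals (no interpolation);
-- SELECT ... INTO a record keeps only the first returned row
SELECT * FROM test_read_committed_table WHERE ID LIKE '{@threadid+1}%' or ID LIKE '{@threadid+2}%' into res;
UPDATE test_read_committed_table SET name='UPDATED_' WHERE ID LIKE '{4-@threadid+1}%' or ID LIKE '{4-@threadid+2}%';
end $$;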
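PL/pgSQL does not substitute `{...}` placeholders inside string literals. One option is therefore to render each thread's step-3 transaction with the numbers already filled in. This is a minimal Python sketch that assumes:

- thread ids run from 0 to 3;
- the target is the `test_read_committed_table` table from ddl.sql.

`thread_script` and `like_clause` are hypothetical helpers.

```python
from typing import List

TABLE = "test_read_committed_table"  # table name taken from ddl.sql


def like_clause(column: str, prefixes: List[int]) -> str:
    """Build "col LIKE '5%' OR col LIKE '6%'" with the numbers substituted."""
    return " OR ".join(f"{column} LIKE '{p}%'" for p in prefixes)


def thread_script(thread_id: int) -> str:
    """Render one thread's transaction from repro step 3 as a SQL script."""
    reads = like_clause("id", [thread_id + 1, thread_id + 2])
    writes = like_clause("id", [4 - thread_id + 1, 4 - thread_id + 2])
    stmts = [
        "BEGIN TRANSACTION;",
        "SET LOCAL retry_backoff_multiplier = 1.2;",
        "SET LOCAL retry_max_backoff = '250ms';",
        "SET LOCAL retry_min_backoff = '1ms';",
        f"SELECT * FROM {TABLE} WHERE {reads};",
        f"UPDATE {TABLE} SET name = 'UPDATED' WHERE {writes};",
        "COMMIT;",
    ]
    return "\n".join(stmts)
```

To get four concurrent transactions, each rendered script could be saved to a file and run from its own session, for example with `ysqlsh -f`.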

This does not seem to repro.

rthallamko3 commented 7 months ago

@shishir2001-yb, can you clarify how frequently the issue reproduces, and under what conditions?

shishir2001-yb commented 7 months ago

This issue only occurs with aggressive updates.

basavaraj29 commented 5 months ago

Copy-pasting the conversation from Slack:

Can we try to trigger a repro with "vmodule=wait_queue=4,conflict_resolution=4" on the latest master? We can't figure out how we hit the fatal, and the vmodule logging would definitely help.