yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.87k stars 1.05k forks source link

[DocDB][LST] ERROR: Query error: GetTransactionStatus: tablet_id: "..." transaction_id: "..." propagated_hybrid_time: 6787469896249577472, retrier: { task_id: -1 state: kRunning deadline: 1024785.037s } passed its deadline 1024785.037s (passed: 5.063s): #13186

Open def- opened 2 years ago

def- commented 2 years ago

Jira Link: DB-2870

Description

$ cd ~/code/yugabyte-db
$ git checkout 226465305d
$ ./yb_build.sh release
$ bin/yb-ctl --replication_factor 3 create --tserver_flags=yb_enable_read_committed_isolation=true,ysql_enable_packed_row=true,ysql_num_shards_per_tserver=1,enable_stream_compression=true,stream_compression_algo=1,yb_num_shards_per_tserver=1 --master_flags=yb_enable_read_committed_isolation=true,ysql_enable_packed_row=true,enable_stream_compression=true,stream_compression_algo=1,enable_automatic_tablet_splitting=true,tablet_split_low_phase_shard_count_per_node=40,tablet_split_high_phase_shard_count_per_node=50
$ cd ~/code/yb-long-system-test
$ git checkout e78e1e69c8ca04d8019d54703f1fe70ab75ca3e5
$ python3.9 ./long_system_test.py --nodes=127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433 --threads=10 --complexity=full --runtime=60 --max-columns=10 --seed=085883
2022-07-06 08:01:13,001 MainThread INFO     Reproduce with: git checkout e78e1e69 && ./long_system_test.py --nodes=127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433 --threads=10 --complexity=full --runtime=60 --max-columns=10 --seed=085883
2022-07-06 08:01:13,505 MainThread INFO     Database version: PostgreSQL 11.2-YB-2.15.1.0-b0 on x86_64-pc-linux-gnu, compiled by clang version 12.0.1 (https://github.com/yugabyte/llvm-project.git bdb147e675d8c87cee72cc1f87c4b82855977d94), 64-bit
2022-07-06 08:01:13,507 MainThread INFO     Creating tables for database db_lst_085883
2022-07-06 08:01:38,274 MainThread INFO     Starting worker_0: SingleInsertAction, SingleUpdateAction, SingleDeleteAction, BulkInsertAction, BulkUpdateAction, SetConfigAction
2022-07-06 08:01:38,275 MainThread INFO     Starting worker_1: SingleInsertAction, SingleUpdateAction, SingleDeleteAction, BulkInsertAction, BulkUpdateAction, SetConfigAction
2022-07-06 08:01:38,276 MainThread INFO     Starting worker_2: CreateIndexAction, DropIndexAction, SetConfigAction, AddColumnAction
2022-07-06 08:01:38,278 MainThread INFO     Starting worker_3: RandomSelectAction, SetConfigAction
2022-07-06 08:01:38,280 MainThread INFO     Starting worker_4: RandomSelectAction, SetConfigAction
2022-07-06 08:01:38,281 MainThread INFO     Starting worker_5: RandomSelectAction, SetConfigAction
2022-07-06 08:01:38,282 MainThread INFO     Starting worker_6: RandomSelectAction, SetConfigAction
2022-07-06 08:01:38,282 MainThread INFO     Starting worker_7: RandomSelectAction, SetConfigAction
2022-07-06 08:01:38,286 MainThread INFO     Starting worker_8: RandomSelectAction, SetConfigAction
2022-07-06 08:01:38,289 MainThread INFO     Starting worker_9: SingleInsertAction, SingleUpdateAction, SingleDeleteAction, BulkInsertAction, BulkUpdateAction, SetConfigAction
2022-07-06 08:01:48,310 MainThread INFO     Worker queries/s: [004.7][005.0][001.4][002.4][001.7][002.5][001.9][002.4][002.4][005.5]
[...]
2022-07-06 08:45:41,110 MainThread INFO     Worker queries/s: [004.6][003.4][000.7][003.5][001.8][002.7][002.1][002.4][002.1][004.5]
2022-07-06 08:45:47,716 worker_6   ERROR    Unexpected query failure: InternalError_
Query: SELECT '1996-12-28', '[-51.743336692506546,81.42226641008705]'::NUMRANGE, -14, '{"a": 2, "b": ["0", "1", "2", "3", "4", "5", "6", "7"], "c": true}'::json, '{"a": 4, "b": ["0", "1"], "c": false}'::json, range_merge(('(,-21]'::INT2RANGE), ((range_merge((range_merge((('(49,54)'::INT2RANGE) * ('[23,)'::INT2RANGE)), ('(-93,-30)'::INT2RANGE))), (('(-61,-60]'::INT2RANGE) * ('[46,72]'::INT2RANGE)))) * (('(-76,36)'::INT2RANGE) * ('(-72,20)'::INT2RANGE)))), -6, ('[-50.31637688983122,-24.063362103611155]'::NUMRANGE) != (range_merge(('(-94.55199528177548,62.20471676777393)'::NUMRANGE), (range_merge(('[-89.81474023169078,30.792852429990205)'::NUMRANGE), ('[-77.63994832543939,54.21354262238603]'::NUMRANGE))))), '(-93,13]'::INT8RANGE, '[-96.11031142769252,-12.36096938303109)'::NUMRANGE, (coalesce(upper((int8range(NULL, (-65)))), (CAST((-3) AS BIGINT)))) + (-85), (-65.52544575002617) || ('5'), '[1962-04-28,1996-12-20)'::DATERANGE, count(tg2_2.c13_daterange), '{"a": 5, "b": ["0", "1", "2"], "c": true}'::jsonb, count(tg2_2.c15_daterange), '(,0)'::INT2RANGE, 45, '1942-04-01' FROM tg2_2 LIMIT 20 OFFSET 4;
  values: None
  runtime: 2022-07-06 08:45:43.669 - 2022-07-06 08:45:47.715
  supports explain: True
  supports rollback: True
  affected rows: None
Action: RandomSelectAction
Error class: InternalError_
Error code: XX000
Error message: ERROR:  Query error: GetTransactionStatus: tablet_id: "4e352528279e45faa14b28b2732b6a5a" transaction_id: "8[\247\210\272EM\256\272sWv\241\256%\375" propagated_hybrid_time: 6787469896249577472, retrier: { task_id: -1 state: kRunning deadline: 1024785.037s } passed its deadline 1024785.037s (passed: 5.063s): Leader does not have a valid lease (yb/consensus/consensus.cc:174): This leader has not yet acquired a lease. (tablet server error 15)
    @           0x997398  errmsg
    @           0x9c2968  HandleYBStatusAtErrorLevel
    @           0x6f0340  ybcIterateForeignScan
    @           0x6e4dc9  ForeignNext
    @           0x6bb99d  ExecScan
    @           0x6c7483  fetch_input_tuple
    @           0x6c61be  ExecAgg
    @           0x6d54d6  ExecLimit
    @           0x6b2aea  standard_ExecutorRun
    @     0x7fd4ce5f7aa5  pgss_ExecutorRun
    @     0x7fd4ce5ee63a  ybpgm_ExecutorRun
    @           0x84252b  PortalRunSelect
    @           0x84206a  PortalRun
    @           0x83f8c6  yb_exec_simple_query_impl
    @           0x83fe66  yb_exec_query_wrapper_one_attempt
    @           0x83cee8  PostgresMain
    @           0x7acffc  BackendRun
    @           0x7ac450  ServerLoop
    @           0x7a8abb  PostmasterMain
    @           0x7127fd  PostgresServerProcessMain
    @           0x712ca2  main
    @     0x7fd4d2aef825  __libc_start_main
    @           0x4b8569  _start
Transaction isolation level: default
DB Node: host: 127.0.0.1, port: 5433
DB Backend PID: 1581245

lst_2022-07-06_08:01:12_085883.zip

Seems to happen every few hours. There are similar bugs open: https://github.com/yugabyte/yugabyte-db/issues/12717 The transaction id looks strange since it's not a number, is that expected?

def- commented 2 years ago

Can also happen with Not the leader:

2022-07-08 05:21:43,144 worker_47  ERROR    Unexpected query failure: InternalError_
Query: SELECT count(tg0_1.c0_daterange), 46, -76.48061723308362, |/ abs(CAST((-22.68601568381105) AS NUMERIC)) FROM tg0_1 WHERE TRUE OFFSET 4;
  values: None
  runtime: 2022-07-08 05:21:38.356 - 2022-07-08 05:21:43.144
  supports explain: True
  supports rollback: True
  affected rows: None
Action: RandomSelectAction
Error class: InternalError_
Error code: XX000
Error message: ERROR:  Query error: GetTransactionStatus: tablet_id: "e5005b6108564d64a7508dbf41c64725" transaction_id: "\2144+\266\3647E)\241\215\377\355\014\377.9" propagated_hybrid_time: 6788127531371376640, retrier: { task_id: -1 state: kRunning deadline: 1185340.487s } passed its deadline 1185340.487s (passed: 5.034s): Illegal state (yb/consensus/consensus.cc:162): Not the leader (tablet server error 15)
Transaction isolation level: repeatable read
DB Node: host: 127.0.0.3, port: 5433
DB Backend PID: 2895037
rthallamko3 commented 1 year ago

@Huqicheng , It looks like another LST run ran into failures due to leaderless tablets on transaction tablet. Error message: ERROR: Query error: GetTransactionStatus: tablet_id: "e5005b6108564d64a7508dbf41c64725" transaction_id: "\2144+\266\3647E)\241\215\377\355\014\377.9" propagated_hybrid_time: 6788127531371376640, retrier: { task_id: -1 state: kRunning deadline: 1185340.487s } passed its deadline 1185340.487s (passed: 5.034s): Illegal state (yb/consensus/consensus.cc:162): Not the leader (tablet server error 15)