yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
9.06k stars 1.09k forks source link

[YSQL][LST] ERROR: Invalid argument: Invalid column number 22 after add column #14367

Closed def- closed 2 years ago

def- commented 2 years ago

Jira Link: DB-3790

Description

$ cd ~/code/yugabyte-db
$ git checkout ee98138e95a124d3560a70ae69ee73d2275ee720
$ ./yb_build.sh release
$ bin/yb-ctl --replication_factor 3 create --tserver_flags=enable_wait_queue_based_pessimistic_locking=true,enable_deadlock_detection=true,ysql_max_connections=20,ysql_enable_packed_row=true,yb_enable_read_committed_isolation=true,ysql_num_shards_per_tserver=2,enable_stream_compression=true,stream_compression_algo=2,yb_num_shards_per_tserver=2 --master_flags=yb_enable_read_committed_isolation=true,enable_stream_compression=true,stream_compression_algo=2,enable_automatic_tablet_splitting=true,tablet_split_low_phase_shard_count_per_node=1,tablet_split_high_phase_shard_count_per_node=5,ysql_enable_packed_row=true,enable_deadlock_detection=true,enable_wait_queue_based_pessimistic_locking=true
$ cd ~/code/yb-long-system-test
$ git checkout 6e7997a3
$ ./long_system_test.py --nodes=127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433 --threads=2 --runtime=60 --max-columns=20 --complexity=full --seed=606686
2022-09-20 16:04:10,722 MainThread INFO     Reproduce with: git checkout 6e7997a3 && ./long_system_test.py --nodes=127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433 --threads=2 --runtime=60 --max-columns=20 --complexity=full --seed=606686
2022-09-20 16:04:11,243 MainThread INFO     Database version: PostgreSQL 11.2-YB-2.15.3.0-b0 on x86_64-pc-linux-gnu, compiled by clang version 13.0.1 (https://github.com/yugabyte/llvm-project.git 0131d8e1a20f72a0f374979985db4c2871be71bc), 64-bit
2022-09-20 16:04:11,246 MainThread INFO     Creating tables for database db_lst_606686
2022-09-20 16:04:33,454 MainThread INFO     Starting worker_0: RandomSelectAction, SetConfigAction
2022-09-20 16:04:33,458 MainThread INFO     Starting worker_1: CreateIndexAction, DropIndexAction, SetConfigAction, AddColumnAction
2022-09-20 16:04:43,477 MainThread INFO     Worker queries/s: [013.5][000.5]
2022-09-20 16:04:53,489 MainThread INFO     Worker queries/s: [018.8][000.5]
2022-09-20 16:05:03,497 MainThread INFO     Worker queries/s: [008.9][001.2]
2022-09-20 16:05:13,508 MainThread INFO     Worker queries/s: [008.4][001.1]
2022-09-20 16:05:23,522 MainThread INFO     Worker queries/s: [014.7][000.5]
2022-09-20 16:05:33,532 MainThread INFO     Worker queries/s: [008.2][000.3]
2022-09-20 16:05:43,549 MainThread INFO     Worker queries/s: [024.0][001.2]
2022-09-20 16:05:53,553 MainThread INFO     Worker queries/s: [015.1][002.1]
2022-09-20 16:06:03,557 MainThread INFO     Worker queries/s: [012.4][000.6]
2022-09-20 16:06:13,567 MainThread INFO     Worker queries/s: [005.1][000.6]
2022-09-20 16:06:18,852 worker_1   ERROR    Unexpected query failure: InternalError_
Query: CREATE INDEX CONCURRENTLY idx_t1_c7_bigint_c1_int ON t1 USING btree (c7_bigint HASH NULLS LAST, c1_int ASC NULLS LAST) SPLIT INTO 48 TABLETS;
  values: None
  runtime: 2022-09-20 16:06:09.248 - 2022-09-20 16:06:18.851
  supports explain: False
  supports rollback: False
  affected rows: None
Action: CreateIndexAction
Error class: InternalError_
Error code: XX000
Error message: ERROR:  Aborted: ERROR:  Invalid argument: Invalid column number 22
    @           0x9b9db8  errmsg
    @           0x9e5605  HandleYBStatusAtErrorLevel
    @           0x64abfe  DefineIndex
    @           0x85ed88  ProcessUtilitySlow
    @           0x85d310  standard_ProcessUtility
    @           0x85c8fe  YBProcessUtilityDefaultHook
    @     0x7fe21cad3e20  pgss_ProcessUtility
    @     0x7fe21cacab17  ybpgm_ProcessUtility
    @     0x7fe21caae6ce  pg_hint_plan_ProcessUtility
    @           0x9e9237  YBTxnDdlProcessUtility
    @           0x85c54a  PortalRunUtility
    @           0x85bc37  PortalRunMulti
    @           0x85b4f6  PortalRun
    @           0x858c76  yb_exec_simple_query_impl
    @           0x859216  yb_exec_query_wrapper_one_attempt
    @           0x855d8b  PostgresMain
    @           0x7c1e4c  BackendRun
    @           0x7c12a0  ServerLoop
    @           0x7bd75b  PostmasterMain
    @           0x71c96d  PostgresServerProcessMain
    @           0x71ce12  main
    @     0x7fe2213c3825  __libc_start_main
    @           0x4c8809  _start
Transaction isolation level: read uncommitted
DB Node: host: 127.0.0.2, port: 5433
DB Backend PID: 3675724

Just before that t1 was altered to add a column:

2022-09-20 16:06:08,192 worker_1   ALTER TABLE t1 ADD c21_numrange NUMRANGE;

Ran into this a few weeks ago, didn't see it immediately, just found in logs. LST logs: lst.zip

tedyu commented 2 years ago

What's the schema for t1 table ?

thanks

def- commented 2 years ago
DROP DATABASE IF EXISTS db_lst_606686;
CREATE DATABASE db_lst_606686;
CREATE TYPE int2range AS RANGE (subtype = smallint);
CREATE TYPE e0 AS ENUM ('e0_v0', 'e0_v1', 'e0_v2', 'e0_v3');
CREATE TYPE e1 AS ENUM ('e1_v0', 'e1_v1', 'e1_v2');
CREATE TYPE e2 AS ENUM ('e2_v0', 'e2_v1');
CREATE TYPE e3 AS ENUM ('e3_v0', 'e3_v1', 'e3_v2');
CREATE TABLE t1(c0_float4 FLOAT4,
  c1_int INT DEFAULT -2147483648,
  c2_json JSON DEFAULT '{"a": 8, "b": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"], "c": true}'::json,
  c3_boolean BOOLEAN,
  c4_int4range INT4RANGE NOT NULL DEFAULT '(-63,-16]'::INT4RANGE,
  c5_jsonb JSONB DEFAULT '{"a": 8, "b": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"], "c": true}'::jsonb,
  c6_bigint BIGINT NOT NULL DEFAULT 78,
  c7_bigint BIGINT,
  c8_json JSON NOT NULL DEFAULT '{"a": 9, "b": ["0", "1", "2", "3", "4", "5", "6", "7", "8"], "c": true}'::json,
  c9_int INT,
  c10_smallint SMALLINT NOT NULL,
  c11_date DATE,
  c12_float8 FLOAT8 DEFAULT -24.07243020106324,
  c13_text TEXT,
  c14_text TEXT,
  c15_int2range INT2RANGE NOT NULL DEFAULT '(,79)'::INT2RANGE,
  c16_int8range INT8RANGE DEFAULT '[-68,)'::INT8RANGE,
  c17_bigint BIGINT)
SPLIT INTO 78 TABLETS;

With these ALTERs:

2022-09-20 16:04:40,187 worker_1   ALTER TABLE t1 ADD c18_float4 FLOAT4 DEFAULT 26.77770974545723;
2022-09-20 16:04:50,864 worker_1   ALTER TABLE t1 ADD c19_smallint SMALLINT DEFAULT 92;
2022-09-20 16:05:54,043 worker_1   ALTER TABLE t1 ADD c20_float4 FLOAT4;
2022-09-20 16:06:08,192 worker_1   ALTER TABLE t1 ADD c21_numrange NUMRANGE;
tedyu commented 2 years ago

Wasn't able to reproduce locally:

yugabyte=# CREATE INDEX CONCURRENTLY idx_t1_c7_bigint_c1_int ON t1 USING btree (c7_bigint HASH NULLS LAST, c1_int ASC NULLS LAST) SPLIT INTO 48 TABLETS;
NOTICE:  index method "btree" was replaced with "lsm" in YugabyteDB
NOTICE:  nulls sort ordering option is ignored, NULLS FIRST/NULLS LAST not allowed for a HASH column
CREATE INDEX

Looks like the crash you saw was related to concurrent DDL.

deeps1991 commented 2 years ago

@def- Were you running concurrent DDL connecting to different TServer nodes? If so, could this have been a duplicate of https://github.com/yugabyte/yugabyte-db/issues/14327?

def- commented 2 years ago

Oh, you reported the same error message already, could indeed be a duplicate. LST doesn't give an easy repro so probably best to just check if it occurs again after #14327 has been fixed.