yugabyte / yugabyte-db


[YSQL] tserver core dump occurs in Postgres when a transaction is active and the connection to the tserver is unavailable #18192

Closed qvad closed 1 month ago

qvad commented 1 year ago

Jira Link: DB-7215

Description

We have observed these panics when a transaction is active in the following scenarios:

This issue can be reproduced by cancelling a DDL transaction with pg_terminate_backend() before it completes. It also appears to occur in normal transactions when AbortSubTransaction is called and the PostgreSQL process cannot communicate with the tserver.

Original test which reproduced this issue:

The scenario is a bank workload with large transactions and wait-on-conflict in use.
In parallel, we restart AWS nodes.

The test failed within 20 minutes with a tserver core dump on 2.19.1.0-b168.
Version 2.19.1.0-b1 fails with a postgres core dump after 2-3 hours.


Karvy-yb commented 1 year ago

The test fails in 20 min with postgres crash on 2.19.1.0-b363 cc: @robertsami

robertsami commented 1 year ago

logs showing:

2023-07-12 05:18:38.033 UTC [1693] WARNING:  AbortTransaction while in ABORT state

from https://github.com/yugabyte/yugabyte-db/blob/735a670a577fc5559f1c435239a8135e04d8a3f7/src/postgres/src/backend/access/transam/xact.c#L2743

robertsami commented 1 year ago

looks like we update this state to TRANS_ABORT in two places:
https://github.com/yugabyte/yugabyte-db/blob/735a670a577fc5559f1c435239a8135e04d8a3f7/src/postgres/src/backend/access/transam/xact.c#L2751
https://github.com/yugabyte/yugabyte-db/blob/735a670a577fc5559f1c435239a8135e04d8a3f7/src/postgres/src/backend/access/transam/xact.c#L5104

Karvy-yb commented 1 year ago

I see the same AbortTransaction issue in test_intensive_multi_tenancy_workload for version 2.19.1.0-b363

Karvy-yb commented 1 year ago

Observed the SIGABRT issue in test_create_alter_delete_tables_vm_restarts on versions 2.19.1.0-b379 and 2.19.1.0-b389.

cc: @qvad

tvesely commented 1 year ago

This issue occurs when cancelling a DDL transaction with pg_terminate_backend() before it is complete.

{"level":"info","ts":1690844645.7892952,"logger":"ddl_panic","caller":"util/DDLPanic.go:117","msg":"running CREATE","host":"127.0.0.1","port":5433,"user":"postgres","database":"postgres","ssl":false,"backend_id": 1175710}
{"level":"info","ts":1690844645.881921,"logger":"ddl_panic","caller":"util/DDLPanic.go:141","msg":"killing session 1175710","host":"127.0.0.1","port":5433,"user":"postgres","database":"postgres","ssl":false}

PID 1175710 was cancelled with pg_terminate_backend() in the middle of creating a table, and this results in a PANIC.

I0731 16:04:05.835775 1175710 ybccmds.c:527] Creating Table postgres.public.foo
I0731 16:04:06.073963 1175710 pg_txn_manager.cc:384] ExitSeparateDdlTxnMode: { ddl_type: DdlWithDocdbSchemaChanges read_only: 0 deferrable: 0 txn_in_progress: 1 pg_isolation_level: READ_COMMITTED isolation_level: 0 }; query: { create table if not exists foo(a int primary key, b int); }; 
I0731 16:04:06.074873 1175710 pg_txn_manager.cc:239] CalculateIsolation: { ddl_type: DdlWithDocdbSchemaChanges read_only: 0 deferrable: 0 txn_in_progress: 1 pg_isolation_level: READ_COMMITTED isolation_level: 0 }; query: { create table if not exists foo(a int primary key, b int); }; 
2023-07-31 16:04:06.075 PDT [1175710] ERROR:  Shutdown connection
    /home/dreddor/code/yugabyte-db/build/debug-clang16-dynamic-ninja/../../src/yb/yql/pggate/util/ybc_util.cc:331:     @     0x7fa4a2eb6d0b  YBCGetStackTrace
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/utils/error/../../../../../../../src/postgres/src/backend/utils/error/elog.c:4781:     @     0x55712667a428  yb_errmsg_from_status_data
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/catalog/yb_catalog/../../../../../../../src/postgres/src/backend/catalog/yb_catalog/yb_catalog_version.c:464:     @     0x557125fd4657  YbGetMasterCatalogVersionFromTable
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/catalog/yb_catalog/../../../../../../../src/postgres/src/backend/catalog/yb_catalog/yb_catalog_version.c:58:     @     0x557125fd352b  YbGetMasterCatalogVersion
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:3868:     @     0x557126479006  YBPrepareCacheRefreshIfNeeded
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:5360:     @     0x557126477c12  PostgresMain
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4658:     @     0x5571263a1598  BackendRun
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4296:     @     0x5571263a0546  BackendStartup
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1775:     @     0x55712639f01e  ServerLoop
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1431:     @     0x55712639be1a  PostmasterMain
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/main/../../../../../../src/postgres/src/backend/main/main.c:234:     @     0x5571262995d7  PostgresServerProcessMain
        @     0x557126299b91 
    ../csu/libc-start.c:308:                                                                                @     0x7fa4a2a62082  __libc_start_main
        @     0x557125e80c6d 

2023-07-31 16:04:06.075 PDT [1175710] STATEMENT:  create table if not exists foo(a int primary key, b int);
I0731 16:04:06.075604 1175710 pg_txn_manager.cc:384] ExitSeparateDdlTxnMode: { ddl_type: DdlWithDocdbSchemaChanges read_only: 0 deferrable: 0 txn_in_progress: 1 pg_isolation_level: READ_COMMITTED isolation_level: 0 }; query: { No query }; 
2023-07-31 16:04:06.076 PDT [1175710] ERROR:  Shutdown connection
    /home/dreddor/code/yugabyte-db/build/debug-clang16-dynamic-ninja/../../src/yb/yql/pggate/util/ybc_util.cc:331:     @     0x7fa4a2eb6d0b  YBCGetStackTrace
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/utils/error/../../../../../../../src/postgres/src/backend/utils/error/elog.c:4781:     @     0x55712667a428  yb_errmsg_from_status_data
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/utils/misc/../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:702:     @     0x5571266b85e3  YBCAbortTransaction
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/access/transam/../../../../../../../src/postgres/src/backend/access/transam/xact.c:2852:     @     0x557125f827fa  AbortTransaction
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/access/transam/../../../../../../../src/postgres/src/backend/access/transam/xact.c:3336:     @     0x557125f83e7b  AbortCurrentTransaction
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:5119:     @     0x557126477649  PostgresMain
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4658:     @     0x5571263a1598  BackendRun
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4296:     @     0x5571263a0546  BackendStartup
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1775:     @     0x55712639f01e  ServerLoop
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1431:     @     0x55712639be1a  PostmasterMain
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/main/../../../../../../src/postgres/src/backend/main/main.c:234:     @     0x5571262995d7  PostgresServerProcessMain
        @     0x557126299b91 
    ../csu/libc-start.c:308:                                                                                @     0x7fa4a2a62082  __libc_start_main
        @     0x557125e80c6d 

2023-07-31 16:04:06.076 PDT [1175710] WARNING:  AbortTransaction while in ABORT state
    /home/dreddor/code/yugabyte-db/build/debug-clang16-dynamic-ninja/../../src/yb/yql/pggate/util/ybc_util.cc:331:     @     0x7fa4a2eb6d0b  YBCGetStackTrace
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/utils/error/../../../../../../../src/postgres/src/backend/utils/error/elog.c:1748:     @     0x5571266778db  elog_finish
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/access/transam/../../../../../../../src/postgres/src/backend/access/transam/xact.c:2743:     @     0x557125f8260d  AbortTransaction
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/access/transam/../../../../../../../src/postgres/src/backend/access/transam/xact.c:3336:     @     0x557125f83e7b  AbortCurrentTransaction
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:5119:     @     0x557126477649  PostgresMain
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4658:     @     0x5571263a1598  BackendRun
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4296:     @     0x5571263a0546  BackendStartup
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1775:     @     0x55712639f01e  ServerLoop
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1431:     @     0x55712639be1a  PostmasterMain
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/main/../../../../../../src/postgres/src/backend/main/main.c:234:     @     0x5571262995d7  PostgresServerProcessMain
        @     0x557126299b91 
    ../csu/libc-start.c:308:                                                                                @     0x7fa4a2a62082  __libc_start_main
        @     0x557125e80c6d 

I0731 16:04:06.076644 1175710 pg_txn_manager.cc:384] ExitSeparateDdlTxnMode: { ddl_type: DdlWithDocdbSchemaChanges read_only: 0 deferrable: 0 txn_in_progress: 1 pg_isolation_level: READ_COMMITTED isolation_level: 0 }; query: { No query }; 
2023-07-31 16:04:06.077 PDT [1175710] ERROR:  Shutdown connection
    /home/dreddor/code/yugabyte-db/build/debug-clang16-dynamic-ninja/../../src/yb/yql/pggate/util/ybc_util.cc:331:     @     0x7fa4a2eb6d0b  YBCGetStackTrace
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/utils/error/../../../../../../../src/postgres/src/backend/utils/error/elog.c:4781:     @     0x55712667a428  yb_errmsg_from_status_data
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/utils/misc/../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:702:     @     0x5571266b85e3  YBCAbortTransaction
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/access/transam/../../../../../../../src/postgres/src/backend/access/transam/xact.c:2852:     @     0x557125f827fa  AbortTransaction
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/access/transam/../../../../../../../src/postgres/src/backend/access/transam/xact.c:3336:     @     0x557125f83e7b  AbortCurrentTransaction
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:5119:     @     0x557126477649  PostgresMain
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4658:     @     0x5571263a1598  BackendRun
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4296:     @     0x5571263a0546  BackendStartup
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1775:     @     0x55712639f01e  ServerLoop
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1431:     @     0x55712639be1a  PostmasterMain
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/main/../../../../../../src/postgres/src/backend/main/main.c:234:     @     0x5571262995d7  PostgresServerProcessMain
        @     0x557126299b91 
    ../csu/libc-start.c:308:                                                                                @     0x7fa4a2a62082  __libc_start_main
        @     0x557125e80c6d 

2023-07-31 16:04:06.077 PDT [1175710] PANIC:  ERRORDATA_STACK_SIZE exceeded
    /home/dreddor/code/yugabyte-db/build/debug-clang16-dynamic-ninja/../../src/yb/yql/pggate/util/ybc_util.cc:331:     @     0x7fa4a2eb6d0b  YBCGetStackTrace
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/utils/error/../../../../../../../src/postgres/src/backend/utils/error/elog.c:1147:     @     0x55712667252a  errmsg_internal
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/utils/error/../../../../../../../src/postgres/src/backend/utils/error/elog.c:1698:     @     0x557126677557  elog_start
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/access/transam/../../../../../../../src/postgres/src/backend/access/transam/xact.c:2743:     @     0x557125f825eb  AbortTransaction
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/access/transam/../../../../../../../src/postgres/src/backend/access/transam/xact.c:3336:     @     0x557125f83e7b  AbortCurrentTransaction
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:5119:     @     0x557126477649  PostgresMain
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4658:     @     0x5571263a1598  BackendRun
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4296:     @     0x5571263a0546  BackendStartup
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1775:     @     0x55712639f01e  ServerLoop
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1431:     @     0x55712639be1a  PostmasterMain
    /home/dreddor/code/yugabyte-db/src/postgres/src/backend/main/../../../../../../src/postgres/src/backend/main/main.c:234:     @     0x5571262995d7  PostgresServerProcessMain
        @     0x557126299b91 
    ../csu/libc-start.c:308:                                                                                @     0x7fa4a2a62082  __libc_start_main
        @     0x557125e80c6d 

I0731 16:04:06.364205 1175610 postmaster.c:2950] cleaning up after process with pid 1175710 exited with status 134
2023-07-31 16:04:06.364 PDT [1175610] WARNING:  server process (PID 1175710) was terminated by signal 6: Aborted

When the signal handler runs in response to pg_terminate_backend(), the pggate client connection to the tserver is terminated. Later, during error recovery, ExitSeparateDdlTxnMode() is called and tries to cancel the DDL transaction. Cancelling requires the client connection to the tserver, which has already been shut down, so the attempt fails with ERROR: Shutdown connection.

Because this failure happens during error recovery, the backend attempts to recover the session again, which again attempts to cancel the DDL transaction and fails in the same way. The loop would repeat indefinitely, but every failed attempt leaves another entry on the error stack, so the Postgres backend eventually terminates with PANIC: ERRORDATA_STACK_SIZE exceeded.
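
The loop described above can be illustrated with a toy model. This is only a sketch, not YugabyteDB or PostgreSQL code: the class and function names (ToyBackend, simulate, and so on) are hypothetical, and the only value taken from PostgreSQL is ERRORDATA_STACK_SIZE, which is defined as 5 in elog.c. The model assumes that every ERROR raised during recovery jumps back into the same handler and that the error stack is only cleared after the abort succeeds. The number of messages printed before the PANIC in real logs depends on details that are not modeled here.

```python
ERRORDATA_STACK_SIZE = 5  # value of the constant in PostgreSQL's elog.c


class BackendError(Exception):
    """Stands in for an ERROR that jumps back to the top-level handler."""


class Panic(Exception):
    """Stands in for PANIC, after which the backend aborts (SIGABRT)."""


class ToyBackend:
    def __init__(self, tserver_reachable):
        self.tserver_reachable = tserver_reachable
        self.error_depth = -1  # -1 means the error stack is empty
        self.log = []

    def raise_error(self, message):
        # Every error reported before the stack is flushed takes one slot.
        self.error_depth += 1
        if self.error_depth >= ERRORDATA_STACK_SIZE:
            self.log.append("PANIC: ERRORDATA_STACK_SIZE exceeded")
            raise Panic(message)
        self.log.append("ERROR: " + message)
        raise BackendError(message)

    def abort_transaction(self):
        # Aborting needs an RPC to the tserver; without a connection it fails.
        if not self.tserver_reachable:
            self.raise_error("Shutdown connection")

    def handle_error(self):
        # Assumed shape of recovery: retry the abort until it succeeds.
        while True:
            try:
                self.abort_transaction()
            except BackendError:
                continue  # the new ERROR re-enters the same handler
            self.error_depth = -1  # stack is flushed only after success
            return "recovered"


def simulate(tserver_reachable):
    backend = ToyBackend(tserver_reachable)
    try:
        backend.raise_error("statement failed")  # the original error
    except BackendError:
        pass
    try:
        outcome = backend.handle_error()
    except Panic:
        outcome = "panic"
    return outcome, backend.log
```

In this model, simulate(True) recovers after the single original error, while simulate(False) keeps failing inside the handler until the fixed-size error stack overflows and the run ends in a panic.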

sushantrmishra commented 10 months ago

Possible duplicate of this issue : https://github.com/yugabyte/yugabyte-db/issues/17172

tvesely commented 7 months ago

We have observed these panics when a transaction is active in the following scenarios:

In addition to the DDL panic outlined above, this appears to occur in normal transactions when AbortSubtransaction is called.

(lldb) bt
* thread #1, name = 'postgres', stop reason = signal SIGABRT
  * frame #0: 0x00007f6c6cd660a7 libc.so.6`__GI_raise(sig=6) at raise.c:54
    frame #1: 0x00007f6c6cd674aa libc.so.6`__GI_abort at abort.c:89
    frame #2: 0x000055d97e680ebc postgres`errfinish(dummy=<unavailable>) at elog.c:815:3
    frame #3: 0x000055d97e6891d7 postgres`elog_start(filename="", lineno=5161, funcname=<unavailable>) at elog.c:1698:3
    frame #4: 0x000055d97e086a0a postgres`AbortSubTransaction at xact.c:5160:3
    frame #5: 0x000055d97e087c12 postgres`AbortCurrentTransaction at xact.c:3473:4
    frame #6: 0x000055d97e4d14d9 postgres`PostgresMain(argc=<unavailable>, argv=<unavailable>, dbname=<unavailable>, username=<unavailable>) at postgres.c:5160:3
    frame #7: 0x000055d97e4132b4 postgres`BackendRun(port=<unavailable>) at postmaster.c:4658:2
    frame #8: 0x000055d97e41245f postgres`ServerLoop [inlined] BackendStartup(port=0x000006f27fda0780) at postmaster.c:4296:3
    frame #9: 0x000055d97e4123c0 postgres`ServerLoop at postmaster.c:1775:7
    frame #10: 0x000055d97e40d762 postgres`PostmasterMain(argc=25, argv=0x000006f27fd041a0) at postmaster.c:1431:11
    frame #11: 0x000055d97e311190 postgres`PostgresServerProcessMain(argc=<unavailable>, argv=<unavailable>) at main.c:234:3
    frame #12: 0x000055d97dfd1852 postgres`main + 34
    frame #13: 0x00007f6c6cd53825 libc.so.6`__libc_start_main(main=(postgres`main), argc=25, argv=0x00007fff48b556f8, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fff48b556e8) at libc-start.c:289
    frame #14: 0x000055d97dfd1769 postgres`_start at start.S:108

This flavor of the panic occurs during error recovery when a transaction is active and Postgres attempts to abort a subtransaction. As part of rolling back the subtransaction, it tells the tserver to roll back the subtransaction.

https://github.com/yugabyte/yugabyte-db/blob/0a1ac277ef2024529a09035cac42b403544e4a5d/src/yb/yql/pggate/pg_session.cc#L874-L885

If the network connection is unavailable in any way when we call pg_client_.RollbackToSubTransaction(id, &options), the PostgresService.RollbackToSubTransaction() RPC will fail. Because this happens during error recovery, the failure triggers another round of recovery, which fails the same way, and this repeats until ERRORDATA_STACK_SIZE has been exceeded.
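
One general way to break this kind of loop is to treat the rollback RPC as best-effort while error recovery is already in progress. The sketch below is an assumption for illustration only: it does not describe the actual fix in YugabyteDB, and every name in it (RpcError, abort_subtransaction, rollback_rpc_unreachable, the in_error_recovery flag) is hypothetical.

```python
class RpcError(Exception):
    """Stands in for a failed RollbackToSubTransaction RPC."""


def rollback_rpc_unreachable(subtxn_id):
    # Hypothetical RPC stub that behaves like a killed tserver.
    raise RpcError("recvmsg error: Connection refused")


def abort_subtransaction(subtxn_id, rollback_rpc, in_error_recovery):
    """Roll back a subtransaction and return the log lines produced.

    Outside error recovery an RPC failure propagates as usual. During
    error recovery it is downgraded to a warning so that the abort path
    cannot raise a new error and re-enter recovery.
    """
    log = []
    try:
        rollback_rpc(subtxn_id)
    except RpcError as exc:
        if not in_error_recovery:
            raise
        log.append(f"WARNING: rollback of subtransaction {subtxn_id} failed: {exc}")
    # Local cleanup proceeds whether or not the tserver was reachable.
    log.append(f"local cleanup of subtransaction {subtxn_id} done")
    return log
```

The trade-off in this sketch is that the tserver may still hold state for the subtransaction after a failed RPC, so a real implementation would need some other way to clean that up, for example when the connection or the parent transaction is torn down.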

karthik-ramanathan-3006 commented 5 months ago

Simple repro of the bug: Open up three terminals side by side, with ysqlsh on one, a debugger on the second, and a regular shell on the third.

| time | ysql | debugger | shell | Explanation |
| --- | --- | --- | --- | --- |
| t0 | CREATE TABLE test (k INT); | | | We will be using this table to induce a transaction error |
| t1 | SELECT pg_backend_pid(); | | | Note down the backend PID to attach debugger |
| t2 | BEGIN; | | | Start a transaction |
| t3 | SAVEPOINT s1; | | | This triggers the creation of a new subtransaction which is a pre-req for this error |
| t4 | | \<attach debugger> | | |
| t5 | | breakpoint set --file=xact.c --name=AbortSubTransaction | | Set a breakpoint in the transaction recovery code |
| t6 | CREATE TABLE test (k INT); | | | Since the table already exists, this induces an error |
| t7 | | | kill -9 \<tserver> | SIGKILL the tserver without giving it a chance to initiate cleanup |
| t8 | | \<detach> | | Allow execution to proceed |
| t9 | \<error observed> | | | The tserver is killed during error recovery, which kicks off an infinite loop of error recoveries until stack size is exceeded. |

Error message:

yugabyte=# CREATE TABLE test (k INT);
WARNING:  01000: AbortSubTransaction while in ABORT state
LOCATION:  AbortSubTransaction, xact.c:5163
WARNING:  01000: AbortSubTransaction while in ABORT state
LOCATION:  AbortSubTransaction, xact.c:5163
WARNING:  01000: AbortSubTransaction while in ABORT state
LOCATION:  AbortSubTransaction, xact.c:5163
ERROR:  42P07: relation "test" already exists
LOCATION:  heap_create_with_catalog, heap.c:1233
ERROR:  XX000: recvmsg error: Connection refused
LOCATION:  YBCRollbackToSubTransaction, ../../src/yb/util/net/socket.cc:540
ERROR:  XX000: recvmsg error: Connection refused
LOCATION:  YBCRollbackToSubTransaction, ../../src/yb/util/net/socket.cc:540
ERROR:  XX000: recvmsg error: Connection refused
LOCATION:  YBCRollbackToSubTransaction, ../../src/yb/util/net/socket.cc:540
ERROR:  XX000: recvmsg error: Connection refused
LOCATION:  YBCRollbackToSubTransaction, ../../src/yb/util/net/socket.cc:540
PANIC:  XX000: ERRORDATA_STACK_SIZE exceeded
LOCATION:  elog_start, elog.c:1704
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

myang2021 commented 4 months ago

I was running a unit test on my CentOS dev VM, and the test hit a CHECK failure that caused yb-master to go down. That should eventually have made the test fail with a timeout after 1800 seconds, but I stopped the test with ^C. Incidentally, I was running this command at the time:

$ tail -f yb_alter_table_rewrite.out
ALTER TABLE nopk_part2_part1 DROP CONSTRAINT nopk_part2_part1_pkey;
ALTER TABLE nopk_part2_part2 ADD PRIMARY KEY (id);
ALTER TABLE nopk_part2_part2 DROP CONSTRAINT nopk_part2_part2_pkey;
-- tests for altered table referenced by a partitioned FK table.
CREATE TABLE test (id int unique);
CREATE TABLE test_part (id int REFERENCES test(id)) PARTITION BY RANGE(id);
CREATE TABLE test_part_1 PARTITION OF test_part FOR VALUES FROM (1) TO (100);
INSERT INTO test VALUES (1);
INSERT INTO test_part VALUES (1);
ALTER TABLE test ADD PRIMARY KEY (id);
Cancel request sent
WARNING:  AbortTransaction while in ABORT state
ERROR:  Shutdown connection
ERROR:  Shutdown connection
ERROR:  Shutdown connection
PANIC:  ERRORDATA_STACK_SIZE exceeded
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost

It seems that killing the test caused a similar PANIC: ERRORDATA_STACK_SIZE exceeded error. This time, I think that because yb-master was already down, killing the Java test involved killing the yb-tserver (probably to let it shut down).

To see the repro of the above:

I did a clean checkout at

commit 6f6923e39e5c48ed075cb709c91a1af2b2182d6c (HEAD -> master, origin/master, origin/HEAD)
Author: yusong-yan <yusongyan3@gmail.com>
Date:   Fri Mar 29 16:00:52 2024 +0000

    [#20526] XCluster: Refine GetChanges Error Handling for More Accurate CDCErrorPB Codes

Made a release build, then ran:

YB_EXTRA_DAEMON_FLAGS="--allowed_preview_flags_csv=ysql_yb_ddl_rollback_enabled --ysql_yb_ddl_rollback_enabled=true --report_ysql_ddl_txn_status_to_master=true --ysql_ddl_transaction_wait_for_ddl_verification=true --log_ysql_catalog_versions=true --ysql_pg_conf_csv=log_statement=all" ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressTable#testPgRegressTable'

Saw the master check failure:

m1|pid18956|:10802|http://127.176.233.33:28838 F0412 18:59:32.508949 25753 ysql_ddl_verification_task.cc:287] Check failed: l->is_being_created_by_ysql_ddl_txn() complex_pk [id=000033c0000030008000000000004480] contains_alter_table_op: true previous_schema { columns { id: 0 name: "ybrowid" type { main: BINARY } is_key: true is_hash_key: true is_nullable: false is_static: false is_counter: false sorting_type: 0 order: -100 pg_type_oid: 20 marked_for_deletion: false } columns { id: 1 name: "v1" type { main: INT32 } is_key: false is_nullable: true is_static: false is_counter: false sorting_type: 0 order: 1 pg_type_oid: 23 marked_for_deletion: false } columns { id: 2 name: "v2" type { main: STRING } is_key: false is_nullable: true is_static: false is_counter: false sorting_type: 0 order: 2 pg_type_oid: 25 marked_for_deletion: false } columns { id: 3 name: "v3" type { main: STRING } is_key: false is_nullable: true is_static: false is_counter: false sorting_type: 0 order: 3 pg_type_oid: 1042 marked_for_deletion: false } columns { id: 4 name: "v4" type { main: BOOL } is_key: false is_nullable: true is_static: false is_counter: false sorting_type: 0 order: 4 pg_type_oid: 16 marked_for_deletion: false } table_properties { contain_counters: false is_transactional: true consistency_level: STRONG use_mangled_column_name: false is_ysql_catalog_table: false retain_delete_markers: false partitioning_version: 1 ysql_replica_identity: CHANGE } colocated_table_id { } pgschema_name: "public" } previous_table_name: "complex_pk"

karthik-ramanathan-3006 commented 3 months ago

fcbdb09 addresses issues observed around the retry-ability of ABORTing transactions. To completely fix this issue, an additional fix is needed to handle subtransactions that experience an error. That fix is currently in progress.

karthik-ramanathan-3006 commented 1 month ago

Current status:

| Code change description | master | 2024.1.1 | 2024.1 | 2.20 | 2.18 and beyond |
| --- | --- | --- | --- | --- | --- |
| Fix for failed ABORT Transaction | Merged | Merged | Merged | Merged | Not planned |
| Fix for failed ABORT SubTransaction | Merged | Not available | Merged | Merged | Not planned |
| Interface for testing failures | Merged | Not available | Merged | Merged | Not planned |