Open tvesely opened 1 year ago
This can be reproduced by enabling read committed and running the following test:
./yb_build.sh --cxx-test pgwrapper_pg_ddl_concurrency-test --gtest_filter PgDDLConcurrencyTest.IndexCreation
Another similar failure with a slightly different root cause:
[ts-1] F0829 20:14:57.107990 151433 pg_txn_manager.cc:345] Check failed: !IsDdlMode() READ COMMITTED semantics don't apply to DDL transactions
[ts-1] Fatal failure details written to ${BUILD_ROOT}/yb-test-logs/tests-pgwrapper__pg_catalog_version-test/PgCatalogVersionTest_FixCatalogVersionTable.fatal_failure_details.ts-1.2023-08-29T20_14_57.pid151433.txt
[ts-1] F20230829 20:14:57 ../../src/yb/yql/pggate/pg_txn_manager.cc:345] Check failed: !IsDdlMode() READ COMMITTED semantics don't apply to DDL transactions
[ts-1] /net/ip-10-9-7-61.us-west-2.compute.internal/share/jenkins/jenkins-yugabyte-db-phabricator-186855/yugabyte-db/build/debug-clang16-dynamic-ninja/../../src/yb/util/logging.cc:465: @ 0xffff7dd7182b yb::LogFatalHandlerSink::send(int, char const*, char const*, int, tm const*, char const*, unsigned long)
[ts-1] @ 0xffff7d42327b
[ts-1] @ 0xffff7d423a23
[ts-1] @ 0xffff7d4262db
[ts-1] ${BUILD_ROOT}/../../src/yb/yql/pggate/pg_txn_manager.cc:344: @ 0xffff858f58cf yb::pggate::PgTxnManager::ResetTransactionReadPoint()
[ts-1] ${BUILD_ROOT}/../../src/yb/yql/pggate/pggate.cc:1879: @ 0xffff857c15db yb::pggate::PgApiImpl::ResetTransactionReadPoint()
[ts-1] ${BUILD_ROOT}/../../src/yb/yql/pggate/ybc_pggate.cc:1335: @ 0xffff857a1d8f YBCPgResetTransactionReadPoint
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/utils/time/../../../../../../../src/postgres/src/backend/utils/time/snapmgr.c:387: @ 0xaaaade4dd663 GetTransactionSnapshot
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/executor/../../../../../../src/postgres/src/backend/executor/functions.c:1175: @ 0xaaaaddf7f8c7 fmgr_sql
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/catalog/yb_catalog/../../../../../../../src/postgres/src/backend/catalog/yb_catalog/yb_catalog_version.c:117: @ 0xaaaaddd54307 YbCallSQLIncrementCatalogVersions
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/catalog/yb_catalog/../../../../../../../src/postgres/src/backend/catalog/yb_catalog/yb_catalog_version.c:142: @ 0xaaaaddd515ff YbIncrementMasterDBCatalogVersionTableEntryImpl
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/catalog/yb_catalog/../../../../../../../src/postgres/src/backend/catalog/yb_catalog/yb_catalog_version.c:249: @ 0xaaaaddd51523 YbIncrementMasterCatalogVersionTableEntry
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/utils/misc/../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:1436: @ 0xaaaade49b55f YBDecrementDdlNestingLevel
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/utils/misc/../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:1944: @ 0xaaaade4a3447 YBTxnDdlProcessUtility
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/utility.c:377: @ 0xaaaade2305ab ProcessUtility
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/pquery.c:1206: @ 0xaaaade22fb1f PortalRunUtility
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/pquery.c:1366: @ 0xaaaade22eb2b PortalRunMulti
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/pquery.c:815: @ 0xaaaade22df43 PortalRun
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:1217: @ 0xaaaade22b0b7 exec_simple_query
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:4753: @ 0xaaaade228b17 yb_exec_simple_query_impl
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:4721: @ 0xaaaade228c57 yb_exec_query_wrapper_one_attempt
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:4745: @ 0xaaaade228ad3 yb_exec_query_wrapper
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:4768: @ 0xaaaade223e4b yb_exec_simple_query
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:5393: @ 0xaaaade222c3f PostgresMain
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4658: @ 0xaaaade135d83 BackendRun
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4296: @ 0xaaaade134c23 BackendStartup
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1775: @ 0xaaaade13345f ServerLoop
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/postmaster/../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1431: @ 0xaaaade12fd3f PostmasterMain
[ts-1] ${YB_SRC_ROOT}/src/postgres/src/backend/main/../../../../../../src/postgres/src/backend/main/main.c:234: @ 0xaaaade012597 PostgresServerProcessMain
[ts-1] @ 0xaaaade012a5f
[ts-1] @ 0xffff850a4383
[ts-1]
This one is reproduced by running the following test with READ COMMITTED enabled:
./yb_build.sh --cxx-test pgwrapper_pg_catalog_version-test --gtest_filter PgCatalogVersionTest.FixCatalogVersionTable
In YSQL, READ COMMITTED
semantics only apply to non-ddl statements. However, in this test we end up attempting to update pg_yb_catalog_version
when performing a ALTER ROLE yugabyte SUPERUSER
statement. YbIncrementMasterCatalogVersionTableEntry ends up calling GetTransactionSnapshot and subsequently YBCPgResetTransactionReadPoint because the ddl nesting level is 0. This leads to the crash with the message: "Check failed: !IsDdlMode() READ COMMITTED semantics don't apply to DDL transactions".
In both scenarios we are resetting a transaction read point during a DDL statement. Both cases are using the assumption that we can rely on YBGetDdlNestingLevel() == 0
to mean that the statement being executed is not a DDL. We need to understand why the nesting level is 0 here.
While we were blocking retries for DDLs, we may solve this issue by allowing retries for DDLs in read committed. Needs more investigation to ensure we can do this safely
Would need to at least remove this dcheck from the update above:
[ts-1] F0829 20:14:57.107990 151433 pg_txn_manager.cc:345] Check failed: !IsDdlMode() READ COMMITTED semantics don't apply to DDL transactions
5/7 of the concerning tests look ok to have retries enabled. need to confirm whether truncate is safe to handle this way as well
in the short term, we should make a change to avoid crashing in such a case
long term, we are waiting for changes to improve ddl atomicity before fixing this entirely
cc @pkj415 to link details of the dependent github issue for longer term change
Jira Link: DB-7648
Description
After a
kConflict
error the DDL statement is retried, and as a result the Postgres backend crashes with the following:Warning: Please confirm that this issue does not contain any sensitive information