We have had two customers unhappy now about DDLs failing due to transaction conflicts writing to the PG catalog.
In one case the customer claimed (incorrectly) that they had not run any DDL.
In the other case, they were looking for an explanation and didn't realize that DDL's on other tables might cause conflicts and thus a failure.
This task is to add logging so we can after-the-fact explain to customers what DDL caused their DDL to fail.
E.g., we want to be able to answer questions like "why did my index creation fail with
ERROR: ERROR: [Query error (yb/tserver/service_util.h:270): The catalog snapshot used for this transaction has been invalidated: MISMATCHED_SCHEMA (tablet server error 5)]
CONTEXT: Catalog Version Mismatch: A DDL occurred while processing this query. Try again.
CONTEXT: Catalog Version Mismatch: A DDL occurred while processing this query. Try again.
with something like "It failed because you performed a ALTER TABLE operation while the background creation of that index was running; unfortunately, our code aborts in the sort of situation because it doesn't have enough smarts to figure out whether or not the concurrent DDL altered the index/underlying table schemas in a conflicting way.)
We already have Google flags for logging the current catalog versions we can at least see when it's incremented:
~/code/yugabyte-db/src/yb/common/common_flags.cc:45:
DEFINE_UNKNOWN_bool(log_ysql_catalog_versions, false,
"Log YSQL catalog events. For debugging purposes.");
Unfortunately, this flag is off by default. Even if it was on, we still don't know what caused the catalog change.
Bonus points if you can somehow return the actual DDL in the Postgres error message.
Issue Type
kind/enhancement
Warning: Please confirm that this issue does not contain any sensitive information
[X] I confirm this issue does not contain any sensitive information.
Jira Link: DB-9034
Description
We have had two customers unhappy now about DDLs failing due to transaction conflicts writing to the PG catalog.
In one case the customer claimed (incorrectly) that they had not run any DDL.
In the other case, they were looking for an explanation and didn't realize that DDL's on other tables might cause conflicts and thus a failure.
This task is to add logging so we can after-the-fact explain to customers what DDL caused their DDL to fail.
E.g., we want to be able to answer questions like "why did my index creation fail with
with something like "It failed because you performed a ALTER TABLE operation while the background creation of that index was running; unfortunately, our code aborts in the sort of situation because it doesn't have enough smarts to figure out whether or not the concurrent DDL altered the index/underlying table schemas in a conflicting way.)
We already have Google flags for logging the current catalog versions we can at least see when it's incremented:
Unfortunately, this flag is off by default. Even if it was on, we still don't know what caused the catalog change.
Bonus points if you can somehow return the actual DDL in the Postgres error message.
Issue Type
kind/enhancement
Warning: Please confirm that this issue does not contain any sensitive information