yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.69k stars 1.04k forks source link

[YSQL] Log information so we can figure out what DDL caused a PG catalog version mismatch #20084

Open mdbridge opened 7 months ago

mdbridge commented 7 months ago

Jira Link: DB-9034

Description

We have had two customers unhappy now about DDLs failing due to transaction conflicts writing to the PG catalog.

In one case the customer claimed (incorrectly) that they had not run any DDL.

In the other case, they were looking for an explanation and didn't realize that DDL's on other tables might cause conflicts and thus a failure.

This task is to add logging so we can after-the-fact explain to customers what DDL caused their DDL to fail.

E.g., we want to be able to answer questions like "why did my index creation fail with

ERROR:  ERROR:  [Query error (yb/tserver/service_util.h:270): The catalog snapshot used for this transaction has been invalidated: MISMATCHED_SCHEMA (tablet server error 5)]
CONTEXT:  Catalog Version Mismatch: A DDL occurred while processing this query. Try again.
CONTEXT:  Catalog Version Mismatch: A DDL occurred while processing this query. Try again.

with something like "It failed because you performed a ALTER TABLE operation while the background creation of that index was running; unfortunately, our code aborts in the sort of situation because it doesn't have enough smarts to figure out whether or not the concurrent DDL altered the index/underlying table schemas in a conflicting way.)

We already have Google flags for logging the current catalog versions we can at least see when it's incremented:

~/code/yugabyte-db/src/yb/common/common_flags.cc:45:
DEFINE_UNKNOWN_bool(log_ysql_catalog_versions, false,
            "Log YSQL catalog events. For debugging purposes.");

Unfortunately, this flag is off by default. Even if it was on, we still don't know what caused the catalog change.

Bonus points if you can somehow return the actual DDL in the Postgres error message.

Issue Type

kind/enhancement

Warning: Please confirm that this issue does not contain any sensitive information

mdbridge commented 7 months ago

existing documentation on this: https://support.yugabyte.com/hc/en-us/articles/4406287763597-How-to-troubleshoot-Schema-or-Catalog-version-mismatch-database-errors

afn7081 commented 6 months ago

can i take this up if its not taken up yet?