yugabyte / yugabyte-db

[YSQL] Index creation fails with ERROR: Requested catalog version is too high: req version 749, master version 748 #21232

Open shishir2001-yb opened 4 months ago

shishir2001-yb commented 4 months ago

Jira Link: DB-10158

Description

Observed on version 2.21.1.0-b158.

Logs: https://drive.google.com/file/d/16koYdNx3W1CmO4vIkqXhyxw_hITwEpTK/view?usp=sharing

While running the Cross-DB-DDLs sample app in parallel with PITR, some index creation queries started failing with the error below:

2024-02-27 08:25:07,973 DB NAME:  postgres_18 DDL query CREATE INDEX idx1_tb_1 ON tb_1 (k):
com.yugabyte.util.PSQLException: ERROR: Requested catalog version is too high: req version 749, master version 748
  Where: Catalog Version Mismatch: A DDL occurred while processing this query. Try again.

The failure occurs at step 3.f, during the second iteration of step 3 (see Test details).
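
Since the error text itself says "Try again", a client-side workaround could be a bounded retry around each DDL. The following is only a minimal sketch, not the sample app's actual code: the class `CatalogVersionRetry`, the `SqlAction` interface, and the backoff policy are hypothetical, and the match relies solely on the two message fragments quoted in the log above.

```java
import java.sql.SQLException;

// Hypothetical helper; none of these names come from the sample app.
final class CatalogVersionRetry {

    /** A unit of work that may throw SQLException, e.g. one DDL executed over JDBC. */
    @FunctionalInterface
    interface SqlAction {
        void run() throws SQLException;
    }

    /** True when the message contains one of the fragments quoted in this issue. */
    static boolean isCatalogVersionMismatch(SQLException e) {
        String msg = e.getMessage();
        return msg != null
                && (msg.contains("Requested catalog version is too high")
                    || msg.contains("Catalog Version Mismatch"));
    }

    /**
     * Runs the action and retries up to maxRetries times on a catalog version
     * mismatch. Any other SQLException is rethrown immediately.
     * Returns the total number of attempts made.
     */
    static int runWithRetry(SqlAction action, int maxRetries, long backoffMillis)
            throws SQLException, InterruptedException {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                action.run();
                return attempts;
            } catch (SQLException e) {
                if (!isCatalogVersionMismatch(e) || attempts > maxRetries) {
                    throw e;
                }
                // Linear backoff: wait a little longer after each failed attempt.
                Thread.sleep(backoffMillis * attempts);
            }
        }
    }
}
```

A caller would wrap the statement execution, for example `CatalogVersionRetry.runWithRetry(() -> stmt.execute(ddl), 5, 100L)`. Whether re-running a failed `CREATE INDEX` is safe after a concurrent restore is not established by this issue, so this only masks the symptom.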

Test details

1. Start the cross-DB DDL workload, which executes DDLs and DMLs concurrently across databases (20 colocated and 20 non-colocated databases), and run it for 20-30 minutes.
2. Create a PITR schedule on 10 random databases.
3. Start a while loop that executes the following steps:
  a. Note the time for PITR(0).
  b. Create a backup of 1 random database.
  c. Start the cross-DB DDL workload and stop it after 10 minutes.
  d. Note the time for PITR(1).
  e. Start the cross-DB DDL workload and keep it running.
  f. Execute PITR on all 10 databases at random times (between 1 and 9 seconds ago) while the workload is running.
  g. Wait for the workload to stop.
  h. Restore to PITR(1).
  i. Validate data.
  j. Restore to PITR(0) with a probability of 0.6 and validate data.
  k. Delete the PITR schedule for the backup DB (in our case, postgres_20).
  l. Drop the database.
  m. Restore the backup.
  n. Create the snapshot schedule for this new DB.
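
For anyone reproducing step 3.f, the restore target selection can be sketched as below. This reads the step as picking a restore time 1 to 9 seconds in the past; the class `PitrTargetPicker` and both method names are hypothetical, and the conversion to epoch microseconds is an assumption about how the target is handed to the restore tooling, not something stated in this issue.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Random;

// Hypothetical helper; the test harness's real code is not shown in this issue.
final class PitrTargetPicker {

    /** Picks a restore target uniformly between 1 and 9 whole seconds before 'now'. */
    static Instant pickRestoreTarget(Instant now, Random rng) {
        int secondsAgo = 1 + rng.nextInt(9); // nextInt(9) yields 0..8, so this is 1..9
        return now.minus(Duration.ofSeconds(secondsAgo));
    }

    /** Converts an Instant to microseconds since the Unix epoch (assumed target format). */
    static long toEpochMicros(Instant t) {
        return Math.addExact(
                Math.multiplyExact(t.getEpochSecond(), 1_000_000L),
                t.getNano() / 1_000L);
    }
}
```

With such short offsets the restore target almost always falls inside a window where the workload was running DDLs, which is presumably what makes the catalog version race reachable.
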
"tserverGFlags": [
               {"name": "ysql_enable_packed_row", "value": "true"},
               {"name": "ysql_enable_packed_row_for_colocated_table", "value": "true"},
               {"name": "enable_automatic_tablet_splitting", "value": "true"},
               {"name": "ysql_max_connections", "value": "500"},
               {"name": "client_read_write_timeout_ms", "value": "1800000"},
               {"name": "yb_client_admin_operation_timeout_sec", "value": "1800"},
               {"name": "consistent_restore", "value": "true"},
               {"name": "ysql_enable_db_catalog_version_mode", "value": "true"}
],
master_gflags=[
               {"name": "ysql_enable_packed_row", "value": "true"},
               {"name": "ysql_enable_packed_row_for_colocated_table", "value": "true"},
               {"name": "enable_automatic_tablet_splitting", "value": "true"},
               {"name": "tablet_split_high_phase_shard_count_per_node", "value": 20000},
               {"name": "tablet_split_high_phase_size_threshold_bytes", "value": 2097152},
               {"name": "tablet_split_low_phase_size_threshold_bytes", "value": 102400},
               {"name": "tablet_split_low_phase_shard_count_per_node", "value": 10000},
               {"name": "consistent_restore", "value": "true"},
               {"name": "ysql_enable_db_catalog_version_mode", "value": "true"},
               {"name": "allowed_preview_flags_csv", "value": "ysql_enable_db_catalog_version_mode"},
               {"name": "tablet_replicas_per_gib_limit", "value": 0}
]

List of DDLs executed in the sample app

private static List<List<String>> ddlList = List.of(
            List.of("CREATE INDEX idx1 ON ? (k)", "DROP INDEX idx1"),
            List.of("CREATE TABLE tempTable1 AS SELECT * FROM ? limit 1000000", "ALTER TABLE tempTable1 RENAME TO tempTable1_new", "DROP TABLE tempTable1_new"),
            List.of("CREATE MATERIALIZED VIEW mv1 as SELECT k from ? limit 10000", "REFRESH MATERIALIZED VIEW mv1", "DROP MATERIALIZED VIEW mv1"),
            List.of("ALTER TABLE ? ADD newColumn1 TEXT DEFAULT 'dummyString'", "ALTER TABLE ? DROP newColumn1"),
            List.of("ALTER TABLE ? ADD newColumn2 TEXT NULL", "ALTER TABLE ? DROP newColumn2"),
            List.of("CREATE VIEW view1_? AS SELECT k from ?", "DROP VIEW view1_?"),
            List.of("ALTER TABLE ? ADD newColumn3 TEXT DEFAULT 'dummyString'", "ALTER TABLE ? ALTER newColumn3 TYPE VARCHAR(1000)", "ALTER TABLE ? DROP newColumn3"),
            List.of("CREATE TABLE tempTable2 AS SELECT * FROM ? limit 1000000", "CREATE INDEX idx2 ON tempTable2(k)", "ALTER TABLE ? ADD newColumn4 TEXT DEFAULT 'dummyString'", "ALTER TABLE tempTable2 ADD newColumn2 TEXT DEFAULT 'dummyString'", "TRUNCATE table ? cascade", "ALTER TABLE ? DROP newColumn4", "ALTER TABLE tempTable2 DROP newColumn2", "DROP INDEX idx2", "DROP TABLE tempTable2"),
            List.of("CREATE VIEW view2_? AS SELECT k from ?", "CREATE MATERIALIZED VIEW mv2 as SELECT k from ? limit 10000", "REFRESH MATERIALIZED VIEW mv2", "DROP MATERIALIZED VIEW mv2", "DROP VIEW view2_?")
 );
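
The `?` in these templates is presumably replaced with a table name before execution. A minimal sketch of that substitution follows; the class `DdlTemplates` and its methods are hypothetical. Note that the failing query in the log is `CREATE INDEX idx1_tb_1 ON tb_1 (k)`, which suggests the real app also suffixes object names such as `idx1` with the table name; this sketch does not reproduce that.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; the sample app's real substitution logic is not shown here.
final class DdlTemplates {

    /** Replaces every '?' placeholder in one template with the given table name. */
    static String bind(String template, String tableName) {
        return template.replace("?", tableName);
    }

    /** Binds each statement of one DDL group, preserving execution order. */
    static List<String> bindAll(List<String> templates, String tableName) {
        return templates.stream()
                .map(t -> bind(t, tableName))
                .collect(Collectors.toList());
    }
}
```

For example, binding `"CREATE VIEW view1_? AS SELECT k from ?"` to `tb_1` yields `CREATE VIEW view1_tb_1 AS SELECT k from tb_1`. Plain string replacement is only acceptable here because the table names are generated by the test itself and not taken from user input.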

Issue Type

kind/bug

myang2021 commented 4 months ago

This could be a known issue (see the TODO for #5030 below). Existing code already retries on the same error message elsewhere:

        /*
         * TODO(#5030): there is a bug where master commits a catalog
         * version bump and the following read doesn't pick it up.  This is
         * short-lived, so there only needs to be a few retries.
         */
        const char *msg = YBCStatusMessageBegin(s);
        if (strstr(msg, "Requested catalog version is too high"))
        {
            elog((retries_left > 3 ? DEBUG1 : NOTICE),
                 "Retrying wait for backends catalog version: %s",
                 msg);
            if (retries_left-- > 0)
            {
                YBCFreeStatus(s);
                continue;
            }
        }
        HandleYBStatus(s);