yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[YSQL] [Cross-DB-DDL] Index creation fails with Requested catalog version is too high: req version 446, master version 1 #21230

Closed shishir2001-yb closed 7 months ago

shishir2001-yb commented 7 months ago

Jira Link: DB-10157

Description

Tried on version 2.21.1.0-b158:

Logs: https://drive.google.com/file/d/1Nqk5MRaK8wpf9tIifqRjAQzbJ3bO_Kvc/view?usp=sharing

While running the Cross-DB-DDLs sample app in parallel with PITR, some index creation queries started failing with the following error:

2024-02-28 12:07:04,797 DB NAME: postgres_23 DDL query CREATE INDEX idx2_tb_0 ON tempTable2_tb_0(k):
 ERROR: Requested catalog version is too high: req version 446, master version 1
2024-02-28 12:07:04,889 DB NAME:  postgres_15 DDL query CREATE INDEX idx1_tb_1 ON tb_1 (k): 
ERROR: Requested catalog version is too high: req version 363, master version 1 

Test details

1. Run a workload that changes databases for 20-30 minutes.
2. Schedule point-in-time recovery (PITR) for 10 random databases.
3. Create a backup for one random database.
4. Start and stop the workload after 10 minutes.
5. Note the time for the first PITR.
6. Keep the workload running.
7. Perform another PITR at a random time while the workload continues. ---> Issue occurs here
"tserverGFlags": [
    {"name": "ysql_enable_packed_row", "value": "true"},
    {"name": "ysql_enable_packed_row_for_colocated_table", "value": "true"},
    {"name": "enable_automatic_tablet_splitting", "value": "true"},
    {"name": "ysql_max_connections", "value": "500"},
    {"name": "client_read_write_timeout_ms", "value": "1800000"},
    {"name": "yb_client_admin_operation_timeout_sec", "value": "1800"},
    {"name": "consistent_restore", "value": "true"},
    {"name": "ysql_enable_db_catalog_version_mode", "value": "true"}
],
master_gflags=[
    {"name": "ysql_enable_packed_row", "value": "true"},
    {"name": "ysql_enable_packed_row_for_colocated_table", "value": "true"},
    {"name": "enable_automatic_tablet_splitting", "value": "true"},
    {"name": "tablet_split_high_phase_shard_count_per_node", "value": 20000},
    {"name": "tablet_split_high_phase_size_threshold_bytes", "value": 2097152},
    {"name": "tablet_split_low_phase_size_threshold_bytes", "value": 102400},
    {"name": "tablet_split_low_phase_shard_count_per_node", "value": 10000},
    {"name": "consistent_restore", "value": "true"},
    {"name": "ysql_enable_db_catalog_version_mode", "value": "true"},
    {"name": "allowed_preview_flags_csv", "value": "ysql_enable_db_catalog_version_mode"},
    {"name": "tablet_replicas_per_gib_limit", "value": 0}
]

List of DDLs executed in sample app

private static List<List<String>> ddlList = List.of(
            List.of("CREATE INDEX idx1 ON ? (k)", "DROP INDEX idx1"),
            List.of("CREATE TABLE tempTable1 AS SELECT * FROM ? limit 1000000", "ALTER TABLE tempTable1 RENAME TO tempTable1_new", "DROP TABLE tempTable1_new"),
            List.of("CREATE MATERIALIZED VIEW mv1 as SELECT k from ? limit 10000", "REFRESH MATERIALIZED VIEW mv1", "DROP MATERIALIZED VIEW mv1"),
            List.of("ALTER TABLE ? ADD newColumn1 TEXT DEFAULT 'dummyString'", "ALTER TABLE ? DROP newColumn1"),
            List.of("ALTER TABLE ? ADD newColumn2 TEXT NULL", "ALTER TABLE ? DROP newColumn2"),
            List.of("CREATE VIEW view1_? AS SELECT k from ?", "DROP VIEW view1_?"),
            List.of("ALTER TABLE ? ADD newColumn3 TEXT DEFAULT 'dummyString'", "ALTER TABLE ? ALTER newColumn3 TYPE VARCHAR(1000)", "ALTER TABLE ? DROP newColumn3"),
            List.of("CREATE TABLE tempTable2 AS SELECT * FROM ? limit 1000000", "CREATE INDEX idx2 ON tempTable2(k)", "ALTER TABLE ? ADD newColumn4 TEXT DEFAULT 'dummyString'", "ALTER TABLE tempTable2 ADD newColumn2 TEXT DEFAULT 'dummyString'", "TRUNCATE table ? cascade", "ALTER TABLE ? DROP newColumn4", "ALTER TABLE tempTable2 DROP newColumn2", "DROP INDEX idx2", "DROP TABLE tempTable2"),
            List.of("CREATE VIEW view2_? AS SELECT k from ?", "CREATE MATERIALIZED VIEW mv2 as SELECT k from ? limit 10000", "REFRESH MATERIALIZED VIEW mv2", "DROP MATERIALIZED VIEW mv2", "DROP VIEW view2_?")
 );

Issue Type

kind/bug


shishir2001-yb commented 7 months ago

Possible cause as stated by @myang2021:

Previously, yb-master relied only on the gflag --ysql_enable_db_catalog_version_mode=true to decide whether it operates in per-database catalog version mode. After the above change, it also reads the pg_yb_catalog_version table to check whether it has more than one row, but it does so only while preparing a heartbeat response. Suppose a new master leader is elected and, before it receives its very first heartbeat request, it receives a request for the current operation mode. Its catalog_version_table_in_perdbmode is initialized to false, so it returns false. That is wrong: because the value has not been set yet, it should report unknown or return an error. Since it returned false, the caller assumes the new master leader is still operating in global catalog version mode, despite the gflag --ysql_enable_db_catalog_version_mode=true. In global catalog version mode, the master reads the current version from the template1 row of pg_yb_catalog_version, which is incorrect because pg_yb_catalog_version already has more than one row.
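
The race described above can be illustrated with a small, self-contained Java model. This is only a sketch and not YugabyteDB source code: the class names, the method names, and the database OIDs are all hypothetical, and the version table is reduced to a plain map from database OID to catalog version. It shows how a boolean that defaults to false makes the master read the template1 row (version 1) for a database whose own row is at version 446, and how keeping the mode empty until it has been computed would let the caller detect the "unknown" state instead.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the race; names and OIDs are illustrative assumptions.
class CatalogModeSketch {
    static final int TEMPLATE1_OID = 1;        // assumed OID for template1
    static final int POSTGRES_23_OID = 16500;  // assumed OID for postgres_23

    // Buggy variant: a plain boolean cannot express "not determined yet".
    static class BuggyMaster {
        private boolean perDbMode = false; // default before the first heartbeat

        void onFirstHeartbeat(Map<Integer, Long> versionTable) {
            perDbMode = versionTable.size() > 1;
        }

        boolean isPerDbMode() {
            return perDbMode;
        }
    }

    // Safer variant: the mode stays empty until it has actually been computed.
    static class TriStateMaster {
        private Optional<Boolean> perDbMode = Optional.empty();

        void onFirstHeartbeat(Map<Integer, Long> versionTable) {
            perDbMode = Optional.of(versionTable.size() > 1);
        }

        Optional<Boolean> isPerDbMode() {
            return perDbMode;
        }
    }

    // Version the master reports, given the mode it believes it is in:
    // per-database mode reads the database's own row, global mode reads template1.
    static long masterVersion(Map<Integer, Long> versionTable, int dbOid, boolean perDbMode) {
        return versionTable.get(perDbMode ? dbOid : TEMPLATE1_OID);
    }

    public static void main(String[] args) {
        Map<Integer, Long> table = Map.of(TEMPLATE1_OID, 1L, POSTGRES_23_OID, 446L);

        // New leader, no heartbeat processed yet: the buggy master claims global mode.
        BuggyMaster buggy = new BuggyMaster();
        long seen = masterVersion(table, POSTGRES_23_OID, buggy.isPerDbMode());
        System.out.println("req version 446, master version " + seen);

        // The tri-state master reports "unknown", so the caller can retry or fail.
        TriStateMaster safe = new TriStateMaster();
        System.out.println("mode known before heartbeat: " + safe.isPerDbMode().isPresent());

        safe.onFirstHeartbeat(table);
        long correct = masterVersion(table, POSTGRES_23_OID, safe.isPerDbMode().get());
        System.out.println("master version after heartbeat: " + correct);
    }
}
```

In this model the first line printed reproduces the mismatch from the error message (requested version 446, master version 1). Whether the actual fix returns an error or an explicit unknown state is not stated in this thread; the `Optional` here is just one way to represent that third state.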