yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[YSQL] Some DDLs fail with RPC timeout while performing Cross-DB DDLs #20561

Open shishir2001-yb opened 9 months ago

shishir2001-yb commented 9 months ago

Jira Link: DB-9564

Description

Tried on version: 2.21.0.0-xx (a custom build)

While running the Cross-DB Concurrent DDLs sample app, many DDLs failed with the following error:

Error : ERROR: Perform RPC (request call id 140) to 172.151.30.169:9100 timed out after 602.000s

Update: This error also occurs in global catalog mode

List of DDLs :

private static List<List<String>> ddlList = List.of(
            List.of("CREATE INDEX idx1 ON ? (k)", "DROP INDEX idx1"),
            List.of("CREATE TABLE tempTable1 AS SELECT * FROM ? limit 1000000", "ALTER TABLE tempTable1 RENAME TO tempTable1_new", "DROP TABLE tempTable1_new"),
            List.of("CREATE MATERIALIZED VIEW mv1 as SELECT k from ? limit 10000", "REFRESH MATERIALIZED VIEW mv1", "DROP MATERIALIZED VIEW mv1"),
            List.of("ALTER TABLE ? ADD newColumn1 TEXT DEFAULT 'dummyString'", "ALTER TABLE ? DROP newColumn1"),
            List.of("ALTER TABLE ? ADD newColumn2 TEXT NULL", "ALTER TABLE ? DROP newColumn2"),
            List.of("CREATE VIEW view1_? AS SELECT k from ?", "DROP VIEW view1_?"),
            List.of("ALTER TABLE ? ADD newColumn3 TEXT DEFAULT 'dummyString'", "ALTER TABLE ? ALTER newColumn3 TYPE VARCHAR(1000)", "ALTER TABLE ? DROP newColumn3"),
            List.of("CREATE TABLE tempTable2 AS SELECT * FROM ? limit 1000000", "CREATE INDEX idx2 ON tempTable2(k)", "ALTER TABLE ? ADD newColumn4 TEXT DEFAULT 'dummyString'", "ALTER TABLE tempTable2 ADD newColumn2 TEXT DEFAULT 'dummyString'", "TRUNCATE table ? cascade", "ALTER TABLE ? DROP newColumn4", "ALTER TABLE tempTable2 DROP newColumn2", "DROP INDEX idx2", "DROP TABLE tempTable2"),
            List.of("CREATE VIEW view2_? AS SELECT k from ?", "CREATE MATERIALIZED VIEW mv2 as SELECT k from ? limit 10000", "REFRESH MATERIALIZED VIEW mv2", "DROP MATERIALIZED VIEW mv2", "DROP VIEW view2_?")
 );
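The `?` in each statement above stands for a table name. JDBC bind parameters cannot stand in for identifiers, so the sample app presumably substitutes the name as text before executing each statement. That code is not shown in this issue; the following is only a minimal sketch of such a substitution, and the class and method names (`DdlTemplates`, `bind`, `bindAll`) are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not taken from the sample app.
class DdlTemplates {
    // Replaces every '?' in the template with the given table name.
    // Plain text substitution: the name is not quoted or validated.
    static String bind(String template, String tableName) {
        return template.replace("?", tableName);
    }

    // Binds each statement of one DDL group, preserving the group's order.
    static List<String> bindAll(List<String> group, String tableName) {
        return group.stream()
                .map(t -> bind(t, tableName))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> group = List.of(
                "CREATE VIEW view1_? AS SELECT k from ?",
                "DROP VIEW view1_?");
        // Prints:
        //   CREATE VIEW view1_t1 AS SELECT k from t1
        //   DROP VIEW view1_t1
        bindAll(group, "t1").forEach(System.out::println);
    }
}
```

Statements without a `?` (for example `DROP INDEX idx1`) pass through unchanged.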

Context:

  1. We run the Cross-DB DDLs sample app concurrently with multiple threads. Notably, we ensure that DDL and DML operations never execute simultaneously within the same database.
  2. The table doesn't contain any data.
  3. No Global DDLs are being executed.
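
For readers trying to reproduce the workload, the setup described above can be approximated as follows. This is a sketch under assumptions, not the sample app's code: it assumes one worker task per database, so DDLs within a database run sequentially while different databases run in parallel. `CrossDbDdlRunner` and `DdlExecutor` are hypothetical names; a real run would implement `DdlExecutor` with a JDBC `Statement.execute` call against the given database.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical driver, not taken from the sample app.
class CrossDbDdlRunner {
    // Hypothetical hook; a real implementation would execute the DDL over JDBC.
    interface DdlExecutor {
        void execute(String database, String ddl) throws Exception;
    }

    // Runs every DDL group against every database, one task per database.
    // Returns one message per failed statement.
    static List<String> run(List<String> databases, List<List<String>> ddlGroups,
                            int threads, DdlExecutor executor) {
        Queue<String> failures = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String db : databases) {
            pool.submit(() -> {
                for (List<String> group : ddlGroups) {
                    for (String ddl : group) {
                        try {
                            executor.execute(db, ddl);
                        } catch (Exception e) {
                            failures.add(db + ": " + ddl + " -> " + e.getMessage());
                            break; // skip the rest of this group, move to the next one
                        }
                    }
                }
            });
        }
        pool.shutdown();
        try {
            // Generous wait: the timeouts reported in this issue are about 10 minutes each.
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for DDL tasks", e);
        }
        return List.copyOf(failures);
    }
}
```

Skipping the rest of a group after a failure is a design choice of this sketch; it can leave objects behind (for example an index that was created but never dropped), so a longer-running reproduction may want cleanup logic instead.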

Logs:

Issue Type

kind/bug


shishir2001-yb commented 9 months ago

I made the following updates:

Excluded the ysql_enable_db_catalog_version_mode and allowed_preview_flags_csv=ysql_enable_db_catalog_version_mode gflags. After this change, I ran the sample app with only 1 thread. After more than 24 hours of continuous operation, the following error occurred:

ERROR: Perform RPC (request call id 674) to 172.151.39.244:9100 timed out after 602.000s

myang2021 commented 9 months ago

@shishir2001-yb's new finding suggests that this bug isn't unique to per-database catalog version mode. It is an existing bug even in the current (default) catalog version mode. In per-database catalog version mode, a higher volume of DDLs is executed because DDLs run concurrently, so the bug surfaced faster.

tverona1 commented 8 months ago

Updating the title to reflect that this is not specific to per-db mode.