yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[YSQL]Delete query fails with "Could not open relation with OID 16449" #20780

Open shishir2001-yb opened 6 months ago

shishir2001-yb commented 6 months ago

Jira Link: DB-9778

Description

Tried on version 2.21.0.0-b496

While running the Cross-DB Concurrent DDLs sample app, one of the DELETE queries failed with the following error:

Query: DELETE FROM tb_0 WHERE k in ('be8ef4cb-ec72-4679-a488-5af1eb848d39:1408')

ERROR: could not open relation with OID 16449

Note: This error occurred only once in a run of about 3 hours.

List of DDLs :

private static List<List<String>> ddlList = List.of(
            List.of("CREATE INDEX idx1 ON ? (k)", "DROP INDEX idx1"),
            List.of("CREATE TABLE tempTable1 AS SELECT * FROM ? limit 1000000", "ALTER TABLE tempTable1 RENAME TO tempTable1_new", "DROP TABLE tempTable1_new"),
            List.of("CREATE MATERIALIZED VIEW mv1 as SELECT k from ? limit 10000", "REFRESH MATERIALIZED VIEW mv1", "DROP MATERIALIZED VIEW mv1"),
            List.of("ALTER TABLE ? ADD newColumn1 TEXT DEFAULT 'dummyString'", "ALTER TABLE ? DROP newColumn1"),
            List.of("ALTER TABLE ? ADD newColumn2 TEXT NULL", "ALTER TABLE ? DROP newColumn2"),
            List.of("CREATE VIEW view1_? AS SELECT k from ?", "DROP VIEW view1_?"),
            List.of("ALTER TABLE ? ADD newColumn3 TEXT DEFAULT 'dummyString'", "ALTER TABLE ? ALTER newColumn3 TYPE VARCHAR(1000)", "ALTER TABLE ? DROP newColumn3"),
            List.of("CREATE TABLE tempTable2 AS SELECT * FROM ? limit 1000000", "CREATE INDEX idx2 ON tempTable2(k)", "ALTER TABLE ? ADD newColumn4 TEXT DEFAULT 'dummyString'", "ALTER TABLE tempTable2 ADD newColumn2 TEXT DEFAULT 'dummyString'", "TRUNCATE table ? cascade", "ALTER TABLE ? DROP newColumn4", "ALTER TABLE tempTable2 DROP newColumn2", "DROP INDEX idx2", "DROP TABLE tempTable2"),
            List.of("CREATE VIEW view2_? AS SELECT k from ?", "CREATE MATERIALIZED VIEW mv2 as SELECT k from ? limit 10000", "REFRESH MATERIALIZED VIEW mv2", "DROP MATERIALIZED VIEW mv2", "DROP VIEW view2_?")
 );
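For readers unfamiliar with the list above, here is a minimal sketch of how the `?` placeholders could be expanded. It assumes each `?` is replaced with the target table name (for example `tb_0`, the table in the failing query) before the statement is executed; the issue does not show the sample app's substitution logic, so the class `DdlTemplates` and method `expand` are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper, not taken from the sample app. */
public class DdlTemplates {
    /**
     * Replaces every '?' in each template with the given table name.
     * Assumption: '?' always stands for the table name, including
     * inside identifiers such as view1_?.
     */
    public static List<String> expand(List<String> templates, String tableName) {
        return templates.stream()
                .map(t -> t.replace("?", tableName))
                .collect(Collectors.toList());
    }
}
```

Under that assumption, expanding the `view1_?` group for `tb_0` would give `CREATE VIEW view1_tb_0 AS SELECT k from tb_0` followed by `DROP VIEW view1_tb_0`.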

Context:

  1. We run the Cross-DB Concurrent DDLs sample app with multiple threads, and we ensure that no two DDL operations execute simultaneously within the same database.
  2. No Global DDLs are being executed.
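
The "no simultaneous DDLs within the same database" constraint can be pictured as one lock per database. The sketch below is only an illustration of that idea; the issue does not say how the sample app enforces it, and `PerDatabaseDdlGate`, `runDdl`, and `isDdlRunning` are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper: allows at most one DDL at a time per database. */
public class PerDatabaseDdlGate {
    // One lock per database name, created lazily on first use.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs ddlAction while holding the lock for dbName. */
    public void runDdl(String dbName, Runnable ddlAction) {
        ReentrantLock lock = locks.computeIfAbsent(dbName, k -> new ReentrantLock());
        lock.lock();
        try {
            ddlAction.run();
        } finally {
            lock.unlock();
        }
    }

    /** Returns true if some thread currently holds the DDL lock for dbName. */
    public boolean isDdlRunning(String dbName) {
        ReentrantLock lock = locks.get(dbName);
        return lock != null && lock.isLocked();
    }
}
```

With a gate like this, DDLs against different databases still run in parallel, while DML threads are not blocked at all, which matches the setup in which the DELETE raced with a DDL.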

Sample app details:

  1. Start the Cross-DB Concurrent DDLs sample app, which executes both DDLs and DMLs across databases in parallel (30 write threads, 15 databases, and 10 read threads).


[Logs](https://drive.google.com/file/d/1gdZFljpfr90Eo9ItTV3sbAyZk_WDi3xu/view?usp=sharing)

### Issue Type

kind/bug

### Warning: Please confirm that this issue does not contain any sensitive information

- [X] I confirm this issue does not contain any sensitive information.

[DB-9778]: https://yugabyte.atlassian.net/browse/DB-9778?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

myang2021 commented 6 months ago

This issue was hit on a c5.4xlarge instance in per-db mode with just 1 parallel DDL, within 24 hours.

myang2021 commented 6 months ago

I can reproduce this bug in global catalog version mode using my locally created cluster (centos7 release build using commit d2a0af45197c22380c65a6fa033ce6334898ef88):

./bin/yb-ctl create --timeout-yb-admin-sec 180 --rf 3

Then run the sample app:

/opt/jdk-17/bin/java -jar $HOME/tmp/yb-stress-sample-apps-1.1.38.jar --workload SqlCrossDBLoadWithDDL --num_of_tables_in_db 1 --num_writes -1 --num_reads -1 --num_threads_write 3 --num_threads_read 3 --num_unique_keys 2000000000000000 --num_value_columns 30 --use_datatypes true --nodes 127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433 --username yugabyte --batch_size 3 --num_of_non_colocated_databases 1 --num_of_colocated_databases 0 --num_of_parallel_ddls 1 --per_db_catalog_mode false >& ~/tmp/global.out

In the test output file ~/tmp/global.out, I saw:

2024-02-22 02:08:22,920 [Thread-3] ERROR ExceptionsTracker - Unexpected Exception occurred in  DB postgres_0 =>  Delete query DELETE FROM tb_0 WHERE k in ('fe6b243e-39da-471b-a318-de1d4a0184b8:5803'): ERROR: could not open relation with OID 20845
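
When triaging a line like the one above, it can help to pull the OID out of the message and check which relation, if any, still owns it. The sketch below is a hypothetical helper, not part of the sample app; it assumes the error text has the form shown in this issue and builds a lookup against the standard `pg_class` catalog.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical triage helper for "could not open relation with OID N" errors. */
public class RelationOidError {
    private static final Pattern OID_PATTERN =
            Pattern.compile("could not open relation with OID (\\d+)");

    /** Extracts the OID from an error message, or returns empty if none is present. */
    public static OptionalLong extractOid(String message) {
        Matcher m = OID_PATTERN.matcher(message);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    /** Builds a catalog query to run manually (for example in ysqlsh). */
    public static String lookupQuery(long oid) {
        return "SELECT relname, relkind FROM pg_class WHERE oid = " + oid;
    }
}
```

If the lookup returns no row, the relation has probably already been dropped, which would be consistent with the DELETE racing against one of the DROP statements in the DDL list; that is a guess rather than a confirmed root cause.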

myang2021 commented 6 months ago

Updating title to reflect this is not specific to per-db mode.