yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[DocDb] `ERROR: ybctid DocKey([], [16754]) not found in indexed table. index table id is 00004007000030008000000000000a76` #20777

Open shishir2001-yb opened 9 months ago

shishir2001-yb commented 9 months ago

Jira Link: DB-9775

Description

Tried on version 2.21.0.0-b496

While the Cross-DB Concurrent DDLs Sample app is running, connection creation sometimes fails with the following error:

```
ERROR: ybctid DocKey([], [16754]) not found in indexed table. index table id is 00004007000030008000000000000a76
```
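The index table id in the error is a 32-hex-digit string. The sketch below decodes it, assuming (this is our assumption, not something stated in this issue) that YSQL table ids carry the database OID in the first 8 hex digits and the relation OID in the last 8. `TableIdDecoder` is a hypothetical helper name.

```java
// Hypothetical sketch: decode a YSQL table id, ASSUMING the layout
// <8 hex: database OID><"00003000"><"80000000"><8 hex: relation OID>.
public class TableIdDecoder {

    // Returns {databaseOid, relationOid} parsed from a 32-hex-digit table id.
    static long[] decode(String tableId) {
        if (tableId.length() != 32) {
            throw new IllegalArgumentException("expected 32 hex digits: " + tableId);
        }
        long databaseOid = Long.parseLong(tableId.substring(0, 8), 16);
        long relationOid = Long.parseLong(tableId.substring(24, 32), 16);
        return new long[] {databaseOid, relationOid};
    }

    public static void main(String[] args) {
        long[] ids = decode("00004007000030008000000000000a76");
        System.out.println("database OID = " + ids[0] + ", relation OID = " + ids[1]);
    }
}
```

If that layout holds, the id maps to database OID 16391 and relation OID 2678. In stock PostgreSQL, OID 2678 is the system catalog index `pg_index_indrelid_index`, which would make the failing index a catalog index rather than a user index. Confirm this against `pg_class` on the affected cluster before relying on it.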

UPDATE: This error also occurs in global catalog mode.

List of DDLs:

```java
// Each inner list is one DDL sequence; '?' is a placeholder for the target table name.
private static List<List<String>> ddlList = List.of(
            List.of("CREATE INDEX idx1 ON ? (k)", "DROP INDEX idx1"),
            List.of("CREATE TABLE tempTable1 AS SELECT * FROM ? limit 1000000", "ALTER TABLE tempTable1 RENAME TO tempTable1_new", "DROP TABLE tempTable1_new"),
            List.of("CREATE MATERIALIZED VIEW mv1 as SELECT k from ? limit 10000", "REFRESH MATERIALIZED VIEW mv1", "DROP MATERIALIZED VIEW mv1"),
            List.of("ALTER TABLE ? ADD newColumn1 TEXT DEFAULT 'dummyString'", "ALTER TABLE ? DROP newColumn1"),
            List.of("ALTER TABLE ? ADD newColumn2 TEXT NULL", "ALTER TABLE ? DROP newColumn2"),
            List.of("CREATE VIEW view1_? AS SELECT k from ?", "DROP VIEW view1_?"),
            List.of("ALTER TABLE ? ADD newColumn3 TEXT DEFAULT 'dummyString'", "ALTER TABLE ? ALTER newColumn3 TYPE VARCHAR(1000)", "ALTER TABLE ? DROP newColumn3"),
            List.of("CREATE TABLE tempTable2 AS SELECT * FROM ? limit 1000000", "CREATE INDEX idx2 ON tempTable2(k)", "ALTER TABLE ? ADD newColumn4 TEXT DEFAULT 'dummyString'", "ALTER TABLE tempTable2 ADD newColumn2 TEXT DEFAULT 'dummyString'", "TRUNCATE table ? cascade", "ALTER TABLE ? DROP newColumn4", "ALTER TABLE tempTable2 DROP newColumn2", "DROP INDEX idx2", "DROP TABLE tempTable2"),
            List.of("CREATE VIEW view2_? AS SELECT k from ?", "CREATE MATERIALIZED VIEW mv2 as SELECT k from ? limit 10000", "REFRESH MATERIALIZED VIEW mv2", "DROP MATERIALIZED VIEW mv2", "DROP VIEW view2_?")
);
```
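A minimal sketch of how one DDL sequence from `ddlList` could be expanded for a concrete table. It assumes every `?` (including the one in `view1_?`) is replaced by the table name; the real sample app may substitute differently, and `DdlExpander` is a hypothetical name.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: expand the '?' placeholders of one DDL sequence for a table.
// ASSUMPTION: every '?' is replaced by the table name; the sample app may differ.
public class DdlExpander {

    static List<String> expand(List<String> ddlSequence, String tableName) {
        return ddlSequence.stream()
                // String.replace is a literal (non-regex) replace-all, so '?' needs no escaping
                .map(ddl -> ddl.replace("?", tableName))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> viewSequence =
                List.of("CREATE VIEW view1_? AS SELECT k from ?", "DROP VIEW view1_?");
        // ASSUMPTION: statements within a sequence run in order against a single database.
        for (String ddl : expand(viewSequence, "table1")) {
            System.out.println(ddl);
        }
    }
}
```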

Context:

  1. We run the Cross-DB Concurrent DDLs Sample app with multiple threads. Notably, we ensure that DDL operations are never executed simultaneously within the same database.
  2. No global DDLs are executed.
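Context item 1 says DDLs never overlap within one database while different databases proceed in parallel. One common way to enforce that is a lock per database name; the hypothetical sketch below (`PerDatabaseDdlGate` is our name, not the sample app's code) shows the pattern and checks that at most one DDL per database is in flight.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, NOT the sample app's code: one lock per database, so DDLs
// in different databases run in parallel but DDLs in the same database never overlap.
public class PerDatabaseDdlGate {

    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs a DDL action while holding the lock for its database.
    void runDdl(String database, Runnable ddl) {
        ReentrantLock lock = locks.computeIfAbsent(database, db -> new ReentrantLock());
        lock.lock();
        try {
            ddl.run();
        } finally {
            lock.unlock();
        }
    }

    // Submits many DDL actions for one database from a thread pool and returns the
    // maximum number observed running at the same time (1 if the gate works).
    static int maxConcurrentDdls(int threads) {
        PerDatabaseDdlGate gate = new PerDatabaseDdlGate();
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads * 10; i++) {
            pool.submit(() -> gate.runDdl("db1", () -> {
                maxSeen.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                inFlight.decrementAndGet();
            }));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxSeen.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent DDLs in db1: " + maxConcurrentDdls(8));
    }
}
```

With this pattern, cross-database DDLs stay fully parallel, which matches the concurrency shape under which the error was reported.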

Sample app details:

  1. Start the Cross-DB Concurrent DDLs Sample app, which executes both DDLs and DMLs across databases in parallel (30 write threads, 15 databases, and 10 read threads).


[Logs](https://drive.google.com/file/d/1gdZFljpfr90Eo9ItTV3sbAyZk_WDi3xu/view?usp=sharing)

### Issue Type

kind/bug

### Warning: Please confirm that this issue does not contain any sensitive information

- [X] I confirm this issue does not contain any sensitive information.

[DB-9775]: https://yugabyte.atlassian.net/browse/DB-9775?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
shishir2001-yb commented 9 months ago

UPDATE: This error also occurs in global catalog mode.

myang2021 commented 9 months ago

This bug can be reproduced in global catalog version mode in 2.20.1.0.

myang2021 commented 9 months ago

This bug can also be reproduced in global catalog version mode in 2.18.5.2. So this is not a recent regression bug.

myang2021 commented 9 months ago

I tried with yugabyte-2.16.8.0 but could not reproduce it.

myang2021 commented 8 months ago

I did not see any persistent corruption. It seems that when reading from the index and the base table with the same read time, the index read somehow returned the given tuple_id, but the same read time applied to the base table filtered that row out.
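The comment above describes an index read and a base-table read that disagree about one row even though the stored data is not corrupted. The toy model below (hypothetical `ReadTimeMismatchDemo`, not DocDB code, with made-up timestamps) shows why such a disagreement yields exactly this error message. It illustrates one way the two reads could diverge, namely if they effectively used different read times; it is not the diagnosed root cause.

```java
import java.util.Map;

// Toy model, NOT DocDB code: the index and base table hold consistent data,
// yet the lookup fails if the two reads effectively use different read times.
public class ReadTimeMismatchDemo {

    // A version visible for readTime in [createdAt, deletedAt).
    record Version(long createdAt, long deletedAt) {
        boolean visibleAt(long readTime) {
            return createdAt <= readTime && readTime < deletedAt;
        }
    }

    // The row is deleted at t=90 in both the index and the base table
    // (hypothetical timestamps), so the stored data is consistent.
    static final String YBCTID = "DocKey([], [16754])";
    static final Map<String, Version> INDEX = Map.of("k", new Version(10, 90));
    static final Map<String, Version> BASE = Map.of(YBCTID, new Version(10, 90));

    // Reads the index at indexReadTime, then fetches the row at baseReadTime.
    static String lookup(String key, long indexReadTime, long baseReadTime) {
        Version idx = INDEX.get(key);
        if (idx == null || !idx.visibleAt(indexReadTime)) {
            return "no match";
        }
        Version row = BASE.get(YBCTID);
        if (row == null || !row.visibleAt(baseReadTime)) {
            // The index returned a ybctid that the base-table read filters out.
            return "ERROR: ybctid " + YBCTID + " not found in indexed table";
        }
        return "found " + YBCTID;
    }

    public static void main(String[] args) {
        System.out.println(lookup("k", 50, 50));   // same read time, before the delete
        System.out.println(lookup("k", 100, 100)); // same read time, after the delete
        System.out.println(lookup("k", 80, 100));  // read times diverge
    }
}
```

When both reads use the same time, the index and base table always agree; only the divergent case produces the error.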

rthallamko3 commented 7 months ago

@shishir2001-yb, is this issue still relevant on master?

shishir2001-yb commented 7 months ago

@rthallamko3, yes, it is consistently reproducible in every run.

rthallamko3 commented 5 days ago

Per @myang2021's comment, this can be reproduced using the sample apps from https://github.com/yugabyte/yb-stress-test/releases/download/ssa_1.1.38/yb-stress-sample-apps-1.1.38.jar

```sh
./bin/yb-ctl create --timeout-yb-admin-sec 180 --rf 3
/opt/jdk-17/bin/java -jar $HOME/tmp/yb-stress-sample-apps-1.1.38.jar --workload SqlCrossDBLoadWithDDL --num_of_tables_in_db 1 --num_writes -1 --num_reads -1 --num_threads_write 3 --num_threads_read 3 --num_unique_keys 2000000000000000 --num_value_columns 30 --use_datatypes true --nodes 127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433 --username yugabyte --batch_size 3 --num_of_non_colocated_databases 1 --num_of_colocated_databases 0 --num_of_parallel_ddls 1 --per_db_catalog_mode false >& ~/tmp/global.out
```

The test output is saved in the file ~/tmp/global.out.
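To check a run for the failure, one option is to count output lines containing the error text. The sketch below is a hypothetical helper (`OutputScanner` is our name); it assumes the output path `~/tmp/global.out` used in the command above and matches on the substring `not found in indexed table` from the reported message.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: count lines in the saved output that contain the error text.
public class OutputScanner {

    static final String NEEDLE = "not found in indexed table";

    static long countErrors(Stream<String> lines) {
        return lines.filter(line -> line.contains(NEEDLE)).count();
    }

    public static void main(String[] args) throws IOException {
        // ASSUMPTION: the output was redirected to ~/tmp/global.out as in the command above.
        Path out = Path.of(System.getProperty("user.home"), "tmp", "global.out");
        try (Stream<String> lines = Files.lines(out)) {
            System.out.println(countErrors(lines) + " matching line(s) in " + out);
        }
    }
}
```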