Closed by rjalan-yb 1 year ago
This is not a tablet-splitting issue; no tablet splitting is happening here. According to the logs, the last request was started after the table deletion had been initiated and after the tablet had begun shutting down. The logs do not show why the table deletion was initiated. On the other hand, it is confusing that the statement does not fail with something like ERROR: relation "mv6" does not exist, given that the table deletion has been triggered. Maybe it is somehow related to the materialized view that exists for the table, or to some transaction-specific behavior.
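The suspected sequence (a refresh creates a new backing table, renames it over mv6, then deletes the old one, while an in-flight statement still holds the old table ID) can be sketched with a toy catalog model. The table names and IDs mirror the logs below; the model itself is purely illustrative and is not YugabyteDB code.

```python
# Toy model of the suspected race: a materialized-view refresh swaps the
# backing table ID while an in-flight query still holds the old ID.
# IDs are copied from the logs; the mechanics here are an illustration only.

catalog = {"mv6": "000040000000300080000000000040b2"}  # name -> backing table id
live_tablets = {"000040000000300080000000000040b2"}    # tables whose tablets serve reads

def start_query(name):
    # A query resolves the relation name to a table id once, at start.
    return catalog[name]

def refresh_matview(name, new_id):
    # Models pg_temp_16562 being created, renamed over mv6, and the old
    # backing table being deleted (tablet shut down).
    old_id = catalog[name]
    live_tablets.add(new_id)
    catalog[name] = new_id
    live_tablets.discard(old_id)
    return old_id

def read(table_id):
    if table_id not in live_tablets:
        # Matches the observed behavior: the name still resolves in the
        # catalog, so there is no "relation does not exist" error; instead
        # the read fails at the tablet layer because the old tablet is
        # QUIESCING.
        raise RuntimeError(f"Tablet peer for {table_id} is shutting down")
    return "rows"

tid = start_query("mv6")                                    # query starts first
refresh_matview("mv6", "000040000000300080000000000040cf")  # refresh races in
try:
    read(tid)
except RuntimeError as e:
    print(e)
```

In this model the query fails with the tablet-level error rather than a catalog-level one, which is consistent with the error seen in the PG log below.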
[MS2] Application fingerprint: version 2.19.1.0 build PRE_RELEASE revision d88aacd6b2b9a92e564ec4447bc05d19d1e4e087 build_type DEBUG built at 12 Jul 2023 05:40:41 UTC
...
[MS2] I0712 19:27:41.755095 1544225 catalog_manager.cc:4190] Successfully created table mv6 [id=000040000000300080000000000040b2] in db_lst_010159 [id=00004000000030008000000000000000] per request from 127.0.0.1:33719
[TS3] I0712 19:27:41.847420 1544045 tablet_service.cc:1476] Processing CreateTablet for T 4ea73f019c5743899239e3c1ac1c4f34 P da66b47b641942c7980bfe09ee63367b (table=mv6 [id=000040000000300080000000000040b2]), partition=hash_split: [0xD555, 0xFFFF]
...
[MS3] I0712 19:36:23.263209 1544210 catalog_manager.cc:4190] Successfully created table pg_temp_16562 [id=000040000000300080000000000040cf] in db_lst_010159 [id=00004000000030008000000000000000] per request from 127.0.0.1:33719
...
[MS2] I0712 19:36:25.055384 1543957 catalog_manager.cc:6640] T 00000000000000000000000000000000 P f7d8c9f18404436e91864014272b0b71: Servicing AlterTable request from 127.0.0.1:35257: table { table_id: "000040000000300080000000000040cf" } new_table_name: "mv6" new_namespace { name: "db_lst_010159" } transaction { transaction_id: "d-\234\003\211\031Fq\246\200\264\217\215\304\003\"" isolation: SNAPSHOT_ISOLATION status_tablet: "633c764773384781aa2494299cac6734" priority: 16233697466001717847 start_hybrid_time: 6918924628176654336 locality: GLOBAL external_transaction: false }
[MS2] I0712 19:36:25.063608 1543957 catalog_manager.cc:10382] Persist transaction metadata into SysTableEntryPB for table ID 000040000000300080000000000040cf
[MS2] I0712 19:36:25.094249 1543957 catalog_manager.cc:6881] Successfully initiated ALTER TABLE (pending tablet schema updates) for mv6 [id=000040000000300080000000000040cf] per request from 127.0.0.1:35257
[MS2] I0712 19:36:25.132654 1548231 backfill_index.cc:264] mv6 [id=000040000000300080000000000040cf] - Alter table completed version=1, state: RUNNING
[MS2] I0712 19:36:25.205421 1544226 catalog_manager.cc:5871] Servicing DeleteTable request from 127.0.0.1:33949: table { table_id: "000040000000300080000000000040b2" } is_index_table: false
[MS2] I0712 19:36:25.229717 1544226 xrepl_catalog_manager.cc:426] Deleting CDC streams for tables: 000040000000300080000000000040b2
[MS2] I0712 19:36:25.229823 1544226 catalog_manager.cc:6067] Deleting table: mv6, retained by: []
[MS2] I0712 19:36:25.230538 1544226 catalog_manager.cc:10559] Deleting tablet 4ea73f019c5743899239e3c1ac1c4f34
...
[MS2] I0712 19:36:25.248878 1544226 catalog_manager.cc:10578] Deleted tablet 4ea73f019c5743899239e3c1ac1c4f34
[MS2] I0712 19:36:25.248904 1544226 catalog_manager.cc:10435] Sending DeleteTablet for 3 replicas of tablet 4ea73f019c5743899239e3c1ac1c4f34
...
[TS3] I0712 19:36:25.254935 1544045 tablet_service.cc:1554] T 4ea73f019c5743899239e3c1ac1c4f34 P da66b47b641942c7980bfe09ee63367b: Processing DeleteTablet with delete_type TABLET_DATA_DELETED (Table deleted at 2023-07-12 19:36:25 UTC) from 127.0.0.2:40363
[TS3] I0712 19:36:25.254969 1544045 tablet_peer.cc:457] T 4ea73f019c5743899239e3c1ac1c4f34 P da66b47b641942c7980bfe09ee63367b [state=RUNNING]: Initiating TabletPeer shutdown
[TS3] I0712 19:36:25.254987 1544045 tablet.cc:1033] T 4ea73f019c5743899239e3c1ac1c4f34 P da66b47b641942c7980bfe09ee63367b: StartShutdown
...
[PG1] 2023-07-12 19:36:25.273 UTC [1545782] STATEMENT: SELECT 23.624215703492823, numrange(NULL, (CAST(((-57.79294575133389) * (92.68987740033023)) AS NUMERIC))) FROM mv6 JOIN v2_0 AS j ON mv6.c0_int4range_0 = j.c2_int4range_0 WHERE TRUE ORDER BY 2 DESC OFFSET 14;
...
[PG1] 2023-07-12 19:36:25.293 UTC [1545782] ERROR: Tablet peer T 4ea73f019c5743899239e3c1ac1c4f34 P da66b47b641942c7980bfe09ee63367b [state=QUIESCING]: is shutting down
...
[MS2] I0712 19:36:25.345577 1548229 async_rpc_tasks.cc:849] DeleteTablet RPC for tablet 4ea73f019c5743899239e3c1ac1c4f34 (mv6 [id=000040000000300080000000000040b2]) on TS=da66b47b641942c7980bfe09ee63367b (task=0x000036c03eb09358, state=kRunning): TS da66b47b641942c7980bfe09ee63367b: tablet 4ea73f019c5743899239e3c1ac1c4f34 (table mv6 [id=000040000000300080000000000040b2]) successfully done
[MS2] I0712 19:36:25.347036 1548229 catalog_manager.cc:7562] Clearing pending delete for tablet 4ea73f019c5743899239e3c1ac1c4f34 in ts da66b47b641942c7980bfe09ee63367b
[MS2] I0712 19:36:25.347927 1548229 catalog_manager.cc:6342] Marking table as DELETED: mv6 [id=000040000000300080000000000040b2]
@rjalan-yb, please check the steps/script used to execute your test and clarify when the tablet is being deleted, and whether it is expected to be deleted at all.
This is a variation of https://github.com/yugabyte/yugabyte-db/issues/15788 hit in a different moment of time, hence closing as duplicate.
Duplicate of #15788
Jira Link: DB-7236
Description
Using LST, we are hitting this issue:
Error message: ERROR: Tablet peer T 4ea73f019c5743899239e3c1ac1c4f34 P da66b47b641942c7980bfe09ee63367b [state=QUIESCING]: is shutting down
Cluster creation command:
bin/yb-ctl destroy && bin/yb-ctl --replication_factor 3 create --tserver_flags=enable_wait_queues=true,enable_deadlock_detection=true,ysql_max_connections=20,ysql_enable_packed_row=true,yb_enable_read_committed_isolation=true,ysql_num_shards_per_tserver=2,enable_stream_compression=true,stream_compression_algo=2,yb_num_shards_per_tserver=2 --master_flags=yb_enable_read_committed_isolation=true,enable_stream_compression=true,stream_compression_algo=2,enable_automatic_tablet_splitting=true,tablet_split_low_phase_shard_count_per_node=1,tablet_split_high_phase_shard_count_per_node=5,ysql_enable_packed_row=true,enable_deadlock_detection=true,enable_wait_queues=true
LST run command:
./long_system_test.py --nodes=127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433 --threads=10 --runtime=60 --max-columns=20 --complexity=full --seed=010159
Error trace:
LST logs / yugabyte-data folder upload: https://drive.google.com/drive/folders/1QAxc-Nx9nTlgHtLglVReDB2lNjlo2xBr?usp=sharing (accessible within the Yugabyte organization)