yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.89k stars 1.06k forks source link

FATAL: Could not reconnect to database\nHINT: Database might have been dropped by another user: RPC errors and OBJECT_NOT_FOUND errors in the logs #12501

Closed kripasreenivasan closed 2 years ago

kripasreenivasan commented 2 years ago

Jira Link: [DB-334]

Description

The system itest testysqlddlsintxnblk creates 100s' of tables and further goes on to execute DDLs; in transaction blocks. This test is continuously failing during table creation (after creating some tables) with this error-

ERROR:root:ITEST FAILED testysqlddlsintxnblk-aws-rf3 : OperationalError('FATAL:  Could not reconnect to database\nHINT:  Database might have been dropped by another user\n')
2022-05-13 12:07:32,404 test_base.py:161 ERROR testysqlddlsintxnblk-aws-rf3 ITEST FAILED testysqlddlsintxnblk-aws-rf3 : OperationalError('FATAL:  Could not reconnect to database\nHINT:  Database might have been dropped by another user\n')
Traceback (most recent call last):

Please Note: This is 100% reproducible. We have been able to hit this issue every single time in the last 15 consecutive runs of the test case.

Logs: 12:07:32 is when the test hit the error and these are the errors in the logs at the very same timestamp Info Logs:

I0513 12:07:32.079988 19385 consensus_queue.cc:1190] T d0c4db92ed9747e791d23a65d42bfb3b P 85816df1c6be4403aca14e55936979ea [LEADER]: Got LMP mismatch error from peer: { peer: 1cc20ec2f88b43bd82a9793edfd0c044 is_new: 0 last_received: 1.305 next_index: 306 last_known_committed_idx: 305 is_last_exchange_successful: 0 needs_remote_bootstrap: 0 member_type: VOTER num_sst_files: 0 last_applied: 1.305 }
I0513 12:07:32.151784 19303 heartbeater.cc:338] P 85816df1c6be4403aca14e55936979ea: Connected to a leader master server at 10.9.138.240:7100
I0513 12:07:32.152070 19303 heartbeater.cc:387] P 85816df1c6be4403aca14e55936979ea: Registering TS with master...
I0513 12:07:32.152088 19303 server_base.cc:536] Using private rpc address 10.9.138.240
I0513 12:07:32.152098 19303 server_base.cc:558] Using http address 10.9.138.240
I0513 12:07:32.152104 19303 heartbeater.cc:396] P 85816df1c6be4403aca14e55936979ea: Sending a full tablet report to master...
I0513 12:07:32.153121 19365 tablet_service.cc:1304] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea: Processing DeleteTablet with delete_type TABLET_DATA_DELETED (Table deleted at 2022-05-13 12:07:32 UTC) from 10.9.138.240:43858
I0513 12:07:32.153139 19365 tablet_peer.cc:431] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea [state=RUNNING]: Initiating TabletPeer shutdown
I0513 12:07:32.153147 19365 tablet.cc:934] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea: StartShutdown
I0513 12:07:32.153168 19288 poller.cc:66] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea: Poll stopped: Aborted (yb/rpc/scheduler.cc:65): Task aborted
I0513 12:07:32.153184 19365 transaction_participant.cc:157] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea: Shutdown
I0513 12:07:32.153192 19365 tablet_peer.cc:448] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea [state=QUIESCING]: Started shutdown from state: RUNNING
I0513 12:07:32.153200 19365 maintenance_manager.cc:198] Unregistered op LogGCOp(32c74dd2e3c248638269e38ee05136f6)
I0513 12:07:32.153213 19365 raft_consensus.cc:2749] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea [term 1 FOLLOWER]: Shutdown.
I0513 12:07:32.153223 19365 raft_consensus.cc:2780] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea [term 1 FOLLOWER]: Raft consensus is shut down!
I0513 12:07:32.153228 19365 tablet_peer.cc:575] Aborting transactions that started prior to <max> for tablet id 32c74dd2e3c248638269e38ee05136f6
I0513 12:07:32.155375 19365 tablet.cc:949] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea: CompleteShutdown
I0513 12:07:32.155391 19365 tablet.cc:934] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea: StartShutdown
I0513 12:07:32.155406 19365 db_impl.cc:771] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea [R]: Shutting down RocksDB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-000045df000030008000000000000000.colocated.parent.uuid/tablet-32c74dd2e3c248638269e38ee05136f6
I0513 12:07:32.155413 19365 db_impl.cc:771] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea [I]: Shutting down RocksDB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-000045df000030008000000000000000.colocated.parent.uuid/tablet-32c74dd2e3c248638269e38ee05136f6.intents
I0513 12:07:32.155514 19365 db_impl.cc:894] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea [I]: Shutdown done
I0513 12:07:32.155578 19365 db_impl.cc:894] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea [R]: Shutdown done
I0513 12:07:32.155612 19365 raft_consensus.cc:2749] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea [term 1 FOLLOWER]: Shutdown.
I0513 12:07:32.155642 19365 tablet.cc:934] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea: StartShutdown
I0513 12:07:32.155670 19365 ts_tablet_manager.cc:2492] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea: Deleting tablet data with delete state TABLET_DATA_DELETED
I0513 12:07:32.155678 19365 docdb_rocksdb_util.cc:377] Overriding FLAGS_rocksdb_max_background_flushes to 1
I0513 12:07:32.155683 19365 docdb_rocksdb_util.cc:410] FLAGS_rocksdb_max_background_compactions was not set, automatically configuring 1 background compactions.
I0513 12:07:32.155686 19365 docdb_rocksdb_util.cc:410] FLAGS_rocksdb_max_background_compactions was not set, automatically configuring 1 background compactions.
I0513 12:07:32.155690 19365 docdb_rocksdb_util.cc:422] FLAGS_rocksdb_base_background_compactions was not set, automatically configuring 1 base background compactions.
I0513 12:07:32.155697 19365 docdb_rocksdb_util.cc:377] Overriding FLAGS_rocksdb_max_background_flushes to 1
I0513 12:07:32.155701 19365 docdb_rocksdb_util.cc:410] FLAGS_rocksdb_max_background_compactions was not set, automatically configuring 1 background compactions.
I0513 12:07:32.155704 19365 docdb_rocksdb_util.cc:410] FLAGS_rocksdb_max_background_compactions was not set, automatically configuring 1 background compactions.
I0513 12:07:32.155709 19365 docdb_rocksdb_util.cc:422] FLAGS_rocksdb_base_background_compactions was not set, automatically configuring 1 base background compactions.
I0513 12:07:32.155712 19365 tablet_metadata.cc:467] Destroying regular db at: /mnt/d0/yb-data/tserver/data/rocksdb/table-000045df000030008000000000000000.colocated.parent.uuid/tablet-32c74dd2e3c248638269e38ee05136f6
I0513 12:07:32.155849 19365 tablet_metadata.cc:473] Successfully destroyed regular DB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-000045df000030008000000000000000.colocated.parent.uuid/tablet-32c74dd2e3c248638269e38ee05136f6
I0513 12:07:32.155977 19365 tablet_metadata.cc:489] Successfully destroyed provisional records DB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-000045df000030008000000000000000.colocated.parent.uuid/tablet-32c74dd2e3c248638269e38ee05136f6.intents
I0513 12:07:32.159801 19365 ts_tablet_manager.cc:2502] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea: Tablet deleted. Last logged OpId: 1.1
I0513 12:07:32.159819 19365 log.cc:1307] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea: Deleting WAL dir /mnt/d0/yb-data/tserver/wals/table-000045df000030008000000000000000.colocated.parent.uuid/tablet-32c74dd2e3c248638269e38ee05136f6
I0513 12:07:32.159907 19365 consensus_meta.cc:126] T 32c74dd2e3c248638269e38ee05136f6 Deleting consensus metadata
I0513 12:07:32.159955 19365 tablet_bootstrap_if.cc:96] T 32c74dd2e3c248638269e38ee05136f6 P 85816df1c6be4403aca14e55936979ea: Deleted tablet blocks from disk
I0513 12:07:32.159966 19365 ts_tablet_manager.cc:2282] Unregister data/wal directory assignment map for table: 000045df000030008000000000000000.colocated.parent.uuid and tablet 32c74dd2e3c248638269e38ee05136f6
I0513 12:07:32.159977 19365 ts_tablet_manager.cc:2577] Deleted transition in progress deleting tablet for tablet 32c74dd2e3c248638269e38ee05136f6
I0513 12:07:32.395407 19365 client_master_rpc.cc:75] 0x00000000039d7520 -> GetTableLocationsRpc(table_id: 000045df000030008000000000001f64, max_tablets: 2147483647, require_tablets_running: 1, num_attempts: 1): Failed, got resp error: Not found (yb/master/catalog_manager.cc:632): The object '000045df000030008000000000000000.pg_yb_tablegroup' does not exist: OBJECT_NOT_FOUND (master error 3)
W0513 12:07:32.395435 19365 client-internal.cc:1816] GetTableLocationsRpc(table_id: 000045df000030008000000000001f64, max_tablets: 2147483647, require_tablets_running: 1, num_attempts: 1) failed: Not found (yb/master/catalog_manager.cc:632): The object '000045df000030008000000000000000.pg_yb_tablegroup' does not exist: OBJECT_NOT_FOUND (master error 3)
I0513 12:07:32.395429 19366 client_master_rpc.cc:75] 0x0000000004c9d1f8 -> GetTableSchemaRpc(table_identifier: table_id: "000045df000030008000000000001f64", num_attempts: 1): Failed, got resp error: Not found (yb/master/catalog_manager.cc:632): The object '000045df000030008000000000000000.pg_yb_tablegroup' does not exist: OBJECT_NOT_FOUND (master error 3)
W0513 12:07:32.395465 19366 client-internal.cc:1376] GetTableSchemaRpc(table_identifier: table_id: "000045df000030008000000000001f64", num_attempts: 1) failed: Not found (yb/master/catalog_manager.cc:632): The object '000045df000030008000000000000000.pg_yb_tablegroup' does not exist: OBJECT_NOT_FOUND (master error 3)
I0513 12:07:32.395800 19348 client_master_rpc.cc:75] 0x00000000039d67a0 -> GetTableLocationsRpc(table_id: 000045df0000300080000000000004eb, max_tablets: 2147483647, require_tablets_running: 1, num_attempts: 1): Failed, got resp error: Not found (yb/master/catalog_manager.cc:632): The object '000045df000030008000000000000000.pg_class' does not exist: OBJECT_NOT_FOUND (master error 3)
I0513 12:07:32.395807 19310 client_master_rpc.cc:75] 0x000000000334ad18 -> GetTableSchemaRpc(table_identifier: table_id: "000045df0000300080000000000004eb", num_attempts: 1): Failed, got resp error: Not found (yb/master/catalog_manager.cc:632): The object '000045df000030008000000000000000.pg_class' does not exist: OBJECT_NOT_FOUND (master error 3)
W0513 12:07:32.395817 19348 client-internal.cc:1816] GetTableLocationsRpc(table_id: 000045df0000300080000000000004eb, max_tablets: 2147483647, require_tablets_running: 1, num_attempts: 1) failed: Not found (yb/master/catalog_manager.cc:632): The object '000045df000030008000000000000000.pg_class' does not exist: OBJECT_NOT_FOUND (master error 3)
W0513 12:07:32.395831 19310 client-internal.cc:1376] GetTableSchemaRpc(table_identifier: table_id: "000045df0000300080000000000004eb", num_attempts: 1) failed: Not found (yb/master/catalog_manager.cc:632): The object '000045df000030008000000000000000.pg_class' does not exist: OBJECT_NOT_FOUND (master error 3)

Warning logs

W0513 12:07:32.010026 19236 leader_election.cc:277] T 00000000000000000000000000000000 P 286f945248dc4f6badc78bd7b3a836b9 [CANDIDATE]: Term 2 pre-election: RPC error from VoteRequest() call to peer a40af9d3bf3841c0aebf15f6ab31a1aa: Network error (yb/util/net/socket.cc:540): recvmsg error: Connection refused (system error 111)
W0513 12:07:32.014927 19236 leader_election.cc:277] T 00000000000000000000000000000000 P 286f945248dc4f6badc78bd7b3a836b9 [CANDIDATE]: Term 2 election: RPC error from VoteRequest() call to peer a40af9d3bf3841c0aebf15f6ab31a1aa: Network error (yb/util/net/socket.cc:540): recvmsg error: Connection refused (system error 111)
W0513 12:07:32.021390 21524 operation_tracker.cc:280] T 00000000000000000000000000000000 P 286f945248dc4f6badc78bd7b3a836b9: OperationTracker waiting for 1 outstanding operations to complete now for 0.000s
W0513 12:07:32.021795 19236 consensus_peers.cc:543] T 00000000000000000000000000000000 P 286f945248dc4f6badc78bd7b3a836b9 -> Peer a40af9d3bf3841c0aebf15f6ab31a1aa ([host: "10.9.197.119" port: 7100], []): Couldn't send request.  Status: Network error (yb/util/net/socket.cc:540): recvmsg error: Connection refused (system error 111). Retrying in the next heartbeat period. Already tried 1 times. State: 2
W0513 12:07:32.022202 21523 replica_state.cc:771] T 00000000000000000000000000000000 P 286f945248dc4f6badc78bd7b3a836b9 [term 2 LEADER]: Can't advance the committed index across term boundaries until operations from the current term are replicated. Last committed operation was: 1.5640, New majority replicated is: 1.5640, Current term is: 2
W0513 12:07:32.064842 21528 catalog_manager.cc:7844] Keyspace () has invalid state (0), aborting delete
W0513 12:07:32.065039 21528 catalog_manager.cc:7844] Keyspace () has invalid state (0), aborting delete
W0513 12:07:32.065078 21529 catalog_manager.cc:7844] Keyspace () has invalid state (0), aborting delete
W0513 12:07:32.086357 19230 master.cc:468] ListMasters: Network error (yb/util/net/socket.cc:540): Unable to get registration information for peer ([10.9.197.119:7100]) id (a40af9d3bf3841c0aebf15f6ab31a1aa): recvmsg error: Connection refused (system error 111)
W0513 12:07:32.395435 19365 client-internal.cc:1816] GetTableLocationsRpc(table_id: 000045df000030008000000000001f64, max_tablets: 2147483647, require_tablets_running: 1, num_attempts: 1) failed: Not found (yb/master/catalog_manager.cc:632): The object '000045df000030008000000000000000.pg_yb_tablegroup' does not exist: OBJECT_NOT_FOUND (master error 3)
W0513 12:07:32.395465 19366 client-internal.cc:1376] GetTableSchemaRpc(table_identifier: table_id: "000045df000030008000000000001f64", num_attempts: 1) failed: Not found (yb/master/catalog_manager.cc:632): The object '000045df000030008000000000000000.pg_yb_tablegroup' does not exist: OBJECT_NOT_FOUND (master error 3)
W0513 12:07:32.395817 19348 client-internal.cc:1816] GetTableLocationsRpc(table_id: 000045df0000300080000000000004eb, max_tablets: 2147483647, require_tablets_running: 1, num_attempts: 1) failed: Not found (yb/master/catalog_manager.cc:632): The object '000045df000030008000000000000000.pg_class' does not exist: OBJECT_NOT_FOUND (master error 3)
W0513 12:07:32.395831 19310 client-internal.cc:1376] GetTableSchemaRpc(table_identifier: table_id: "000045df0000300080000000000004eb", num_attempts: 1) failed: Not found (yb/master/catalog_manager.cc:632): The object '000045df000030008000000000000000.pg_class' does not exist: OBJECT_NOT_FOUND (master error 3)

Fatal Logs

F20220513 12:07:29 ../../src/yb/tablet/tablet.cc:1308] T 00000000000000000000000000000000 P a40af9d3bf3841c0aebf15f6ab31a1aa: Failed to write a batch with 0 operations into RocksDB: Corruption (yb/tablet/tablet_metadata.cc:1142): Cannot find table info for: 070c0000-0000-0080-0030-000000400000, raft group id: 00000000000000000000000000000000
    @     0x7ff6bc96f34c  google::LogDestination::LogToSinks()
    @     0x7ff6bc96901f  google::LogMessage::SendToLog()
    @     0x7ff6bc969928  google::LogMessage::Flush()
    @     0x7ff6bc96caaf  google::LogMessageFatal::~LogMessageFatal()
    @     0x7ff6bf92f633  yb::tablet::Tablet::WriteToRocksDB()
    @     0x7ff6bf93c9e2  yb::tablet::Tablet::ApplyKeyValueRowOperations()
    @     0x7ff6bf93c744  yb::tablet::Tablet::ApplyOperation()
    @     0x7ff6bf93c3e3  yb::tablet::Tablet::ApplyRowOperations()
    @     0x7ff6bf9c4fb5  yb::tablet::WriteOperation::DoReplicated()
    @     0x7ff6bf9b81c7  yb::tablet::Operation::Replicated()
    @     0x7ff6bf9bb6d2  yb::tablet::OperationDriver::ApplyTask()
    @     0x7ff6bf9bb130  yb::tablet::OperationDriver::ReplicationFinished()
    @     0x7ff6becd1a6d  yb::consensus::ConsensusRound::NotifyReplicationFinished()
    @     0x7ff6bed1542c  yb::consensus::ReplicaState::ApplyPendingOperationsUnlocked()
    @     0x7ff6bed1487b  yb::consensus::ReplicaState::AdvanceCommittedOpIdUnlocked()
    @     0x7ff6bed1433d  yb::consensus::ReplicaState::UpdateMajorityReplicatedUnlocked()
    @     0x7ff6becf46e7  yb::consensus::RaftConsensus::UpdateMajorityReplicated()
    @     0x7ff6becc8a21  yb::consensus::PeerMessageQueue::NotifyObserversOfMajorityReplOpChangeTask()
    @     0x7ff6bcc023e1  yb::ThreadPool::DispatchThread()
    @     0x7ff6bcbfdaf1  yb::Thread::SuperviseThread()
    @     0x7ff6baeb8694  start_thread
    @     0x7ff6ba5f441d  __clone
I0513 12:07:32.079624 19395 raft_consensus.cc:1704] T d0c4db92ed9747e791d23a65d42bfb3b P 1cc20ec2f88b43bd82a9793edfd0c044 [term 1 FOLLOWER]: Refusing update from remote peer 85816df1c6be4403aca14e55936979ea: Log matching property violated. Preceding OpId in replica: 1.305. Preceding OpId from leader: 1.306. (index mismatch)
I0513 12:07:32.079654 19395 raft_consensus.cc:1934] T d0c4db92ed9747e791d23a65d42bfb3b P 1cc20ec2f88b43bd82a9793edfd0c044 [term 1 FOLLOWER]: Returning from UpdateConsensus because of error: error { code: PRECEDING_ENTRY_DIDNT_MATCH status { code: ILLEGAL_STATE message: "Log matching property violated. Preceding OpId in replica: 1.305. Preceding OpId from leader: 1.306. (index mismatch)" source_file: "../../src/yb/consensus/raft_consensus.cc" source_line: 1702 errors: "\000" } }
W0513 12:07:32.090523 19386 mvcc.cc:432] T d0c4db92ed9747e791d23a65d42bfb3b P 1cc20ec2f88b43bd82a9793edfd0c044: Received propagated safe time { physical: 1652443652039563 } less than the old value: { physical: 1652443652080517 }. This could happen on followers when a new leader is elected.
I0513 12:07:32.149402 19310 heartbeater.cc:338] P 1cc20ec2f88b43bd82a9793edfd0c044: Connected to a leader master server at 10.9.138.240:7100
I0513 12:07:32.149950 19310 heartbeater.cc:387] P 1cc20ec2f88b43bd82a9793edfd0c044: Registering TS with master...
I0513 12:07:32.149962 19310 server_base.cc:536] Using private rpc address 10.9.197.119
I0513 12:07:32.149971 19310 server_base.cc:558] Using http address 10.9.197.119
I0513 12:07:32.149977 19310 heartbeater.cc:396] P 1cc20ec2f88b43bd82a9793edfd0c044: Sending a full tablet report to master...
I0513 12:07:32.152074 19370 tablet_service.cc:1304] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044: Processing DeleteTablet with delete_type TABLET_DATA_DELETED (Table deleted at 2022-05-13 12:07:32 UTC) from 10.9.138.240:40878
I0513 12:07:32.152099 19370 tablet_peer.cc:431] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044 [state=RUNNING]: Initiating TabletPeer shutdown
I0513 12:07:32.152107 19370 tablet.cc:934] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044: StartShutdown
I0513 12:07:32.152125 19295 poller.cc:66] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044: Poll stopped: Aborted (yb/rpc/scheduler.cc:65): Task aborted
I0513 12:07:32.152148 19370 transaction_participant.cc:157] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044: Shutdown
I0513 12:07:32.152153 19370 tablet_peer.cc:448] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044 [state=QUIESCING]: Started shutdown from state: RUNNING
I0513 12:07:32.152161 19370 maintenance_manager.cc:198] Unregistered op LogGCOp(32c74dd2e3c248638269e38ee05136f6)
I0513 12:07:32.152166 19370 raft_consensus.cc:2749] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044 [term 1 LEADER]: Shutdown.
I0513 12:07:32.152175 19370 consensus_peers.cc:580] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044 -> Peer 85816df1c6be4403aca14e55936979ea ([host: "10.9.138.240" port: 9100], []): Closing peer
I0513 12:07:32.152191 19370 consensus_peers.cc:580] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044 -> Peer 133134ca11ef4605acce62ab35b079bb ([host: "10.9.219.5" port: 9100], []): Closing peer
I0513 12:07:32.152225 19370 raft_consensus.cc:2780] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044 [term 1 LEADER]: Raft consensus is shut down!
I0513 12:07:32.152230 19370 tablet_peer.cc:575] Aborting transactions that started prior to <max> for tablet id 32c74dd2e3c248638269e38ee05136f6
I0513 12:07:32.154723 19370 tablet.cc:949] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044: CompleteShutdown
I0513 12:07:32.154734 19370 tablet.cc:934] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044: StartShutdown
I0513 12:07:32.154750 19370 db_impl.cc:771] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044 [R]: Shutting down RocksDB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-000045df000030008000000000000000.colocated.parent.uuid/tablet-32c74dd2e3c248638269e38ee05136f6
I0513 12:07:32.154759 19370 db_impl.cc:771] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044 [I]: Shutting down RocksDB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-000045df000030008000000000000000.colocated.parent.uuid/tablet-32c74dd2e3c248638269e38ee05136f6.intents
I0513 12:07:32.154868 19370 db_impl.cc:894] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044 [I]: Shutdown done
I0513 12:07:32.154923 19370 db_impl.cc:894] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044 [R]: Shutdown done
I0513 12:07:32.154953 19370 raft_consensus.cc:2749] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044 [term 1 LEADER]: Shutdown.
I0513 12:07:32.154994 19370 tablet.cc:934] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044: StartShutdown
I0513 12:07:32.155021 19370 ts_tablet_manager.cc:2492] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044: Deleting tablet data with delete state TABLET_DATA_DELETED
I0513 12:07:32.155030 19370 docdb_rocksdb_util.cc:377] Overriding FLAGS_rocksdb_max_background_flushes to 1
I0513 12:07:32.155035 19370 docdb_rocksdb_util.cc:410] FLAGS_rocksdb_max_background_compactions was not set, automatically configuring 1 background compactions.
I0513 12:07:32.155038 19370 docdb_rocksdb_util.cc:410] FLAGS_rocksdb_max_background_compactions was not set, automatically configuring 1 background compactions.
I0513 12:07:32.155041 19370 docdb_rocksdb_util.cc:422] FLAGS_rocksdb_base_background_compactions was not set, automatically configuring 1 base background compactions.
I0513 12:07:32.155050 19370 docdb_rocksdb_util.cc:377] Overriding FLAGS_rocksdb_max_background_flushes to 1
I0513 12:07:32.155052 19370 docdb_rocksdb_util.cc:410] FLAGS_rocksdb_max_background_compactions was not set, automatically configuring 1 background compactions.
I0513 12:07:32.155056 19370 docdb_rocksdb_util.cc:410] FLAGS_rocksdb_max_background_compactions was not set, automatically configuring 1 background compactions.
I0513 12:07:32.155059 19370 docdb_rocksdb_util.cc:422] FLAGS_rocksdb_base_background_compactions was not set, automatically configuring 1 base background compactions.
I0513 12:07:32.155064 19370 tablet_metadata.cc:467] Destroying regular db at: /mnt/d0/yb-data/tserver/data/rocksdb/table-000045df000030008000000000000000.colocated.parent.uuid/tablet-32c74dd2e3c248638269e38ee05136f6
I0513 12:07:32.155201 19370 tablet_metadata.cc:473] Successfully destroyed regular DB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-000045df000030008000000000000000.colocated.parent.uuid/tablet-32c74dd2e3c248638269e38ee05136f6
I0513 12:07:32.155311 19370 tablet_metadata.cc:489] Successfully destroyed provisional records DB at: /mnt/d0/yb-data/tserver/data/rocksdb/table-000045df000030008000000000000000.colocated.parent.uuid/tablet-32c74dd2e3c248638269e38ee05136f6.intents
I0513 12:07:32.160071 19370 ts_tablet_manager.cc:2502] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044: Tablet deleted. Last logged OpId: 1.1
I0513 12:07:32.160089 19370 log.cc:1307] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044: Deleting WAL dir /mnt/d0/yb-data/tserver/wals/table-000045df000030008000000000000000.colocated.parent.uuid/tablet-32c74dd2e3c248638269e38ee05136f6
I0513 12:07:32.160180 19370 consensus_meta.cc:126] T 32c74dd2e3c248638269e38ee05136f6 Deleting consensus metadata
I0513 12:07:32.160212 19370 tablet_bootstrap_if.cc:96] T 32c74dd2e3c248638269e38ee05136f6 P 1cc20ec2f88b43bd82a9793edfd0c044: Deleted tablet blocks from disk
I0513 12:07:32.160223 19370 ts_tablet_manager.cc:2282] Unregister data/wal directory assignment map for table: 000045df000030008000000000000000.colocated.parent.uuid and tablet 32c74dd2e3c248638269e38ee05136f6
I0513 12:07:32.160230 19370 ts_tablet_manager.cc:2577] Deleted transition in progress deleting tablet for tablet 32c74dd2e3c248638269e38ee05136f6

LOGS: testysqlddlsintxnblk-aws.zip

def- commented 2 years ago

Also seeing the same failure in FATAL file in testmultitabletbackup-aws-rf3 on master (2.13.3.0-b37):

F20220513 11:04:06 ../../src/yb/tablet/tablet.cc:1328] T 00000000000000000000000000000000 P 968bb92d24a74c64958b349593d2d2d4: Failed to write a batch with 0 operations into RocksDB: Corruption (yb/tablet/tablet_metadata.cc:1142): Cannot find table info for: 070c0000-0000-0080-0030-000000400000, raft group id: 00000000000000000000000000000000
    @     0x7fc3b090c34c  google::LogDestination::LogToSinks()
    @     0x7fc3b090601f  google::LogMessage::SendToLog()
    @     0x7fc3b0906928  google::LogMessage::Flush()
    @     0x7fc3b0909aaf  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fc3b38d1833  yb::tablet::Tablet::WriteToRocksDB()
    @     0x7fc3b38df912  yb::tablet::Tablet::ApplyKeyValueRowOperations()
    @     0x7fc3b38df674  yb::tablet::Tablet::ApplyOperation()
    @     0x7fc3b38df2e3  yb::tablet::Tablet::ApplyRowOperations()
    @     0x7fc3b3967e45  yb::tablet::WriteOperation::DoReplicated()
    @     0x7fc3b395b057  yb::tablet::Operation::Replicated()
    @     0x7fc3b395e562  yb::tablet::OperationDriver::ApplyTask()
    @     0x7fc3b395dfc0  yb::tablet::OperationDriver::ReplicationFinished()
    @     0x7fc3b2c71a6d  yb::consensus::ConsensusRound::NotifyReplicationFinished()
    @     0x7fc3b2cb542c  yb::consensus::ReplicaState::ApplyPendingOperationsUnlocked()
    @     0x7fc3b2cb487b  yb::consensus::ReplicaState::AdvanceCommittedOpIdUnlocked()
    @     0x7fc3b2cb433d  yb::consensus::ReplicaState::UpdateMajorityReplicatedUnlocked()
    @     0x7fc3b2c946e7  yb::consensus::RaftConsensus::UpdateMajorityReplicated()
    @     0x7fc3b2c68a21  yb::consensus::PeerMessageQueue::NotifyObserversOfMajorityReplOpChangeTask()
    @     0x7fc3b0b9f721  yb::ThreadPool::DispatchThread()
    @     0x7fc3b0b9ae31  yb::Thread::SuperviseThread()
    @     0x7fc3aee54694  start_thread
    @     0x7fc3ae59141d  __clone

Edit: as well as testysqlddlsintxnblk-aws-rf3

es1024 commented 2 years ago

Fixed in b4dcb3b.