yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[DocDB][LST] Upgrade: FATAL: Check failed: table_names_map_.erase({table_namespace_id, table_name}) == 1 (0 vs. 1) Unable to erase table named upgrade_tab_2_0_9_0_b13 from table names map. #12652

Open def- opened 2 years ago

def- commented 2 years ago

Jira Link: DB-543

Description

This happened during a chained upgrade (2.0 -> 2.2 -> 2.4 -> 2.6 -> 2.8 -> 2.12 -> 2.13.3.0-b41) with an LST workload:

F20220524 18:02:52 ../../src/yb/master/catalog_manager.cc:1383] Check failed: table_names_map_.erase({table_namespace_id, table_name}) == 1 (0 vs. 1) Unable to erase table named upgrade_tab_2_0_9_0_b13 from table names map.
    @     0x7fbf2ee01024  yb::LogFatalHandlerSink::send()
    @     0x7fbf2dff82e6  google::LogMessage::SendToLog()
    @     0x7fbf2dff574a  google::LogMessage::Flush()
    @     0x7fbf2dff8819  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fbf37f3b463  yb::master::CatalogManager::AbortTableCreation()
    @     0x7fbf37f55952  yb::master::CatalogManager::CreateTable()
    @     0x7fbf37fd2a54  yb::master::MasterServiceImpl::CreateTable()
    @     0x7fbf3280b2a0  yb::master::MasterServiceIf::Handle()
    @     0x7fbf30892c99  yb::rpc::ServicePoolImpl::Handle()
    @     0x7fbf30836f04  yb::rpc::InboundCall::InboundCallTask::Run()
    @     0x7fbf3089e8e8  yb::rpc::(anonymous namespace)::Worker::Execute()
    @     0x7fbf2ee8e66f  yb::Thread::SuperviseThread()
    @     0x7fbf2a858694  start_thread
    @     0x7fbf29f9541d  __clone
    @              (nil)  (unknown)

I will provide logs on Jira.
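
For readers unfamiliar with the check: the `(0 vs. 1)` in the message indicates that `erase` returned 0, meaning the `{table_namespace_id, table_name}` key was already absent from the names map when `AbortTableCreation()` ran. Below is a minimal sketch of that behaviour using `std::map`. The type aliases and both helper functions are hypothetical and are not taken from `catalog_manager.cc`; the real container type and the eventual fix may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins: the real type of table_names_map_ is not shown in this issue.
using TableNameKey = std::pair<std::string, std::string>;   // {namespace_id, table_name}
using TableNamesMap = std::map<TableNameKey, std::string>;  // value: table id

// Strict rollback: mirrors the failing check by requiring that exactly one
// entry was erased. Returns false where the real code hits a fatal check.
bool StrictRollback(TableNamesMap& names, const TableNameKey& key) {
  assert(!key.second.empty());  // precondition: a table name was supplied
  return names.erase(key) == 1;
}

// Tolerant rollback (an assumption, not the actual fix): a missing entry is
// treated as a no-op and the number of erased entries is reported to the caller.
std::size_t TolerantRollback(TableNamesMap& names, const TableNameKey& key) {
  assert(!key.second.empty());
  return names.erase(key);  // 0 if the name was never inserted, 1 otherwise
}
```

Calling `StrictRollback` on a map that never received the key returns false, which corresponds to the `0 vs. 1` case in the log above; after inserting the key once, the same call returns true.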

bmatican commented 2 years ago

The master hits its soft memory limit, after which various writes to sys_catalog are rejected:

W0524 18:02:30.030284 17565 operation_tracker.cc:150] Operation failed, tablet 00000000000000000000000000000000 operation memory consumption (0) has exceeded its limit (1073741824) or the limit of an ancestral tracker
W0524 18:02:30.030356 17565 catalog_manager.cc:4047] Error updating tablets: Service unavailable (yb/tablet/operations/operation_tracker.cc:152): Operation failed, tablet 00000000000000000000000000000000 operation memory consumption (0) has exceeded its limit (1073741824) or the limit of an ancestral tracker. Tablet report was: tablet_id: "0ff0bcb735bb469bbede246e5e93b2cb" state: RUNNING committed_consensus_state { current_term: 2 leader_uuid: "4f770882a6464898b355b5862e1c1185" config { opid_index: -1 peers { permanent_uuid: "9cf81a7337f74a4f832bcece9420191f" member_type: VOTER last_known_private_addr { host: "10.9.200.16" port: 9100 } cloud_info { placement_cloud: "aws" placement_region: "us-west-2" placement_zone: "us-west-2c" } } peers { permanent_uuid: "89450a527ce8402e8103a93a0ba0b9f4" member_type: VOTER last_known_private_addr { host: "10.9.138.136" port: 9100 } cloud_info { placement_cloud: "aws" placement_region: "us-west-2" placement_zone: "us-west-2b" } } peers { permanent_uuid: "4f770882a6464898b355b5862e1c1185" member_type: VOTER last_known_private_addr { host: "10.9.79.225" port: 9100 } cloud_info { placement_cloud: "aws" placement_region: "us-west-2" placement_zone: "us-west-2a" } } } } schema_version: 0 tablet_data_state: TABLET_DATA_READY
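
The warning reports a consumption of 0 against a limit of 1073741824 bytes (1 GiB), which only makes sense through the "limit of an ancestral tracker" clause: the operation's own tracker is empty, but something above it in the hierarchy is over budget. A minimal sketch of that idea follows. The `Tracker` class, its method names, and the limits used are hypothetical and do not reproduce YugabyteDB's actual memory tracker API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical hierarchical memory tracker, for illustration only.
class Tracker {
 public:
  Tracker(std::string name, int64_t limit_bytes, Tracker* parent = nullptr)
      : name_(std::move(name)), limit_(limit_bytes), parent_(parent) {
    assert(limit_bytes >= 0);
  }

  // Charges `bytes` to this tracker and to every ancestor up to the root.
  void Consume(int64_t bytes) {
    for (Tracker* t = this; t != nullptr; t = t->parent_) {
      t->consumed_ += bytes;
    }
  }

  // True if this tracker or any ancestor has consumed more than its limit.
  bool AnyLimitExceeded() const {
    for (const Tracker* t = this; t != nullptr; t = t->parent_) {
      if (t->consumed_ > t->limit_) return true;
    }
    return false;
  }

  int64_t consumed() const { return consumed_; }
  const std::string& name() const { return name_; }

 private:
  std::string name_;
  int64_t limit_;
  Tracker* parent_;
  int64_t consumed_ = 0;
};
```

With a root limited to 2 GiB, an "operations" child limited to 1 GiB, and a sibling that consumes 3 GiB, `AnyLimitExceeded()` on the operations tracker returns true even though its own `consumed()` is still 0, which matches the shape of the warning above.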

def- commented 2 years ago

Haven't seen this again (yet) after increasing the master memory limit to 0.2.