Closed. shishir2001-yb closed this issue 5 months ago.
cc: @druzac
The collector's call to `GetClusterConfig` gets a `nullptr` when it tries to read `CatalogManager::cluster_config_` [0]. According to a comment [1], code that calls `GetClusterConfig` is supposed to hold a scoped shared leader lock while doing so, but the collector code (`BasicCollector::Collect` in `master_call_home.cc`) doesn't. The catalog reload code explicitly nulls out this value [2], so there's a race here. Any test that repeatedly does a point-in-time restore is more likely to trigger this race, but as far as I can tell it's been in the code for years.
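A minimal C++ sketch of the race and of the locking contract the comment [1] describes. Everything here is hypothetical: `CatalogManagerModel`, `ClusterConfig`, `RunCollectorStress`, and the use of `std::shared_mutex` as the leader lock are illustrative stand-ins, not YugabyteDB's actual classes or lock types. The reload nulls and rebuilds the pointer while holding an exclusive lock, so a reader that takes the shared lock never observes the null window; a reader that skips the lock, as the collector does, can.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for SysClusterConfigEntryPB; not the real protobuf.
struct ClusterConfig {
  int version = 0;
};

// Hypothetical model of the pattern described in the issue. leader_lock_
// plays the role of the leader lock; the real CatalogManager uses its own types.
class CatalogManagerModel {
 public:
  CatalogManagerModel() : cluster_config_(std::make_shared<ClusterConfig>()) {}

  // Models catalog reload (e.g. after a point-in-time restore): the pointer
  // is nulled and then rebuilt while the exclusive lock is held.
  void ReloadCatalog(int version) {
    std::unique_lock<std::shared_mutex> lock(leader_lock_);
    cluster_config_.reset();  // An unlocked reader could observe nullptr here.
    auto fresh = std::make_shared<ClusterConfig>();
    fresh->version = version;
    cluster_config_ = std::move(fresh);
    assert(cluster_config_ != nullptr);  // Null window closes before unlock.
  }

  // Reader that follows the documented contract: take the shared lock, and
  // still null-check defensively instead of dereferencing blindly.
  std::optional<ClusterConfig> GetClusterConfig() const {
    std::shared_lock<std::shared_mutex> lock(leader_lock_);
    if (!cluster_config_) {
      return std::nullopt;
    }
    return *cluster_config_;
  }

 private:
  mutable std::shared_mutex leader_lock_;
  std::shared_ptr<ClusterConfig> cluster_config_;
};

struct StressResult {
  int successful_reads = 0;
  int failed_reads = 0;
};

// Runs `collectors` reader threads (standing in for the call-home collector)
// against one thread that reloads the catalog `reads_per_collector` times.
StressResult RunCollectorStress(int collectors, int reads_per_collector) {
  CatalogManagerModel catalog;
  std::atomic<int> ok{0};
  std::atomic<int> failed{0};

  std::thread reloader([&] {
    for (int v = 1; v <= reads_per_collector; ++v) {
      catalog.ReloadCatalog(v);
    }
  });

  std::vector<std::thread> readers;
  for (int i = 0; i < collectors; ++i) {
    readers.emplace_back([&] {
      for (int n = 0; n < reads_per_collector; ++n) {
        if (catalog.GetClusterConfig()) {
          ok.fetch_add(1);
        } else {
          failed.fetch_add(1);
        }
      }
    });
  }

  reloader.join();
  for (auto& t : readers) {
    t.join();
  }
  return StressResult{ok.load(), failed.load()};
}
```

With the shared lock in place, `RunCollectorStress(4, 1000)` should report zero failed reads. Removing the `std::shared_lock` from `GetClusterConfig` turns the concurrent read and reset of `cluster_config_` into a data race (undefined behavior), which is the analogue of the null dereference in [0].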
[0]
```
frame #6: 0x0000aaaae5ad57e8 yb-master`yb::master::CatalogManager::GetClusterConfig(yb::master::SysClusterConfigEntryPB*) [inlined] yb::master::MetadataCowWrapper<yb::master::PersistentClusterConfigInfo>::LockForRead(this=0x0000000000000000) const at catalog_entity_base.h:84:41
frame #7: 0x0000aaaae5ad57e4 yb-master`yb::master::CatalogManager::GetClusterConfig(this=<unavailable>, config=0x0000ffff88976798) at catalog_manager.cc:12480:28
frame #8: 0x0000aaaae5c80900 yb-master`yb::master::BasicCollector::Collect(this=0x000024677f8d4440, collection_level=MEDIUM) at master_call_home.cc:37:48
```
Jira Link: DB-10650
Description
Version: 2.23.0.0-b91
Logs: https://drive.google.com/file/d/1iRVJhFbKwnfpUP_I3vhLgLSEF5QLD3RS/view?usp=sharing
Full Coredump: https://drive.google.com/file/d/19SkYXdNq4lMJZRUAAsGUwKadX_con4dQ/view?usp=sharing
G-flags:
Issue Type
kind/bug