Open issue, opened by def- 2 years ago
Master hits soft memory limit and then we reject various writes to sys_catalog
W0524 18:02:30.030284 17565 operation_tracker.cc:150] Operation failed, tablet 00000000000000000000000000000000 operation memory consumption (0) has exceeded its limit (1073741824) or the limit of an ancestral tracker
W0524 18:02:30.030356 17565 catalog_manager.cc:4047] Error updating tablets: Service unavailable (yb/tablet/operations/operation_tracker.cc:152): Operation failed, tablet 00000000000000000000000000000000 operation memory consumption (0) has exceeded its limit (1073741824) or the limit of an ancestral tracker. Tablet report was: tablet_id: "0ff0bcb735bb469bbede246e5e93b2cb" state: RUNNING committed_consensus_state { current_term: 2 leader_uuid: "4f770882a6464898b355b5862e1c1185" config { opid_index: -1 peers { permanent_uuid: "9cf81a7337f74a4f832bcece9420191f" member_type: VOTER last_known_private_addr { host: "10.9.200.16" port: 9100 } cloud_info { placement_cloud: "aws" placement_region: "us-west-2" placement_zone: "us-west-2c" } } peers { permanent_uuid: "89450a527ce8402e8103a93a0ba0b9f4" member_type: VOTER last_known_private_addr { host: "10.9.138.136" port: 9100 } cloud_info { placement_cloud: "aws" placement_region: "us-west-2" placement_zone: "us-west-2b" } } peers { permanent_uuid: "4f770882a6464898b355b5862e1c1185" member_type: VOTER last_known_private_addr { host: "10.9.79.225" port: 9100 } cloud_info { placement_cloud: "aws" placement_region: "us-west-2" placement_zone: "us-west-2a" } } } } schema_version: 0 tablet_data_state: TABLET_DATA_READY
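The warning reports a consumption of 0 against a 1073741824-byte (1 GiB) limit. That only makes sense because of the "or the limit of an ancestral tracker" clause: memory trackers form a hierarchy, and a charge against a child is also checked against every ancestor. If an ancestor is already at its limit, the write is rejected even though the tablet's own tracker is empty. This is presumably what happens once the master reaches its memory limit. Below is a minimal sketch of that hierarchical check. It assumes hard limits only and does not model soft-limit behavior. `MemTracker`, `try_consume`, `release`, and the tracker names and sizes are hypothetical illustrations, not YugabyteDB's actual MemTracker API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class MemTracker:
    """Hypothetical hierarchical memory tracker, for illustration only.

    Each tracker has an optional byte limit and an optional parent; consuming
    memory charges this tracker and every ancestor up to the root.
    """

    name: str
    limit: Optional[int] = None  # bytes; None means "no limit"
    parent: Optional[MemTracker] = None
    consumption: int = 0

    def _path_to_root(self) -> Iterator[MemTracker]:
        # Yield this tracker, then its parent, and so on up to the root.
        node: Optional[MemTracker] = self
        while node is not None:
            yield node
            node = node.parent

    def try_consume(self, nbytes: int) -> Optional[str]:
        """Charge nbytes to this tracker and all ancestors.

        Returns None on success, or the name of the first tracker whose limit
        would be exceeded; in that case nothing is charged anywhere.
        """
        for node in self._path_to_root():
            if node.limit is not None and node.consumption + nbytes > node.limit:
                return node.name
        for node in self._path_to_root():
            node.consumption += nbytes
        return None

    def release(self, nbytes: int) -> None:
        """Undo a previous successful try_consume(nbytes)."""
        for node in self._path_to_root():
            node.consumption -= nbytes


# Hypothetical scenario: the per-tablet operation tracker is empty and far
# below its own 1 GiB limit, but its parent has been filled by other users.
root = MemTracker("root", limit=2 * 1024**3)  # hypothetical process-wide limit
ops = MemTracker("tablet-operations", limit=1073741824, parent=root)
other = MemTracker("other-components", parent=root)

other.try_consume(2 * 1024**3)  # fills the root tracker exactly to its limit
blocked_by = ops.try_consume(4096)  # a small write is now rejected by "root"
print(ops.consumption, blocked_by)  # prints: 0 root
```

The child's consumption stays at 0 while the rejection comes from the ancestor. This mirrors the "(0) has exceeded its limit (1073741824) or the limit of an ancestral tracker" wording in the log above.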
I haven't seen this again (yet) since increasing the master memory setting to 0.2.
Jira Link: DB-543
Description
This happened during a chain upgrade (2.0 -> 2.2 -> 2.4 -> 2.6 -> 2.8 -> 2.12 -> 2.13.3.0-b41) with an LST workload.
I will provide logs on Jira.