Open ShiKaiWi opened 1 year ago
More information has come to light in recent days while troubleshooting problems caused by this bug. An ob server may be in the DELETING state, which marks a server node that has failed and will be replaced. This state is not treated as active, so if a DELETING ob server is the leader of some table, the "leader not found" error is thrown.
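The failure mode described above can be illustrated with a minimal sketch. The types and function names below (`ServerStatus`, `Replica`, `find_leader_strict`, `find_leader_lenient`, `sample_replicas`) are hypothetical simplifications made up for illustration and are not the actual definitions in obkv-table-client-rs. The lenient variant is only one possible mitigation, not necessarily the fix that should be adopted.

```rust
// Hypothetical, simplified types for illustration only;
// they do not mirror the real obkv-table-client-rs structs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ServerStatus {
    Active,
    Deleting,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Leader,
    Follower,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Replica {
    ip: &'static str,
    status: ServerStatus,
    role: Role,
}

/// Strict lookup: replicas on non-Active servers are discarded before the
/// leader is searched for, so a leader on a DELETING server is never found.
fn find_leader_strict(replicas: &[Replica]) -> Option<&Replica> {
    replicas
        .iter()
        .filter(|r| r.status == ServerStatus::Active)
        .find(|r| r.role == Role::Leader)
}

/// Lenient lookup (assumed mitigation): a DELETING server is still accepted
/// as a candidate, so its leader replica remains visible.
fn find_leader_lenient(replicas: &[Replica]) -> Option<&Replica> {
    replicas
        .iter()
        .filter(|r| matches!(r.status, ServerStatus::Active | ServerStatus::Deleting))
        .find(|r| r.role == Role::Leader)
}

/// Three replicas of one partition; the leader sits on a DELETING server.
/// The IP addresses are made up.
fn sample_replicas() -> Vec<Replica> {
    vec![
        Replica { ip: "10.0.0.1", status: ServerStatus::Active, role: Role::Follower },
        Replica { ip: "10.0.0.2", status: ServerStatus::Active, role: Role::Follower },
        Replica { ip: "10.0.0.3", status: ServerStatus::Deleting, role: Role::Leader },
    ]
}
```

With `sample_replicas()`, the strict lookup returns `None`, which corresponds to the "has no leader" situation, while the lenient lookup returns the replica on the DELETING server.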
Check Before Asking
Environment
Fast Reproduce Steps
In the production environment, this error commonly appears without any operations being performed.
Actual Behavior
The error log:
2023-03-09 10:56:46.103 ERRO [/root/.cargo/git/checkouts/obkv-table-client-rs-4fa64c39e1be7389/211e571/src/client/table_client.rs:1124] ObTableClientInner::refresh_all_table_entries fail to refresh table entry for table: wal_wal_20230310000000_000059, err: Common error, code:PartitionError, err:Location::get_table_location_from_remote: partition num=0 has no leader, table=TableEntry { table_id: 1100611140084534, partition_num: 1, refresh_time_mills: 1678330494602, partition_info: None, table_location: TableLocation { replica_locations: [ReplicaLocation { addr: ObServerAddr { ip: "11.34.28.233", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Follower }, ReplicaLocation { addr: ObServerAddr { ip: "33.141.30.4", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Follower }, ReplicaLocation { addr: ObServerAddr { ip: "33.144.36.65", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Leader }] }, partition_entry: Some(ObPartitionEntry { parititon_location: {0: ObPartitionLocation { leader: Some(ReplicaLocation { addr: ObServerAddr { ip: "33.144.36.65", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Leader }), followers: [ReplicaLocation { addr: ObServerAddr { ip: "11.34.28.233", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Follower }, ReplicaLocation { addr: ObServerAddr { ip: "33.141.30.4", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Follower }] }} }), row_key_element: {} }, locations={0: ObPartitionLocation { leader: None, followers: [ReplicaLocation { addr: ObServerAddr { ip: 
"11.34.28.233", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Follower }, ReplicaLocation { addr: ObServerAddr { ip: "33.141.30.4", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Follower }] }}.
Expected Behavior
Succeed in refreshing table location.
Other Information
No response