oceanbase / obkv-table-client-rs

OBKV Table Client is a Rust library for accessing table data in the OceanBase storage layer. Unlike JDBC access, it skips the SQL parsing layer, which gives it a significant performance advantage.

[Bug]: failure to refresh table location #20

Open ShiKaiWi opened 1 year ago

ShiKaiWi commented 1 year ago

Environment

* client version:
name = "obkv-table-client-rs"
version = "0.1.0"
source = "git+https://github.com/oceanbase/obkv-table-client-rs.git?rev=211e5718630577a7f8c1a2d74055bad4d31dea57#211e5718630577a7f8c1a2d74055bad4d31dea57"

* server version: 1477

Fast Reproduce Steps

In the production environment, this error commonly appears without any operation being performed.

Actual Behavior

The error log:

2023-03-09 10:56:46.103 ERRO [/root/.cargo/git/checkouts/obkv-table-client-rs-4fa64c39e1be7389/211e571/src/client/table_client.rs:1124] ObTableClientInner::refresh_all_table_entries fail to refresh table entry for table: wal_wal_20230310000000_000059, err: Common error, code:PartitionError, err:Location::get_table_location_from_remote: partition num=0 has no leader, table=TableEntry { table_id: 1100611140084534, partition_num: 1, refresh_time_mills: 1678330494602, partition_info: None, table_location: TableLocation { replica_locations: [ReplicaLocation { addr: ObServerAddr { ip: "11.34.28.233", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Follower }, ReplicaLocation { addr: ObServerAddr { ip: "33.141.30.4", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Follower }, ReplicaLocation { addr: ObServerAddr { ip: "33.144.36.65", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Leader }] }, partition_entry: Some(ObPartitionEntry { parititon_location: {0: ObPartitionLocation { leader: Some(ReplicaLocation { addr: ObServerAddr { ip: "33.144.36.65", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Leader }), followers: [ReplicaLocation { addr: ObServerAddr { ip: "11.34.28.233", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Follower }, ReplicaLocation { addr: ObServerAddr { ip: "33.141.30.4", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Follower }] }} }), row_key_element: {} }, locations={0: ObPartitionLocation { leader: None, followers: [ReplicaLocation { addr: ObServerAddr { ip: "11.34.28.233", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Follower }, ReplicaLocation { addr: ObServerAddr { ip: "33.141.30.4", sql_port: 2881, svr_port: 2882, priority: 0, grant_priority_times: 0 }, info: ObServerInfo { stop_time: 0, status: Active }, role: Follower }] }}.

Expected Behavior

The table location is refreshed successfully.

Other Information

No response

ShiKaiWi commented 1 year ago

In recent days, more information has come to light while troubleshooting problems caused by this bug. Some OB servers may be in the DELETING state, which marks a server node that has failed and will be replaced. The client does not treat this state as active, so if a DELETING OB server is the leader of some table, the "no leader" error is thrown.
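
To make the failure mode described above concrete, here is a minimal sketch of how filtering replicas by server status can leave a partition without a leader. The types and functions (`ServerStatus`, `Replica`, `PartitionLocation`, `build_location`, `sample_replicas`) are hypothetical, simplified stand-ins named after the structures printed in the error log; they are not the actual obkv-table-client-rs definitions. Accepting DELETING servers is shown only as one possible relaxation, not necessarily the fix the project would adopt.

```rust
// Hypothetical, simplified mirrors of the structures printed in the error log.
// These are NOT the real obkv-table-client-rs types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ServerStatus {
    Active,
    Deleting,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Leader,
    Follower,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Replica {
    ip: String,
    status: ServerStatus,
    role: Role,
}

#[derive(Debug, Default, PartialEq, Eq)]
struct PartitionLocation {
    leader: Option<Replica>,
    followers: Vec<Replica>,
}

/// Builds a partition location from the replicas accepted by `is_usable`.
/// Replicas rejected by the predicate are dropped entirely, leader or not.
fn build_location(
    replicas: &[Replica],
    is_usable: impl Fn(ServerStatus) -> bool,
) -> PartitionLocation {
    let mut location = PartitionLocation::default();
    for replica in replicas.iter().filter(|r| is_usable(r.status)) {
        match replica.role {
            Role::Leader => location.leader = Some(replica.clone()),
            Role::Follower => location.followers.push(replica.clone()),
        }
    }
    location
}

/// Three replicas as in the log, with the leader assumed to be DELETING.
fn sample_replicas() -> Vec<Replica> {
    vec![
        Replica { ip: "11.34.28.233".into(), status: ServerStatus::Active, role: Role::Follower },
        Replica { ip: "33.141.30.4".into(), status: ServerStatus::Active, role: Role::Follower },
        Replica { ip: "33.144.36.65".into(), status: ServerStatus::Deleting, role: Role::Leader },
    ]
}

fn main() {
    let replicas = sample_replicas();

    // Strict filter: only Active servers count, so the DELETING leader is dropped.
    let strict = build_location(&replicas, |s| s == ServerStatus::Active);
    println!("strict leader: {:?}", strict.leader.as_ref().map(|r| r.ip.as_str()));

    // Relaxed filter (an assumption): DELETING servers are still accepted.
    let relaxed = build_location(&replicas, |s| {
        matches!(s, ServerStatus::Active | ServerStatus::Deleting)
    });
    println!("relaxed leader: {:?}", relaxed.leader.as_ref().map(|r| r.ip.as_str()));
}
```

With the strict predicate the result has two followers and no leader, which matches the shape of the `locations` value in the log. With the relaxed predicate the leader is retained.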