infiniflow / infinity

The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text
https://infiniflow.org
Apache License 2.0

[Bug]: Directly kill the learner node, the leader and follower nodes cannot perceive it. #2227

Closed: wxzkenny closed this issue 5 days ago

wxzkenny commented 1 week ago

Is there an existing issue for the same bug?

Version or Commit ID

dfe9caf19a59b73406817d393f1a24e97dde81a4

Other environment information

Hardware: X86_64
OS type:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy
Core Info:
$ uname -a
Linux ibp 5.15.0-122-generic #132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

infinity version:
Release: 0.5.0 build on 2024-11-05 13:21.45 with RelWithDebInfo mode from branch: main, commit-id: bcde5251945c88670a5f09c98c1c10d11e05affc

Actual behavior and How to reproduce it

Configure a cluster with one leader, one follower, and one learner. Kill the learner process directly with kill -9. The leader and the follower do not detect that the learner is gone.

When the cluster status is checked on the leader and follower nodes, the learner's status is still shown as 'alive', even though its heartbeat count has stopped increasing.

Expected behavior

Once the learner has been dead for a certain period of time, the other nodes should report the learner node as down when the cluster status is checked.
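To make the expected behavior concrete, here is a minimal sketch of timeout-based failure detection in Python. It is not Infinity's implementation. The class and method names, the "timeout" status string, and the 3-second threshold are all assumptions chosen for illustration. Only the 'alive' status comes from this report.

```python
import time
from dataclasses import dataclass


@dataclass
class NodeRecord:
    """What a peer remembers about one cluster node (hypothetical layout)."""
    name: str
    role: str
    last_heartbeat: float
    status: str = "alive"


class HeartbeatMonitor:
    """Marks a node as timed out when no heartbeat arrives within timeout_s."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.nodes = {}

    def on_heartbeat(self, name, role, now=None):
        # Record a heartbeat; a node that had timed out becomes alive again.
        now = time.monotonic() if now is None else now
        record = self.nodes.get(name)
        if record is None:
            self.nodes[name] = NodeRecord(name, role, now)
        else:
            record.last_heartbeat = now
            record.status = "alive"

    def sweep(self, now=None):
        # Return the names of nodes that newly exceeded the timeout.
        now = time.monotonic() if now is None else now
        newly_timed_out = []
        for record in self.nodes.values():
            silent_for = now - record.last_heartbeat
            if record.status == "alive" and silent_for > self.timeout_s:
                record.status = "timeout"
                newly_timed_out.append(record.name)
        return newly_timed_out


# Usage: the learner stops sending heartbeats after t=0 (as after kill -9),
# while the follower keeps reporting in.
monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.on_heartbeat("learner1", "learner", now=0.0)
monitor.on_heartbeat("follower1", "follower", now=0.0)
monitor.on_heartbeat("follower1", "follower", now=4.0)
print(monitor.sweep(now=5.0))  # ['learner1']
```

The point of the sketch is that the sweep runs on a timer, independent of any log write, so a silent node is noticed even when the cluster is otherwise idle.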

Additional information

The learner's failure goes undetected until the leader node modifies the log, at which point the learner's downtime is detected. This leaves the cluster state reported by the leader inconsistent with the state reported by the follower.
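One way to surface this kind of mismatch while debugging is to compare the node-status view from each node side by side. The sketch below is illustrative only. It assumes each node's view has already been collected into a plain dictionary of node name to status. The function name, the node names, and the "down" status string are hypothetical and not part of Infinity's API.

```python
def diff_views(leader_view, follower_view):
    """Return {node_name: (status_on_leader, status_on_follower)} for every
    node whose status differs between the two views. A node missing from
    one view is reported with None on that side."""
    mismatches = {}
    for name in sorted(set(leader_view) | set(follower_view)):
        on_leader = leader_view.get(name)
        on_follower = follower_view.get(name)
        if on_leader != on_follower:
            mismatches[name] = (on_leader, on_follower)
    return mismatches


# Hypothetical snapshots taken after the leader has written to the log:
# the leader has noticed the dead learner, the follower has not.
leader_view = {"leader1": "alive", "follower1": "alive", "learner1": "down"}
follower_view = {"leader1": "alive", "follower1": "alive", "learner1": "alive"}
print(diff_views(leader_view, follower_view))
# {'learner1': ('down', 'alive')}
```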

No response