vesoft-inc / nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability
https://nebula-graph.io
Apache License 2.0
10.6k stars 1.18k forks

Storage raft leader election always fails and graph queries to storage hit RPC timeouts during a RocksDB write stall #5358

Open tangyuanzhang opened 1 year ago

tangyuanzhang commented 1 year ago

Background: when storage executes an ingest and RocksDB triggers a write stall, storage keeps reporting leader election failures, and queries to storage report RPC timeouts. Reason: GraphStorageServiceHandler, RaftexService, and RaftPart share the same CPU thread pool.

The first step of RaftPart's heartbeat writes to RocksDB. If RocksDB is in a write stall at that moment and the host holds a large number of part leaders, every thread in the CPU thread pool ends up blocked on a RocksDB write (assuming the write stall lasts longer than 60s, e.g. during an ingest). At that point storage raft fails, and storage queries fail as well. I think RaftPart should get its own separately created thread pool so that it does not affect RPC calls.
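To illustrate the starvation described above, here is a minimal sketch in Python, not Nebula code. It assumes a fixed-size `ThreadPoolExecutor` as a stand-in for the shared CPU thread pool, a `threading.Event` as a stand-in for the RocksDB write stall, and hypothetical names (`heartbeat`, `handle_query`, `run_shared`, `run_separate`) chosen only for this example. With one shared pool, the blocked heartbeats occupy every worker and the query times out. With a separate pool for the raft work, the query still completes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def heartbeat(stall_over: threading.Event) -> str:
    # Stand-in for the RocksDB write at the start of a heartbeat:
    # it blocks until the simulated write stall ends.
    stall_over.wait()
    return "heartbeat written"


def handle_query() -> str:
    # Stand-in for a storage RPC handler that needs no RocksDB write.
    return "query ok"


def query_result(raft_pool, rpc_pool, num_leaders, rpc_timeout=0.5):
    stall_over = threading.Event()
    # One heartbeat per part leader, all stuck behind the write stall.
    beats = [raft_pool.submit(heartbeat, stall_over) for _ in range(num_leaders)]
    query = rpc_pool.submit(handle_query)
    try:
        result = query.result(timeout=rpc_timeout)
    except FutureTimeout:
        result = "rpc timeout"
    finally:
        # End the simulated stall so the pools can drain and shut down.
        stall_over.set()
        for beat in beats:
            beat.result()
    return result


def run_shared(workers=4, num_leaders=8):
    # Raft heartbeats and RPC handlers share one pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return query_result(pool, pool, num_leaders)


def run_separate(workers=4, num_leaders=8):
    # Raft heartbeats get their own pool; RPC handlers keep theirs.
    with ThreadPoolExecutor(max_workers=workers) as raft_pool, \
            ThreadPoolExecutor(max_workers=workers) as rpc_pool:
        return query_result(raft_pool, rpc_pool, num_leaders)


if __name__ == "__main__":
    print("shared pool:   ", run_shared())
    print("separate pools:", run_separate())
```

The sketch only models thread-pool occupancy. It says nothing about how RaftPart, RaftexService, or GraphStorageServiceHandler are actually wired to executors in the storage daemon.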

HarrisChu commented 1 year ago

https://discuss.nebula-graph.com.cn/t/topic/12133/22