vesoft-inc / nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability
https://nebula-graph.io
Apache License 2.0
10.86k stars 1.21k forks

The database currently holds 1.7 billion records, all of which are updated every week; the graphd node now dies without any warning #5981

Open zhoushew opened 2 days ago

zhoushew commented 2 days ago

NebulaGraph version: 3.8.0

Deployment mode: distributed

Installation method: TAR package

In production: Y

Hardware information

Storage: 1 TB (non-SSD)
CPU: 32 cores
Memory: 256 GB
Deployment plan: based on the official deployment plan (see image)

Detailed description of the problem

After the application has been running for a while, the following exception occurs:

Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.vesoft.nebula.client.graph.SessionPool]: Factory method 'sessionPool' threw exception; nested exception is java.lang.RuntimeException: create session failed.
Caused by: java.lang.RuntimeException: create session failed.
    at com.vesoft.nebula.client.graph.SessionPool.init(SessionPool.java:127)
    at com.vesoft.nebula.client.graph.SessionPool.<init>(SessionPool.java:73)
Caused by: com.vesoft.nebula.client.graph.exception.IOErrorException: java.net.ConnectException: Connection refused (Connection refused)
    at com.vesoft.nebula.client.graph.net.SyncConnection.open(SyncConnection.java:137)
    at com.vesoft.nebula.client.graph.SessionPool.createSessionObject(SessionPool.java:395)
    at com.vesoft.nebula.client.graph.SessionPool.init(SessionPool.java:123)

Checking the graph database status at that point shows:

[INFO] nebula-metad(7458486): Running as 1460855, Listening on 9559
[INFO] nebula-graphd(7458486): Exited
[INFO] nebula-storaged(7458486): Running as 1460989, Listening on 9779
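Since graphd exits silently, one stopgap is to poll the service status and alert as soon as any service reports `Exited`. The following is only a minimal sketch of the parsing step, assuming the status text has exactly the shape shown above. The names `parse_status` and `exited_services` are hypothetical helpers, not part of NebulaGraph, and how the status text is collected (for example by running the status script from a scheduler) is left out.

```python
import re

# Matches status lines of the shape shown in this issue, e.g.
#   [INFO] nebula-graphd(7458486): Exited
#   [INFO] nebula-metad(7458486): Running as 1460855, Listening on 9559
# Assumption: other NebulaGraph versions may print a different layout.
STATUS_RE = re.compile(
    r"\[INFO\]\s+nebula-(?P<service>\w+)\([^)]*\):\s+"
    r"(?P<state>Running|Exited)"
    r"(?:\s+as\s+(?P<pid>\d+),\s+Listening on\s+(?P<port>\d+))?"
)


def parse_status(text):
    """Return {service: {"state", "pid", "port"}} for every status entry found."""
    result = {}
    for m in STATUS_RE.finditer(text):
        result[m.group("service")] = {
            "state": m.group("state"),
            "pid": int(m.group("pid")) if m.group("pid") else None,
            "port": int(m.group("port")) if m.group("port") else None,
        }
    return result


def exited_services(text):
    """Return the sorted names of services whose state is Exited."""
    status = parse_status(text)
    return sorted(name for name, info in status.items() if info["state"] == "Exited")


# Status text copied from the report above.
SAMPLE_STATUS = """\
[INFO] nebula-metad(7458486): Running as 1460855, Listening on 9559
[INFO] nebula-graphd(7458486): Exited
[INFO] nebula-storaged(7458486): Running as 1460989, Listening on 9779
"""

if __name__ == "__main__":
    print(exited_services(SAMPLE_STATUS))
```

A watchdog built on this would only shorten the outage; it does not explain why graphd exits, so the graphd logs and any core dump or OOM-killer record around the exit time are still needed.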

After graphd is restarted the application works normally again, but the problem recurs after some time (a day or a few days).

Relevant meta / storage / graph info logs

metad log:
E20241029 16:50:36.354398 1461238 SessionManagerProcessor.cpp:215] Remove session key failed, error code: -17

graphd log:
Create session for userName: cattsoft, ip: xx.xx.xx.xx failed: Insert session to local cache failed.
E20241029 16:48:46.746682 1461032 GraphService.cpp:113] Create session for userName: cattsoft, ip: xx.xx.xx.xx failed: Insert session to local cache failed.

storaged log:
Log file created at: 2024/10/28 18:20:08
Running on machine: c-02
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
E20241028 18:20:08.033663 1461113 AddEdgesProcessor.cpp:359] Error! ret = E_LEADER_CHANGED, spaceId 2
E20241028 18:20:17.776060 1461113 AddEdgesProcessor.cpp:359] Error! ret = E_LEADER_CHANGED, spaceId 2
E20241028 18:21:11.758066 1461115 Serializer.h:43] Thrift serialization is only defined for structs and unions, not containers thereof. Attemping to deserialize a value of type nebula::HostAddr.
E20241028 18:23:13.663666 1461590 Serializer.h:43] Thrift serialization is only defined for structs and unions, not containers thereof. Attemping to deserialize a value of type nebula::Value.
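To see which errors dominate in the period before graphd exits, the log entries can be grouped by source file using the line format stated in the log header (`[IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg`). The sketch below is an illustration only: `parse_entries` and `count_by_source` are hypothetical helper names, and the sample is limited to the four storaged lines quoted above.

```python
import re
from collections import Counter

# One entry prefix in the format given by the log header:
#   [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_RE = re.compile(
    r"(?P<level>[IWEF])(?P<date>\d{8}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) +"
    r"(?P<thread>\d+) "
    r"(?P<file>[\w.]+):(?P<line>\d+)\] "
)


def parse_entries(text):
    """Split raw log text into entries; each message runs up to the next prefix."""
    matches = list(GLOG_RE.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entry = m.groupdict()
        entry["msg"] = text[m.end():end].strip()
        entries.append(entry)
    return entries


def count_by_source(entries, level="E"):
    """Count entries of the given severity level per source file."""
    return Counter(e["file"] for e in entries if e["level"] == level)


# The storaged lines quoted in this report (messages copied verbatim).
SAMPLE_LOG = """\
E20241028 18:20:08.033663 1461113 AddEdgesProcessor.cpp:359] Error! ret = E_LEADER_CHANGED, spaceId 2
E20241028 18:20:17.776060 1461113 AddEdgesProcessor.cpp:359] Error! ret = E_LEADER_CHANGED, spaceId 2
E20241028 18:21:11.758066 1461115 Serializer.h:43] Thrift serialization is only defined for structs and unions, not containers thereof. Attemping to deserialize a value of type nebula::HostAddr.
E20241028 18:23:13.663666 1461590 Serializer.h:43] Thrift serialization is only defined for structs and unions, not containers thereof. Attemping to deserialize a value of type nebula::Value.
"""

if __name__ == "__main__":
    for source, count in count_by_source(parse_entries(SAMPLE_LOG)).most_common():
        print(source, count)
```

Running the same grouping over the graphd and metad logs for the hours before a crash would show whether the session errors (`Insert session to local cache failed`, `Remove session key failed`) build up steadily or appear only at the end.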

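Configuration notes

1. The graphd configuration uses the default values:
session_idle_timeout_secs=28800
client_idle_timeout_secs=28800

2. The application (client) configuration uses the default values:
timeout = 0
cleanTime = 3600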