happyfish100 / fastdfs

FastDFS is an open source high performance distributed file system (DFS). Its major functions include file storing, file syncing and file accessing, and it is designed for high capacity and load balancing. Wechat/Weixin public account (Chinese Language): fastdfs
GNU General Public License v3.0

Storage service goes down #719

Open zhuopz opened 3 months ago

zhuopz commented 3 months ago

FastDFS 6.9.1. My environment has three servers, A, B and C. Each runs both a tracker and a storage service, and all three storage servers belong to the same group. The storage service on A keeps going down for no apparent reason. The clocks on the servers are currently slightly out of sync, by no more than a few minutes.

A
Storage 3:
    id = 10.132.14.9
    ip_addr = 10.132.14.9 OFFLINE
    http domain =
    version = 6.9.1
    join time = 2024-08-05 14:21:09
    up time =
    total storage = 603,091 MB
    free storage = 574,694 MB
    upload priority = 10
    store_path_count = 1
    subdir_count_per_path = 256
    storage_port = 23000
    storage_http_port = 8088
    current_write_path = 0
    source storage id =

B
Storage 1:
    id = 10.132.14.10
    ip_addr = 10.132.14.10 ACTIVE
    http domain =
    version = 6.9.1
    join time = 2024-08-05 14:21:09
    up time = 2024-08-09 21:33:38
    total storage = 92,115 MB
    free storage = 84,109 MB
    upload priority = 10
    store_path_count = 1
    subdir_count_per_path = 256
    storage_port = 23000
    storage_http_port = 8088
    current_write_path = 0
    source storage id = 10.132.14.9

C
Storage 2:
    id = 10.132.14.109
    ip_addr = 10.132.14.109 ACTIVE
    http domain =
    version = 6.9.1
    join time = 2024-08-05 14:19:45
    up time = 2024-08-09 21:32:07
    total storage = 603,091 MB
    free storage = 590,775 MB
    upload priority = 10
    store_path_count = 1
    subdir_count_per_path = 256
    storage_port = 23000
    storage_http_port = 8088
    current_write_path = 0
    source storage id = 10.132.14.9

Partial logs from A:
[2024-08-09 21:37:59] INFO - file: storage_sync_func.c, line: 114, successfully connect to storage server 10.132.14.10:23000
[2024-08-09 21:37:59] INFO - file: storage_sync_func.c, line: 114, successfully connect to storage server 10.132.14.109:23000
[2024-08-09 21:37:59] DEBUG - file: sf_nio.c, line: 580, client ip: 10.132.14.109, sock: 18, recv fail, connection disconnected
[2024-08-09 21:37:59] DEBUG - file: sf_nio.c, line: 580, client ip: 10.132.14.10, sock: 24, recv fail, connection disconnected
[2024-08-09 21:38:08] DEBUG - file: storage_service.c, line: 3086, client ip: 10.132.14.109, storage server id: 10.132.14.109
[2024-08-09 21:38:09] DEBUG - file: storage_service.c, line: 3086, client ip: 10.132.14.10, storage server id: 10.132.14.10
[2024-08-09 22:02:38] DEBUG - file: sf_nio.c, line: 349, current stage: 4 equals to the target, skip set
[2024-08-09 22:04:42] DEBUG - file: sf_nio.c, line: 467, client ip: 10.132.14.9, expect stage: 4, recv error event: 9, close connection
[2024-08-09 22:07:42] WARNING - file: sf_nio.c, line: 566, client ip: 10.132.14.9, connection disconnected, expect pkg length: 141768, recv pkg length: 10
[2024-08-09 22:27:42] WARNING - file: sf_nio.c, line: 566, client ip: 10.132.14.9, connection disconnected, expect pkg length: 141758, recv pkg length: 10
[2024-08-09 22:32:43] WARNING - file: sf_nio.c, line: 566, client ip: 10.132.14.9, connection disconnected, expect pkg length: 141748, recv pkg length: 10
[2024-08-09 22:57:44] WARNING - file: sf_nio.c, line: 566, client ip: 10.132.14.9, connection disconnected, expect pkg length: 141738, recv pkg length: 10
[2024-08-09 23:02:44] WARNING - file: sf_nio.c, line: 566, client ip: 10.132.14.9, connection disconnected, expect pkg length: 141728, recv pkg length: 10
[2024-08-09 23:22:45] WARNING - file: sf_nio.c, line: 566, client ip: 10.132.14.9, connection disconnected, expect pkg length: 141718, recv pkg length: 10
[2024-08-09 23:27:45] WARNING - file: sf_nio.c, line: 566, client ip: 10.132.14.9, connection disconnected, expect pkg length: 141708, recv pkg length: 10
[2024-08-09 23:33:43] WARNING - file: sf_nio.c, line: 566, client ip: , connection disconnected, expect pkg length: 262144, recv pkg length: 259153
[2024-08-09 23:33:44] ERROR - file: storage_service.c, line: 3065, cmd=9, client ip: 10.132.14.10, package size 5879145732483503411 is not correct, expect length: 16
[2024-08-09 23:33:44] DEBUG - file: storage_service.c, line: 1604, close conn: #25, client ip: 10.132.14.10
[2024-08-09 23:33:45] DEBUG - file: storage_service.c, line: 3086, client ip: 10.132.14.10, storage server id: 10.132.14.10
[2024-08-09 23:53:32] DEBUG - file: fast_task_queue.c, line: 411, alloc_connections: 512, realloc 256 elements
[2024-08-09 23:53:46] DEBUG - file: storage_service.c, line: 3086, client ip: 10.132.14.10, storage server id: 10.132.14.10
[2024-08-09 23:54:33] WARNING - file: sf_nio.c, line: 514, client ip: 10.132.14.9, req_count: 0, recv timeout
[2024-08-09 23:54:33] WARNING - file: sf_nio.c, line: 514, client ip: 10.132.14.9, req_count: 0, recv timeout
[2024-08-09 23:54:33] WARNING - file: sf_nio.c, line: 514, client ip: 10.132.14.9, req_count: 0, recv timeout
[2024-08-09 23:54:33] WARNING - file: sf_nio.c, line: 514, client ip: 10.132.14.9, req_count: 0, recv timeout
[2024-08-09 23:55:50] DEBUG - file: sf_nio.c, line: 467, client ip: 10.132.14.9, expect stage: 4, recv error event: 9, close connection
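The ERROR entry on A reports a package size of 5879145732483503411 where a length of 16 was expected for cmd=9, and several WARNING entries report a received length of 10. The following is a minimal sketch for looking at that size as raw bytes. It assumes each message starts with a 10-byte header made of an 8-byte big-endian body length, a command byte and a status byte; that layout is an assumption and is not stated anywhere in this thread. `HEADER`, `pack_header` and `unpack_header` are hypothetical names used only for illustration.

```python
import struct

# Assumed header layout (not confirmed by this thread): 8-byte big-endian
# signed body length, 1-byte command, 1-byte status. No padding with ">".
HEADER = struct.Struct(">qBB")


def pack_header(body_len, cmd, status=0):
    """Build a header under the assumed layout."""
    return HEADER.pack(body_len, cmd, status)


def unpack_header(raw):
    """Decode the first HEADER.size bytes of raw under the assumed layout."""
    body_len, cmd, status = HEADER.unpack(raw[:HEADER.size])
    return {"body_len": body_len, "cmd": cmd, "status": status}


# Package size copied from the ERROR entry in storage A's log.
REPORTED_SIZE = 5879145732483503411

raw = pack_header(REPORTED_SIZE, cmd=9)
print("header size:", HEADER.size)
print("length field bytes:", raw[:8].hex())
print("decoded:", unpack_header(raw))
```

If the eight length bytes look like arbitrary data rather than a small integer, one possible reading is that the receiver parsed payload bytes as a header after the stream got out of step. The logs alone do not confirm this.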

Partial logs from B:
[2024-08-09 23:33:44] ERROR - file: tracker_proto.c, line: 53, server: 10.132.14.9:23000, recv data fail, errno: 110, error info: Connection timed out
[2024-08-09 23:33:44] ERROR - file: storage_sync.c, line: 263, fdfs_recv_response fail, result: 110
[2024-08-09 23:33:45] INFO - file: storage_sync_func.c, line: 114, successfully connect to storage server 10.132.14.9:23000
[2024-08-09 23:33:45] ERROR - file: tracker_proto.c, line: 53, server: 10.132.14.9:23000, recv data fail, errno: 107, error info: Transport endpoint is not connected
[2024-08-09 23:33:45] ERROR - file: storage_sync.c, line: 733, fdfs_recv_response fail, result: 107
[2024-08-09 23:33:46] INFO - file: storage_sync_func.c, line: 114, successfully connect to storage server 10.132.14.9:23000
[2024-08-09 23:34:33] WARNING - file: sf_nio.c, line: 514, client ip: 10.132.14.9, req_count: 0, recv timeout
[2024-08-09 23:34:33] WARNING - file: sf_nio.c, line: 514, client ip: 10.132.14.9, req_count: 0, recv timeout
[2024-08-09 23:34:33] WARNING - file: sf_nio.c, line: 514, client ip: 10.132.14.9, req_count: 0, recv timeout
[2024-08-09 23:34:33] WARNING - file: sf_nio.c, line: 514, client ip: 10.132.14.9, req_count: 0, recv timeout
[2024-08-09 23:39:33] WARNING - file: sf_nio.c, line: 514, client ip: 10.132.14.9, req_count: 0, recv timeout
[2024-08-09 23:39:33] WARNING - file: sf_nio.c, line: 514, client ip: 10.132.14.9, req_count: 0, recv timeout
[2024-08-09 23:39:33] WARNING - file: sf_nio.c, line: 514, client ip: 10.132.14.9, req_count: 0, recv timeout
[2024-08-09 23:39:33] WARNING - file: sf_nio.c, line: 514, client ip: 10.132.14.9, req_count: 0, recv timeout
[2024-08-09 23:53:46] ERROR - file: tracker_proto.c, line: 53, server: 10.132.14.9:23000, recv data fail, errno: 110, error info: Connection timed out
[2024-08-09 23:53:46] ERROR - file: tracker_proto.c, line: 204, fdfs_recv_header fail, cmd: 111, result: 110
[2024-08-09 23:53:47] INFO - file: storage_sync_func.c, line: 114, successfully connect to storage server 10.132.14.9:23000
[2024-08-09 23:56:47] ERROR - file: tracker_proto.c, line: 53, server: 10.132.14.9:23000, recv data fail, errno: 107, error info: Transport endpoint is not connected
[2024-08-09 23:56:47] ERROR - file: tracker_proto.c, line: 204, fdfs_recv_header fail, cmd: 111, result: 107
[2024-08-09 23:56:48] ERROR - file: storage_sync_func.c, line: 126, connect to storage server 10.132.14.9:23000 fail, errno: 111, error info: Connection refused

Partial logs from C:
[2024-08-09 23:48:02] WARNING - file: sf_nio.c, line: 514, client ip: 10.132.14.9, req_count: 0, recv timeout
[2024-08-09 23:55:15] DEBUG - file: sf_nio.c, line: 580, client ip: 10.132.14.9, sock: 21, recv fail, connection disconnected
[2024-08-09 23:55:37] ERROR - file: tracker_proto.c, line: 53, server: 10.132.14.9:23000, recv data fail, errno: 107, error info: Transport endpoint is not connected
[2024-08-09 23:55:37] ERROR - file: tracker_proto.c, line: 204, fdfs_recv_header fail, cmd: 111, result: 107
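Lining up the three servers' logs by timestamp and source location makes the sequence around 23:33 easier to follow. Below is a minimal sketch of one way to do that, assuming every entry follows the `[timestamp] LEVEL - file: NAME, line: N, message` shape seen in the excerpts above. `LOG_ENTRY`, `parse_entries` and `tally` are hypothetical names, and the sample text is copied from B's log.

```python
import re
from collections import Counter

# One entry looks like:
#   [2024-08-09 23:33:44] ERROR - file: tracker_proto.c, line: 53, message
# The lookahead ends a message where the next "[timestamp" begins, so the
# pattern also works when several entries were pasted onto a single line.
LOG_ENTRY = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<level>[A-Z]+) - file: (?P<file>[\w.]+), line: (?P<line>\d+), "
    r"(?P<msg>.*?)(?=\s*\[\d{4}-\d{2}-\d{2} |\s*$)",
    re.DOTALL,
)


def parse_entries(text):
    """Return one dict per log entry found in text."""
    return [m.groupdict() for m in LOG_ENTRY.finditer(text)]


def tally(entries):
    """Count entries per (level, source file, source line)."""
    return Counter((e["level"], e["file"], int(e["line"])) for e in entries)


SAMPLE = (
    "[2024-08-09 23:33:44] ERROR - file: tracker_proto.c, line: 53, "
    "server: 10.132.14.9:23000, recv data fail, errno: 110, "
    "error info: Connection timed out "
    "[2024-08-09 23:33:44] ERROR - file: storage_sync.c, line: 263, "
    "fdfs_recv_response fail, result: 110 "
    "[2024-08-09 23:33:45] INFO - file: storage_sync_func.c, line: 114, "
    "successfully connect to storage server 10.132.14.9:23000"
)

entries = parse_entries(SAMPLE)
for key, count in sorted(tally(entries).items()):
    print(count, *key)
```

Running the same functions over the full logs of A, B and C and sorting the merged entries by `ts` would show which server logged the first error, keeping in mind that the clocks are a few minutes apart.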

happyfish100 commented 3 months ago

Upgrade to the latest version, v6.12.1, and the problem will go away.