sofastack / sofa-bolt

SOFABolt is a lightweight, easy to use and high performance remoting framework based on Netty.
https://www.sofastack.tech/projects/sofa-bolt/
Apache License 2.0
2.42k stars 861 forks source link

com.alipay.remoting.rpc.exception.InvokeTimeoutException: Rpc invocation timeout[responseCommand TIMEOUT] #311

Closed iohao closed 1 year ago

iohao commented 1 year ago

业务场景描述

你好,有一个业务场景的流程如下:

1 需要在 external 向 internal 发送请求,并得到 internal 响应的【结果A】。

2 当 internal 接收到来自 external 的请求后,会给 external 发送一个【消息B】,这个信息是需要 external 保存的,external 接收到这个消息后给 internal 一个【响应B】(类似信息接收的回执)。

3 当 internal 接收到这个【消息B】的回执后,在给 external 最初想要的一个响应【结果A】;

业务流程图

bolt-server-client

结果描述

在这个业务场景的过程中是可以实现的,1秒内发送少量的请求时,是顺利的且不会有问题;

但在一秒发送多次时(如100或200次),会有异常 com.alipay.remoting.rpc.exception.InvokeTimeoutException: Rpc invocation timeout[responseCommand TIMEOUT]! the address is 127.0.0.1:58035;

可以帮忙看下什么问题吗;

信息流程

开始步骤-1 ---- external 向 internal 发送请求
步骤-1 ---- 请求到达 boltServer 
步骤-1 ---- 请求到达 internal

开始步骤-2 ---- internal 向 external 发送请求
步骤-2 ---- 请求到达 boltServer
步骤-2 ---- 请求到达 external

开始步骤-3 ---- external 向 internal 发送请求
步骤-3 ---- 请求到达 boltServer
步骤-3 ---- 请求到达 internal

开始步骤-4 ---- internal 向 external 发送请求
步骤-4 ---- 请求到达 boltServer
步骤-4 ---- 请求到达 external

复现步骤

复现示例代码: https://github.com/iohao/my-bolt

复现代码启动类

int loop = 100; 为 1 秒内的请求次数设置

MyBoltHelper.openLog = true; 设置为 true,可以看见请求的信息流向流程

public class MainApplication {
    public static void main(String[] args) throws InterruptedException {
        // 启动 BoltServer
        BoltServer boltServer = new BoltServer();
        run(boltServer::startup);

        // 启动 internal bolt client
        InternalBoltClient internalBoltClient = new InternalBoltClient();
        run(internalBoltClient::startup);

        // 启动 external bolt client
        ExternalBoltClient externalBoltClient = new ExternalBoltClient();
        run(externalBoltClient::startup);

        // 睡眠,等待 bolt 启动完成
        TimeUnit.SECONDS.sleep(1);

        // 设置为 true,可以看见请求的信息流向流程
        MyBoltHelper.openLog = true;
        // 请求次数设置
        int loop = 100;
        /*
         * 开始步骤-1 ---- external 向 internal 发送请求
         */
        externalBoltClient.requestInternalBoltClient(loop);

        // 统计打印
        extractedPrint();
    }
}
iohao commented 1 year ago

比较奇怪的是少量请求次数是可以的; 请求次数过多,就会出现超时; 可以帮忙分析一下原因吗;

控制台

19:36:45.387 [Bolt-default-executor-10-thread-14] ERROR b.p.BoltServerUserIdRequestSyncProcessor.handleRequest (BoltServerUserIdRequestSyncProcessor.java:57) - Rpc invocation timeout[responseCommand TIMEOUT]! the address is 127.0.0.1:63558
com.alipay.remoting.rpc.exception.InvokeTimeoutException: Rpc invocation timeout[responseCommand TIMEOUT]! the address is 127.0.0.1:63558
    at com.alipay.remoting.rpc.RpcResponseResolver.preProcess(RpcResponseResolver.java:83)
    at com.alipay.remoting.rpc.RpcResponseResolver.resolveResponseObject(RpcResponseResolver.java:54)
    at com.alipay.remoting.rpc.RpcRemoting.invokeSync(RpcRemoting.java:186)
    at com.alipay.remoting.rpc.RpcServerRemoting.invokeSync(RpcServerRemoting.java:67)
    at com.alipay.remoting.rpc.RpcRemoting.invokeSync(RpcRemoting.java:143)
    at com.alipay.remoting.rpc.RpcServer.invokeSync(RpcServer.java:638)
    at com.my.bolt.processor.BoltServerUserIdRequestSyncProcessor.handleRequest(BoltServerUserIdRequestSyncProcessor.java:46)
    at com.my.bolt.processor.BoltServerUserIdRequestSyncProcessor.handleRequest(BoltServerUserIdRequestSyncProcessor.java:32)
    at com.alipay.remoting.rpc.protocol.RpcRequestProcessor.dispatchToUserProcessor(RpcRequestProcessor.java:252)
    at com.alipay.remoting.rpc.protocol.RpcRequestProcessor.doProcess(RpcRequestProcessor.java:146)
    at com.alipay.remoting.rpc.protocol.RpcRequestProcessor$ProcessTask.run(RpcRequestProcessor.java:393)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

common-default.log

2022-11-03 19:36:29,356 WARN  [AbstractConnectionFactory#259] [Thread-3] [client side] bolt netty low water mark is 32768 bytes, high water mark is 65536 bytes
2022-11-03 19:36:29,356 WARN  [AbstractConnectionFactory#259] [Thread-4] [client side] bolt netty low water mark is 32768 bytes, high water mark is 65536 bytes
2022-11-03 19:36:29,377 WARN  [AbstractRemotingServer#95] [Thread-2] Prepare to start server on port 50056 
2022-11-03 19:36:29,430 WARN  [AbstractRemotingServer#97] [Thread-2] Server started on port 50056
2022-11-03 19:36:35,359 WARN  [BaseRemoting#113] [Bolt-default-executor-10-thread-8] Wait response, request id=105 timeout!
2022-11-03 19:36:35,359 WARN  [BaseRemoting#113] [Bolt-default-executor-10-thread-3] Wait response, request id=111 timeout!
2022-11-03 19:36:35,359 WARN  [BaseRemoting#113] [Bolt-default-executor-10-thread-5] Wait response, request id=133 timeout!
2022-11-03 19:36:35,359 WARN  [BaseRemoting#113] [Bolt-default-executor-10-thread-17] Wait response, request id=104 timeout!
2022-11-03 19:36:35,359 WARN  [BaseRemoting#113] [Bolt-default-executor-10-thread-9] Wait response, request id=136 timeout!
2022-11-03 19:36:35,359 WARN  [BaseRemoting#113] [Bolt-default-executor-10-thread-19] Wait response, request id=141 timeout!
2022-11-03 19:36:35,359 WARN  [BaseRemoting#113] [Bolt-default-executor-10-thread-11] Wait response, request id=135 timeout!
2022-11-03 19:36:35,359 WARN  [BaseRemoting#113] [Bolt-default-executor-10-thread-7] Wait response, request id=138 timeout!
2022-11-03 19:36:35,359 WARN  [BaseRemoting#113] [Bolt-default-executor-10-thread-16] Wait response, request id=49 timeout!

common-error.log

2022-11-03 19:36:35,371 ERROR [RpcRequestProcessor#246] [Bolt-default-executor-10-thread-6] AYSNC process rpc request failed in RpcRequestProcessor, id=54
java.lang.RuntimeException: com.alipay.remoting.rpc.exception.InvokeTimeoutException: Rpc invocation timeout[responseCommand TIMEOUT]! the address is 127.0.0.1:50056
    at com.my.bolt.client.MyBoltClient.invokeSync(MyBoltClient.java:66)
    at com.my.bolt.processor.InternalClientMyRequestProcessor.handleRequest(InternalClientMyRequestProcessor.java:55)
    at com.my.bolt.processor.InternalClientMyRequestProcessor.handleRequest(InternalClientMyRequestProcessor.java:34)
    at com.alipay.remoting.rpc.protocol.RpcRequestProcessor.dispatchToUserProcessor(RpcRequestProcessor.java:235)
    at com.alipay.remoting.rpc.protocol.RpcRequestProcessor.doProcess(RpcRequestProcessor.java:146)
    at com.alipay.remoting.rpc.protocol.RpcRequestProcessor$ProcessTask.run(RpcRequestProcessor.java:393)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: com.alipay.remoting.rpc.exception.InvokeTimeoutException: Rpc invocation timeout[responseCommand TIMEOUT]! the address is 127.0.0.1:50056
    at com.alipay.remoting.rpc.RpcResponseResolver.preProcess(RpcResponseResolver.java:83)
    at com.alipay.remoting.rpc.RpcResponseResolver.resolveResponseObject(RpcResponseResolver.java:54)
    at com.alipay.remoting.rpc.RpcRemoting.invokeSync(RpcRemoting.java:186)
    at com.alipay.remoting.rpc.RpcClientRemoting.invokeSync(RpcClientRemoting.java:72)
    at com.alipay.remoting.rpc.RpcRemoting.invokeSync(RpcRemoting.java:143)
    at com.alipay.remoting.rpc.RpcClient.invokeSync(RpcClient.java:219)
    at com.my.bolt.client.MyBoltClient.invokeSync(MyBoltClient.java:64)
    ... 8 common frames omitted
iohao commented 1 year ago

此问题会在发送端发送消息密集时触发,大约连续发送 50 条

发送端伪代码如下,往 boltClient 发送

        byte[] bytes = ...;
        for (int i = 0; i < 50; i++) {
            webSocketClient.send(bytes);
        }
JervyShi commented 1 year ago

You could override XXServerProcessor#getExecutor() to set the user thread pool. It can separate IO ThreadPool and User ThreadPool. Then your server can handle more requests in the same time.

iohao commented 1 year ago

好的,感谢!