sogou / srpc

RPC framework based on C++ Workflow. Supports SRPC, Baidu bRPC, Tencent tRPC, thrift protocols.
Apache License 2.0

In sync mode, it takes a long time to get data from the server. How can I debug? #388

Closed: ouclbc closed 2 months ago

ouclbc commented 3 months ago

I am using sync mode. Normally it takes 14 ms to get data from the server, but sometimes it takes 100 ms or more. How can I debug this?

::XrPackage::PoseRequest sync_req;
::XrPackage::PoseResponse sync_resp;
RPCSyncContext sync_ctx;

Barenboim commented 3 months ago

What is the max size of your req/resp? And the RTT between client/server?

ouclbc commented 3 months ago

Requests are sent once every 1 millisecond. There is only one client.

ouclbc commented 3 months ago

The RTT is about 2 ms.

Barenboim commented 3 months ago

Could you also try task mode and compare the response time with sync mode?

ouclbc commented 3 months ago

How do I use task mode?

holmes1412 commented 3 months ago

You can refer to the usage here: https://github.com/sogou/srpc/blob/master/tutorial/tutorial-09-client_task.cc#L32


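// Task callbacks receive an srpc::RPCContext (RPCSyncContext is only used by the sync interface).
void callback(::XrPackage::PoseResponse *resp, srpc::RPCContext *ctx)
{
    // process response
}

int main()
{
    // `client` is your generated client object, constructed elsewhere.
    ::XrPackage::PoseRequest req;
    // ... set req

    auto *rpc_task = client.create_XXX_task(callback);
    rpc_task->serialize_input(&req);

    rpc_task->start();
    // please make sure your main thread will not exit here
}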

Please check your generated file xxxx.srpc.h to find the definition of create_XXX_task().

ouclbc commented 3 months ago

Using task mode to get the pose, it also takes 100 ms or more to get data from the server. (The client and server are both Android devices.)

holmes1412 commented 3 months ago

Hi, I think there are some other ways to debug:

  1. Usage. Is your code simple enough that it only sends this task in parallel, without doing anything else (to rule out other influences)? Then we can check whether the latency is stable at 10 ms+ or still fluctuates between 10 ms+ and 100 ms.

  2. Data size. Protobuf serialization can be very slow for large data or for data containing a map (partly because a map's memory is not contiguous). So we can check whether the high latency is caused by large data.

  3. Latency calculation. You can use the internal trace module to monitor the latency. It starts counting when the task actually starts and stops after the response has been received and deserialized, so I think it is more accurate. You can check whether this time is also close to 100 ms.

The trace module can be used as follows:

#include "srpc/rpc_trace_filter.h"
int main()
{
    ...
    RPCTraceDefault span_log; // this plugin will print the trace info on the screen
    client.add_filter(&span_log);
    ...
}
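
To cross-check the numbers printed by the trace filter, the client-side wall-clock time of each call can also be recorded and summarized. The sketch below is only a minimal example using the C++ standard library. `time_call_ms`, `LatencyStats` and `summarize` are hypothetical helper names, not part of SRPC or Workflow, and the callable passed in stands for whatever code issues the request.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: runs `call` once and returns its wall-clock
// duration in milliseconds, measured with a monotonic clock.
double time_call_ms(const std::function<void()>& call)
{
    auto begin = std::chrono::steady_clock::now();
    call();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}

struct LatencyStats
{
    double min_ms;
    double p99_ms;
    double max_ms;
};

// Hypothetical helper: summarizes the recorded samples.
// The 99th percentile uses the nearest-rank method.
LatencyStats summarize(std::vector<double> samples)
{
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    // 1-based rank: ceil(0.99 * n), computed with integer arithmetic.
    std::size_t rank = (samples.size() * 99 + 99) / 100;
    return { samples.front(), samples[rank - 1], samples.back() };
}
```

Pushing `time_call_ms(...)` into a `std::vector<double>` for every request and calling `summarize` at the end shows whether the 100 ms values are rare outliers (large gap between `p99_ms` and `max_ms`) or a regular occurrence.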

ouclbc commented 3 months ago

Thanks, I will try.

ouclbc commented 3 months ago

When I only send the pose, the maximum time is 16 ms, but it is not stable. Where can I set DSCP on the socket in Workflow? Is it in the nonblock_connect method in Communicator?

int dscp = 0x2E;
if (setsockopt(mSockfd, IPPROTO_IP, IP_TOS, &dscp, sizeof(dscp)) != 0)

holmes1412 commented 3 months ago

  1. What do you mean by 'only send pose'? Do you mean that sending doesn't take much time, so the problem is with connecting or receiving?
  2. Inside Communicator, it is nonblock_connect.
  3. You cannot call setsockopt().

ouclbc commented 3 months ago

  1. By 'only send pose' I mean the code is simple enough that it only sends this task in parallel. There are still data points with durations over 100 milliseconds, but they are relatively few.
  2. I want to set the socket priority so that data can be sent faster. Is DSCP supported?
  3. When I use UDP, something seems to go wrong. Is UDP supported in the older version of Workflow (from last year)?
  4. Using tcpdump, I found that data is sent and received very quickly (under 20 ms), but the application sometimes takes 100 ms to receive it.

holmes1412 commented 3 months ago

  1. It seems simple enough when you only send pose. There are some points you can check:

    • What is the data size? (Protobuf deserialization may take a long time.)
    • What is the CPU utilization at that time?
  2. About setting the socket priority: you cannot get the fd, so you cannot call setsockopt(). If you build your dependencies from source, maybe you can add the code in Communicator.cc. But since you mentioned that you already ran tcpdump and the network is really quick, I think the problem is not in the network. Maybe you can also try running the client and server on the same machine to rule out an unstable network, so we can see whether the long latency is caused by the framework.

  3. You can send UDP on the client side in the older version of Workflow. (BTW, you can post the error you get when you use UDP. I would like to solve it.)
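
For reference, if the code is patched into Communicator.cc or used on a socket you own, note that IP_TOS takes the whole 8-bit TOS byte and the DSCP code point occupies its upper 6 bits. A DSCP value such as 0x2E therefore needs to be shifted left by 2 before it is passed to setsockopt(). The following is a minimal sketch for IPv4 sockets on a POSIX system. `set_dscp` and `get_dscp` are hypothetical helper names and are not part of Workflow or SRPC.

```cpp
#include <netinet/in.h>
#include <netinet/ip.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: sets the DSCP code point (0..63) on an IPv4 socket.
// Returns 0 on success and -1 on failure or invalid input.
int set_dscp(int sockfd, int dscp)
{
    if (dscp < 0 || dscp > 63)
        return -1;

    // DSCP lives in the upper 6 bits of the TOS byte; the lower 2 bits are ECN.
    int tos = dscp << 2;
    return setsockopt(sockfd, IPPROTO_IP, IP_TOS, &tos, sizeof tos);
}

// Hypothetical helper: reads the DSCP code point back from the socket.
// Returns the value (0..63), or -1 on failure.
int get_dscp(int sockfd)
{
    int tos = 0;
    socklen_t len = sizeof tos;
    if (getsockopt(sockfd, IPPROTO_IP, IP_TOS, &tos, &len) != 0)
        return -1;

    return (tos >> 2) & 0x3F;
}
```

Whether routers and the wireless link between the two devices honor the marking depends on the network, so this may not change the observed latency.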

ouclbc commented 2 months ago

Thanks for your reply. How can I select UDP? The client parameters have no transport type field, as shown below:

struct RPCClientParams
{
    RPCTaskParams task_params;
    // host + port + is_ssl
    std::string host;
    unsigned short port;
    bool is_ssl;
    // or URL
    std::string url;
    int callee_timeout;
    std::string caller;
};

Barenboim commented 2 months ago

UDP transmission is supported in the latest release, but it is only available on Unix systems. https://github.com/sogou/srpc/blob/627f69c8d34b83067d945c7085e99f22ff4fee7d/src/rpc_options.h#L39 When the 'transport_type' field is TT_UDP, the client will use UDP.

Barenboim commented 2 months ago

Please update to the latest releases of Workflow and SRPC.

ouclbc commented 2 months ago

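For now, UDP data transfer looks quite stable with the communication library I wrote myself, so I will not use the srpc solution for the time being. Thanks!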

Barenboim commented 2 months ago

If possible, could you also help us test whether the problem still occurs when using only Workflow's UDP?

#include <stdio.h>
#include <stdlib.h>
#include <functional>
#include <string>
#include <iostream>
#include "workflow/WFGlobal.h"
#include "workflow/WFFacilities.h"
#include "workflow/TLVMessage.h"
#include "workflow/WFTaskFactory.h"
#include "workflow/WFServer.h"

using namespace protocol;

using WFTLVServer = WFServer<TLVRequest, TLVResponse>;
using WFTLVTask = WFNetworkTask<TLVRequest, TLVResponse>;
using tlv_callback_t = std::function<void (WFTLVTask *)>;

WFTLVTask *create_tlv_task(const char *host, unsigned short port, tlv_callback_t callback)
{
    auto *task = WFNetworkTaskFactory<TLVRequest, TLVResponse>::create_client_task(
                                       TT_UDP, host, port, 0, std::move(callback));    // create a client task that uses UDP transport
    task->set_keep_alive(60 * 1000);
    return task;
}

int main()
{
    struct WFServerParams params = SERVER_PARAMS_DEFAULT;
    params.transport_type = TT_UDP;      // switch the server's transport protocol to UDP

    WFTLVServer server(&params, [](WFTLVTask *task) {
        *task->get_resp() = std::move(*task->get_req());
    });

    if (server.start(8888) != 0) {
        perror("server.start");
        exit(1);
    }

    auto&& create = [](WFRepeaterTask *)->SubTask * {
        std::string string;
        printf("Input string (Ctrl-D to exit): ");
        std::cin >> string;
        if (string.empty())
            return NULL;

        auto *task = create_tlv_task("127.0.0.1", 8888, [](WFTLVTask *task) {
            if (task->get_state() == WFT_STATE_SUCCESS)
                printf("Server Response: %s\n", task->get_resp()->get_value()->c_str());
            else {
                const char *str = WFGlobal::get_error_string(task->get_state(), task->get_error());
                fprintf(stderr, "Error: %s\n", str);
            }
        });

        task->get_req()->set_value(std::move(string));
        return task;
    };

    WFFacilities::WaitGroup wait_group(1);
    WFRepeaterTask *repeater = WFTaskFactory::create_repeater_task(std::move(create), nullptr);
    Workflow::start_series_work(repeater, [&wait_group](const SeriesWork *) {
        wait_group.done();
    });

    wait_group.wait();
    server.stop();
    return 0;
}