trpc-group / trpc-go

A pluggable, high-performance RPC framework written in golang

question: how to solve the problem that the program shows error codes 141 and 171 #165

Closed jincurry closed 2 months ago

jincurry commented 2 months ago

Preliminary Research

Question

What is the reason the client side shows error code 141, saying the connection is closed, and how can I solve this problem?

When the client sends a request and the server takes more than 1 minute to handle the data, the client side fails with an error. When using tnet, the log shows: type:framework, code:141, msg:tcp client transport ReadFrame, cost:49.999809717s, caused by conn is closed. After changing the transport from tnet to go-net, the error code changes to 171 and the message shows: type:framework, code:171, msg:tcp client transport ReadFrame: EOF

Additional Information

Here is some code to reproduce the issue:

client side:

package main

import (
    trpc "trpc.group/trpc-go/trpc-go"
    "trpc.group/trpc-go/trpc-go/client"
    "trpc.group/trpc-go/trpc-go/log"
    pb "xagent/cmd/rpc/pb"
)

func main() {
    c := pb.NewGreeterClientProxy(client.WithTarget("ip://127.0.0.1:8000"))
    rsp, err := c.Hello(trpc.BackgroundContext(), &pb.HelloRequest{Msg: "world"})
    if err != nil {
        log.Error(err)
        return
    }
    log.Info(rsp.Msg)
}

server side:

package main

import (
    "context"
    "time"

    trpc "trpc.group/trpc-go/trpc-go"
    "trpc.group/trpc-go/trpc-go/log"
    pb "xagent/cmd/rpc/pb"
)

func main() {
    s := trpc.NewServer()
    pb.RegisterGreeterService(s, &Greeter{})
    if err := s.Serve(); err != nil {
        log.Error(err)
    }
}

type Greeter struct{}

// Hello simulates a slow handler; the 1-minute sleep reproduces the errors above.
func (g Greeter) Hello(ctx context.Context, req *pb.HelloRequest) (*pb.HelloReply, error) {
    log.Infof("got hello request: %s", req.Msg)
    time.Sleep(1 * time.Minute)
    log.Info("sleep enough time")
    return &pb.HelloReply{Msg: "Hello " + req.Msg + "!"}, nil
}

pb file:

syntax = "proto3";

package trpc.hello;
option go_package="xagent/cmd/rpc/pb";

service Greeter {
  rpc Hello (HelloRequest) returns (HelloReply) {}
}

message HelloRequest {
  string msg = 1;
}

message HelloReply {
  string msg = 1;
}
WineChord commented 2 months ago

The server has a default idle timeout of 60 seconds. If this timeout is reached without any activity, the server will close the connection. As a result, the client will receive an error stating, "The connection is closed."

jincurry commented 2 months ago

Thank you for your reply. I will try adjusting the idle timeout value and test it. By the way, if a server is executing a long-duration task, how should I set the timeout and idle timeout? Is there a way to prevent timeouts and keep the connection alive indefinitely?

WineChord commented 2 months ago

I've added a section to the doc in https://github.com/trpc-group/trpc-go/pull/166. You can use idletime: -1 to disable idle timeout:

server:
  service:
    - name: trpc.server.service.Method
      network: tcp
      protocol: trpc
      idletime: 60000 # The unit is milliseconds. Setting it to -1 disables the idle timeout (setting it to 0 still falls back to the framework default of 60s)
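
As a side note, the same value can be set in code rather than YAML, assuming the server package exposes a WithIdleTimeout option (an assumption to verify against your framework version); a minimal sketch:

import (
    "time"

    trpc "trpc.group/trpc-go/trpc-go"
    "trpc.group/trpc-go/trpc-go/log"
    "trpc.group/trpc-go/trpc-go/server"
    pb "xagent/cmd/rpc/pb"
)

func main() {
    // Assumption: server.WithIdleTimeout mirrors idletime in trpc_go.yaml.
    // Raise it above the longest expected handler time, or set idletime: -1
    // in the YAML to disable it entirely.
    s := trpc.NewServer(server.WithIdleTimeout(10 * time.Minute))
    pb.RegisterGreeterService(s, &Greeter{})
    if err := s.Serve(); err != nil {
        log.Error(err)
    }
}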
jincurry commented 2 months ago

I have noticed your commit. I will set the idle timeout to -1 and test it. Thanks for your reply again.

WineChord commented 2 months ago

"if a server is executing a long-duration task"

Another timeout to be aware of is the service timeout. This timeout is embedded in the ctx context.Context input argument. However, you must select on ctx.Done() to catch the context timeout; otherwise, it will not take effect.

The corresponding document: https://github.com/trpc-group/trpc-go/blob/main/docs/user_guide/timeout_control.md#message-timeout.
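
A minimal sketch of that pattern, reusing the Greeter handler from the reproduction above (the goroutine and done channel are illustrative): run the slow work concurrently and select on ctx.Done() so the service timeout can actually interrupt it.

func (g Greeter) Hello(ctx context.Context, req *pb.HelloRequest) (*pb.HelloReply, error) {
    done := make(chan struct{})
    go func() {
        time.Sleep(1 * time.Minute) // stands in for the long-running work
        close(done)
    }()
    select {
    case <-ctx.Done():
        // The service timeout embedded in ctx fired; report it to the caller.
        return nil, ctx.Err()
    case <-done:
        return &pb.HelloReply{Msg: "Hello " + req.Msg + "!"}, nil
    }
}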

jincurry commented 2 months ago

I'm sorry to bother you again. I tested setting the idle time to -1, but instead of waiting for the server's reply, the client failed very quickly. Do you have any suggestions?

Here is my trpc_go.yaml config for the server.

server:
  service:
    - name: trpc.hello
      ip: 127.0.0.1
      port: 8000
      idletime: -1
      timeout: 60000

The client- and server-side output is shown in the attached screenshots.

WineChord commented 2 months ago

@jincurry Hi, I found that tnet does not handle a negative idle timeout properly. Here is the fix: https://github.com/trpc-group/trpc-go/pull/169/

jincurry commented 2 months ago

Thanks for your help. I will test it later.

jincurry commented 2 months ago

@WineChord Hi, there is another test result I want you to know about. After I changed the server's idletime and timeout to 600000ms, I expected the client to get the correct result, because the server's idletime is now longer than the time needed to handle the data. But I still get the same error: client/main.go:17 type:framework, code:141, msg:tcp client transport ReadFrame, cost:49.999645441s, caused by conn is closed.

One more thing worth noting: when I changed the client-side code to use the go-net transport, the client got the correct response. Client-side code:

package main

import (
    trpc "trpc.group/trpc-go/trpc-go"
    "trpc.group/trpc-go/trpc-go/client"
    "trpc.group/trpc-go/trpc-go/log"
    "trpc.group/trpc-go/trpc-go/transport"
    pb "xagent/cmd/rpc/pb"
)

func main() {
    c := pb.NewGreeterClientProxy(client.WithTarget("ip://127.0.0.1:8000"),
        client.WithTransport(transport.GetClientTransport("go-net")))
    rsp, err := c.Hello(trpc.BackgroundContext(), &pb.HelloRequest{Msg: "world"})
    if err != nil {
        log.Error(err)
        return
    }
    log.Info(rsp.Msg)
}

Any suggestions for this?

WineChord commented 2 months ago

This is due to the default 50-second timeout for the client connection pool:

https://github.com/trpc-group/trpc-go/blob/main/docs/user_guide/client/connection_mode.md#connection-pool-1

The reason why go-net does not result in an error is that it only checks the connection idle timeout when the connection is truly idle (i.e., when it's in the connection pool's idle list). However, tnet does not maintain such a list. The check for the idle timeout timer is updated upon every read/write event on the connection, whether the connection is in use or not. In this sense, tnet's connection idle timeout also acts as a read/write timeout.

For the tnet client transport, an additional idle timeout parameter can be provided to extend the limit:


import (
    "trpc.group/trpc-go/trpc-go/pool/connpool"
    tnettrans "trpc.group/trpc-go/trpc-go/transport/tnet"
)

func init() {
    tnettrans.DefaultConnPool = connpool.NewConnectionPool(
        connpool.WithDialFunc(tnettrans.Dial),
        connpool.WithIdleTimeout(0), // Use 0 to disable the idle timeout for the client.
        connpool.WithHealthChecker(tnettrans.HealthChecker),
    )
}
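
Since Go runs package init functions before main, placing this override in the client binary's main package ensures the custom pool is installed before any connections are created.
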
jincurry commented 2 months ago

It works. Thanks for your reply.

WineChord commented 2 months ago

Updated in the doc: https://github.com/trpc-group/trpc-go/pull/170