grpc / grpc-swift

The Swift language implementation of gRPC.
Apache License 2.0

Send multiple gRPC requests at the same time; when one of them fails, the other requests all fail with "Transport became inactive". #1421

Open jackystd opened 2 years ago

jackystd commented 2 years ago
    // Resolve the target address and decide whether to use TLS.
    let (host, port) = address.convertAddress()
    let useTLS = (port == 443)

    // A single-threaded event loop group backs the connection.
    let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)

    // One ClientConnection: TLS-backed on port 443, plaintext otherwise.
    let builder = useTLS
        ? ClientConnection.usingTLSBackedByNIOSSL(on: group)
        : ClientConnection.insecure(group: group)
    let channel = builder.connect(host: host, port: port)

    // Observe connectivity state changes.
    channel.connectivity.delegate = self
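For context, a minimal sketch of the delegate conformance implied by that last line, assuming the surrounding type is the GRPCManager seen in the logs below (the type name is an assumption):

    import GRPC

    // grpc-swift's ConnectivityStateDelegate reports connection state
    // transitions such as .ready -> .transientFailure.
    extension GRPCManager: ConnectivityStateDelegate {
        func connectivityStateDidChange(from oldState: ConnectivityState,
                                        to newState: ConnectivityState) {
            print("[grpc] connectivity: \(oldState) -> \(newState)")
        }
    }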

We upgraded from grpc-swift 0.x to 1.7. With v0.x, when multiple requests were sent at the same time, an error on one did not affect the others; with v1.7, once one request errors, all the other requests fail with 'Transport became inactive'.

Is this normal? I expect concurrent requests not to affect each other.

glbrntt commented 2 years ago

It depends on why they fail. If the connection has failed then it's highly likely that all in-flight RPCs will fail.

For what it's worth, you should prefer using the GRPCChannelPool over ClientConnection. It is a little more resilient to failures.
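For example, a minimal sketch of swapping ClientConnection for the pool, assuming TLS via NIOSSL (the host and port are placeholders):

    import GRPC
    import NIOPosix

    let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)

    // The pool manages a set of connections and re-establishes them as
    // needed, so one bad connection is less disruptive to other RPCs.
    let channel = try GRPCChannelPool.with(
        target: .host("example.com", port: 443),
        transportSecurity: .tls(.makeClientConfigurationBackedByNIOSSL()),
        eventLoopGroup: group
    )

The returned channel can be handed to generated clients exactly like a ClientConnection.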

jackystd commented 2 years ago

> It depends on why they fail. If the connection has failed then it's highly likely that all in-flight RPCs will fail.
>
> For what it's worth, you should prefer using the GRPCChannelPool over ClientConnection. It is a little more resilient to failures.

Specifically, the error we got is an HTTP status 502; can I customize the error handling?

I don't want one request failing with an error to cause all requests to fail until reconnection. Is there any way I can achieve this?

glbrntt commented 2 years ago

That doesn't seem right: a 502 will be mapped to an 'unavailable' status and should only fail the RPC that receives the 502.

Are you sure that the 502 is the cause of the connection being closed? Do you have some logs we could look at?
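To illustrate the expected behaviour, a hedged sketch of observing that mapped status on a single RPC (call is a placeholder for any unary call):

    // Per the gRPC HTTP-to-status mapping, a 502 from a proxy should
    // surface as .unavailable on the one RPC that received it; other
    // RPCs on the same connection should be unaffected.
    call.status.whenSuccess { status in
        if status.code == .unavailable {
            // Retry or report this RPC only.
        }
    }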

jackystd commented 2 years ago

> That doesn't seem right: a 502 will be mapped to an 'unavailable' status and should only fail the RPC that receives the 502.
>
> Are you sure that the 502 is the cause of the connection being closed? Do you have some logs we could look at?

When I receive the first response with a status code of 502, the client transport closes the connection in handleError(), and subsequent requests fail (returning 'Transport became inactive') until reconnection succeeds.

See ClientTransport.swift, line 399:

    /// On-loop implementation of `handleError(_:)`.
    private func _handleError(_ error: Error) {
      self.callEventLoop.assertInEventLoop()

      switch self.state.handleError() {
      case .doNothing:
        ()

      case .propagateError:
        self.forwardErrorToInterceptors(error)
        self.failBufferedWrites(with: error)

      case .propagateErrorAndClose:
        self.forwardErrorToInterceptors(error)
        self.failBufferedWrites(with: error)
        self.channel?.close(mode: .all, promise: nil) // <- the close in question
      }
    }

glbrntt commented 2 years ago

self.channel is the channel of the HTTP/2 stream. Its parent channel is the HTTP/2 connection, which should not be closed as a result of the stream channel closing.

If something is closing the connection channel, I don't think it's this line of code. Do you have some logs we could look at?
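As a rough sketch of that parent/child relationship, assuming a NIO HTTP/2 stream channel in hand (streamChannel is a placeholder):

    import NIOCore

    // Each RPC runs on a child stream channel multiplexed over a parent
    // HTTP/2 connection channel. Closing the stream ends only that RPC.
    streamChannel.close(mode: .all, promise: nil)

    // The parent connection channel stays open for other streams.
    let connectionChannel = streamChannel.parent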

jackystd commented 2 years ago

> self.channel is the channel of the HTTP/2 stream. Its parent channel is the HTTP/2 connection, which should not be closed as a result of the stream channel closing.
>
> If something is closing the connection channel, I don't think it's this line of code. Do you have some logs we could look at?

The log is as follows:

2022-06-02 09:30:32.542107+0800 DEMO[11276:14168003] [INFO] [GRPCManager:344]: GRPC req: /proto.lucky_gift.LuckyGiftService/BigWinner
2022-06-02 09:30:32.542940+0800 DEMO[11276:14167989] [INFO] [BaseViewController:124]: [View] DEMO.AudioLiveSubSendGiftViewController viewWillAppear animated:false
2022-06-02 09:30:32.558907+0800 DEMO[11276:14168003] [INFO] [GRPCManager:344]: GRPC req: /SvrConfig.SvrConfig/GetSvrConfig
2022-06-02 09:30:32.559216+0800 DEMO[11276:14167979] [INFO] [GRPCManager:344]: GRPC req: /proto.giftlist.GiftList/GetGiftList
2022-06-02 09:30:32.559873+0800 DEMO[11276:14167986] [INFO] [GRPCManager:344]: GRPC req: /user_info.UserInfoService/CartPull
2022-06-02 09:30:32.566277+0800 DEMO[11276:14168003] [INFO] [GRPCManager:344]: GRPC req: /proto.lucky_gift.LuckyGiftService/JackpotTotal
2022-06-02 09:30:32.870930+0800 DEMO[11276:14167979] [INFO] [GRPCManager:386]: GRPC rsp:/proto.lucky_gift.LuckyGiftService/BigWinner time:0.33 host:47.241.121.1 error:Invalid HTTP response status: 502
[grpc] error happend: The operation couldn’t be completed. (GRPC.GRPCError.InvalidHTTPStatus error 1.)
[grpc] error happend: The operation couldn’t be completed. (GRPC.GRPCStatus error 1.)
2022-06-02 09:30:32.871337+0800 DEMO[11276:14167994] [INFO] [GRPCManager:380]: GRPC rsp:/proto.lucky_gift.LuckyGiftService/JackpotTotal time:0.30 host:47.241.121.1 error:Transport became inactive
[grpc] error happend: The operation couldn’t be completed. (GRPC.GRPCStatus error 1.)
2022-06-02 09:30:32.871658+0800 DEMO[11276:14167986] [INFO] [GRPCManager:380]: GRPC rsp:/SvrConfig.SvrConfig/GetSvrConfig time:0.31 host:47.241.121.1 error:Transport became inactive
[grpc] error happend: The operation couldn’t be completed. (GRPC.GRPCStatus error 1.)
2022-06-02 09:30:32.872172+0800 DEMO[11276:14167994] [INFO] [GRPCManager:380]: GRPC rsp:/user_info.UserInfoService/CartPull time:0.31 host:47.241.121.1 error:Transport became inactive
[grpc] error happend: The operation couldn’t be completed. (GRPC.GRPCStatus error 1.)
2022-06-02 09:30:32.872700+0800 DEMO[11276:14168003] [INFO] [GRPCManager:380]: GRPC rsp:/proto.giftlist.GiftList/GetGiftList time:0.31 host:47.241.121.1 error:Transport became inactive
2022-06-02 09:30:32.874071+0800 DEMO[11276:14167979] [INFO] [GRPCManager:344]: GRPC req: /SvrConfig.SvrConfig/GetSvrConfig
2022-06-02 09:30:33.192477+0800 DEMO[11276:14168003] [INFO] [GRPCManager:374]: GRPC rsp:/SvrConfig.SvrConfig/GetSvrConfig time:0.32

The error stack: screenshot-20220602-112158

The channel on which the error happens is closed, and the other requests using this channel also fail with 'Transport became inactive'. Should I create multiple channels for different services or requests?

jackystd commented 2 years ago

> Do you have some logs we could look at?

Looking forward to your reply.

glbrntt commented 2 years ago

Do you have trace logs? (You can set logger.logLevel = .trace)
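For instance, a minimal sketch of wiring a trace-level logger into the call options (the label string is arbitrary):

    import GRPC
    import Logging

    // A trace-level logger makes grpc-swift emit detailed connection
    // and stream events, including the logger metadata discussed below.
    var logger = Logger(label: "grpc.client")
    logger.logLevel = .trace

    let options = CallOptions(logger: logger)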

Banck commented 2 years ago

Seems it's the same issue I had before: https://github.com/grpc/grpc-swift/issues/1341. We resolved it for now by creating a new ClientConnection for every request.
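A minimal sketch of that workaround, reusing the TLS setup from earlier in the thread (the helper name is hypothetical):

    import GRPC
    import NIOPosix

    // Hypothetical helper: a fresh ClientConnection per RPC means one
    // failing connection cannot fail unrelated in-flight requests.
    // The trade-off is a new TCP/TLS handshake for every call.
    func makeConnection(host: String, port: Int,
                        group: MultiThreadedEventLoopGroup) -> ClientConnection {
        ClientConnection
            .usingTLSBackedByNIOSSL(on: group)
            .connect(host: host, port: port)
    }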

jackystd commented 2 years ago

> Seems it's the same issue I had before: #1341. We resolved it for now by creating a new ClientConnection for every request.

Thanks, I've tried this and it works, but I don't think it's the best solution. Specifically, I found that the time per request increased significantly.

glbrntt commented 2 years ago

Thanks for providing logs. It looks like metadata is missing from them though; could you provide them with metadata?

jackystd commented 2 years ago

> Thanks for providing logs. It looks like metadata is missing from them though; could you provide them with metadata?

Is this the metadata you mentioned?

    let options = CallOptions(customMetadata: self.metadata, timeLimit: .timeout(.seconds(30)), logger: logger)

We use custom metadata, and the data in it is only related to our business logic; I don't think the disconnection is related to it.

glbrntt commented 2 years ago

No, I mean the metadata attached to the logs. The log formatter being used here is dropping metadata (which includes things like the request ID, connection ID, etc.).
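For instance, a hedged sketch of bootstrapping swift-log with its built-in handler, which does include logger metadata in its output:

    import Logging

    // StreamLogHandler prints attached metadata (request ID, connection
    // ID, and so on) alongside each message, unlike a formatter that
    // drops it.
    LoggingSystem.bootstrap { label in
        var handler = StreamLogHandler.standardOutput(label: label)
        handler.logLevel = .trace
        return handler
    }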

jackystd commented 2 years ago

> No, I mean the metadata attached to the logs. The log formatter being used here is dropping metadata (which includes things like the request ID, connection ID, etc.).

I uploaded the complete log here: log. Please check whether the metadata is included.

pacu commented 2 years ago

> It depends on why they fail. If the connection has failed then it's highly likely that all in-flight RPCs will fail.
>
> For what it's worth, you should prefer using the GRPCChannelPool over ClientConnection. It is a little more resilient to failures.

@glbrntt I'm trying this out in relation to issue #1273, and it has been really helpful; the recurring errors seem to be gone.

My question is: how can I hook a connection delegate to the channel, as I did with ClientConnection?

glbrntt commented 2 years ago

There is no connection delegate for the pool.