jina-ai / jina

☁️ Build multimodal AI applications with cloud-native stack
https://docs.jina.ai
Apache License 2.0
21.03k stars 2.22k forks source link

Failed to set TCP_USER_TIMEOUT #5817

Closed SpacedDot closed 1 year ago

SpacedDot commented 1 year ago

Describe your proposal/problem I ran the demo of translate_executor and got this problem. I do some search and realize it's a TCP bind problem. But I can not find the solution.


Environment ubuntu 18.04 (jina) user@ubuntu:~/project/cv$ python deployment.py

translate_executor.py

from docarray import DocumentArray
from jina import Executor, requests
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

class Translator(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.tokenizer = AutoTokenizer.from_pretrained(
            "facebook/mbart-large-50-many-to-many-mmt", src_lang="fr_XX"
        )
        self.model = AutoModelForSeq2SeqLM.from_pretrained(
            "facebook/mbart-large-50-many-to-many-mmt"
        )

    @requests
    def translate(self, docs: DocumentArray, **kwargs):
        for doc in docs:
            doc.text = self._translate(doc.text)

    def _translate(self, text):
        encoded_en = self.tokenizer(text, return_tensors="pt")
        generated_tokens = self.model.generate(
            **encoded_en, forced_bos_token_id=self.tokenizer.lang_code_to_id["en_XX"]
        )
        return self.tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[
            0
        ]

deployment.py

from jina import Deployment
from translate_executor import Translator

with Deployment(uses=Translator, timeout_ready=-1) as dep:
    dep.block()

Screenshots 1682047974518

JoanFM commented 1 year ago

Hello, it feels like just a WARNING, can you still process Documents?

SpacedDot commented 1 year ago

Yes. The document could be processed. Just wonder why this warn/error happened. Doing some search but seems nobody report this issue.

SpacedDot commented 1 year ago

1682060359513 And everytime I process it, this warn/error appear once. Seems not affect the process. Just annoying....

SpacedDot commented 1 year ago

This seems to be a grpc problem. I hope someone could give me a clue what is this about. I am afraid this will lead to another future problem.

JoanFM commented 1 year ago

what is the grpcio version you are using?

JoanFM commented 1 year ago

This is the code doing this in grpc:

// Set TCP_USER_TIMEOUT
grpc_error_handle grpc_set_socket_tcp_user_timeout(
    int fd, const grpc_core::PosixTcpOptions& options, bool is_client) {
  // Use conditionally-important parameter to avoid warning
  (void)fd;
  (void)is_client;
  extern grpc_core::TraceFlag grpc_tcp_trace;
  if (g_socket_supports_tcp_user_timeout.load() >= 0) {
    bool enable;
    int timeout;
    if (is_client) {
      enable = g_default_client_tcp_user_timeout_enabled;
      timeout = g_default_client_tcp_user_timeout_ms;
    } else {
      enable = g_default_server_tcp_user_timeout_enabled;
      timeout = g_default_server_tcp_user_timeout_ms;
    }
    int value = options.keep_alive_time_ms;
    if (value > 0) {
      enable = value != INT_MAX;
    }
    value = options.keep_alive_timeout_ms;
    if (value > 0) {
      timeout = value;
    }
    if (enable) {
      int newval;
      socklen_t len = sizeof(newval);
      // If this is the first time to use TCP_USER_TIMEOUT, try to check
      // if it is available.
      if (g_socket_supports_tcp_user_timeout.load() == 0) {
        if (0 != getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &newval, &len)) {
          gpr_log(GPR_INFO,
                  "TCP_USER_TIMEOUT is not available. TCP_USER_TIMEOUT won't "
                  "be used thereafter");
          g_socket_supports_tcp_user_timeout.store(-1);
        } else {
          gpr_log(GPR_INFO,
                  "TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be "
                  "used thereafter");
          g_socket_supports_tcp_user_timeout.store(1);
        }
      }
      if (g_socket_supports_tcp_user_timeout.load() > 0) {
        if (GRPC_TRACE_FLAG_ENABLED(grpc_tcp_trace)) {
          gpr_log(GPR_INFO, "Enabling TCP_USER_TIMEOUT with a timeout of %d ms",
                  timeout);
        }
        if (0 != setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout,
                            sizeof(timeout))) {
          gpr_log(GPR_ERROR, "setsockopt(TCP_USER_TIMEOUT) %s",
                  grpc_core::StrError(errno).c_str());
          return absl::OkStatus();
        }
        if (0 != getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &newval, &len)) {
          gpr_log(GPR_ERROR, "getsockopt(TCP_USER_TIMEOUT) %s",
                  grpc_core::StrError(errno).c_str());
          return absl::OkStatus();
        }
        if (newval != timeout) {
          // Do not fail on failing to set TCP_USER_TIMEOUT for now.
          gpr_log(GPR_ERROR, "Failed to set TCP_USER_TIMEOUT");
          return absl::OkStatus();
        }
      }
    }
  } else {
    if (GRPC_TRACE_FLAG_ENABLED(grpc_tcp_trace)) {
      gpr_log(GPR_INFO, "TCP_USER_TIMEOUT not supported for this platform");
    }
  }
  return absl::OkStatus();
}
JoanFM commented 1 year ago

Maybe you can open an issue in their repo, to see if they can give more insights

SpacedDot commented 1 year ago

what is the grpcio version you are using?

grpcio 1.47.5 grpcio-health-checking 1.47.5 grpcio-reflection 1.47.5

SpacedDot commented 1 year ago

Maybe you can open an issue in their repo, to see if they can give more insights

I run the grpc example async_greeter_server.py. It works fine. No error report. The code is as follows:

import asyncio
import logging

import grpc
import helloworld_pb2
import helloworld_pb2_grpc

class Greeter(helloworld_pb2_grpc.GreeterServicer):

    async def SayHello(
            self, request: helloworld_pb2.HelloRequest,
            context: grpc.aio.ServicerContext) -> helloworld_pb2.HelloReply:
        return helloworld_pb2.HelloReply(message='Hello, %s!' % request.name)

async def serve() -> None:
    server = grpc.aio.server()
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    # listen_addr = '[::]:50051'
    # server.add_insecure_port(listen_addr)
    # logging.info("Starting server on %s", listen_addr)
    server.add_insecure_port("0.0.0.0:53767")
    await server.start()
    await server.wait_for_termination()

if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    asyncio.run(serve())

This seems an error caused by GRPCServer in jina.serve.runtimes.servers.grpc.GRPCServer. I try to locate the problem but failed.

Can you reproduce this problem? Or it's just a problem occured to my enviroment?

Really need some help.

SpacedDot commented 1 year ago

By the way, my system is Ubuntu 18.04 and the kernel is 4.15.0-208-generic.

JoanFM commented 1 year ago

In my system, I do not see this warning actually and I use also the same grpcio version as yours

SpacedDot commented 1 year ago

In my system, I do not see this warning actually and I use also the same grpcio version as yours

May I ask what's your system?

SpacedDot commented 1 year ago

It seems a problem of Ubuntu 18.04. The warning don't appear using Ubuntu 20.04.

But still can not figure out why this happens.

SpacedDot commented 1 year ago

I will move on. If anyone have a clue of this problem, please be kind to leave a message.

JoanFM commented 1 year ago

Maybe u can enable some GRPC logs to see what is trying the code to do to set that value.

JoanFM commented 1 year ago

can u set the GRPC_VERBOSITY and/or the GRPC_TRACE env var?

SpacedDot commented 1 year ago

Yes. The log as this:

(jina) user@ubuntu:~/project/cv/jina_ex/translate$ python deployment.py 
<_io.StringIO object at 0x7f82d603a940>
{'jtype': 'GRPCGateway'}
<_io.StringIO object at 0x7f82d604ddc0>
{'jtype': 'Translator'}
I0428 03:53:45.622152914   37444 ev_epoll1_linux.cc:121]     grpc epoll fd: 9
D0428 03:53:45.622215925   37444 ev_posix.cc:171]            Using polling engine: epoll1
D0428 03:53:45.622280134   37444 lb_policy_registry.cc:49]   registering LB policy factory for "grpclb"
D0428 03:53:45.622292784   37444 lb_policy_registry.cc:49]   registering LB policy factory for "rls_experimental"
D0428 03:53:45.622299192   37444 lb_policy_registry.cc:49]   registering LB policy factory for "priority_experimental"
D0428 03:53:45.622307220   37444 lb_policy_registry.cc:49]   registering LB policy factory for "weighted_target_experimental"
D0428 03:53:45.622312572   37444 lb_policy_registry.cc:49]   registering LB policy factory for "pick_first"
D0428 03:53:45.622320004   37444 lb_policy_registry.cc:49]   registering LB policy factory for "round_robin"
D0428 03:53:45.622322560   37444 lb_policy_registry.cc:49]   registering LB policy factory for "ring_hash_experimental"
D0428 03:53:45.622329374   37444 dns_resolver_ares.cc:545]   Using ares dns resolver
D0428 03:53:45.622362305   37444 certificate_provider_registry.cc:33] registering certificate provider factory for "file_watcher"
D0428 03:53:45.622368841   37444 lb_policy_registry.cc:49]   registering LB policy factory for "cds_experimental"
D0428 03:53:45.622375243   37444 lb_policy_registry.cc:49]   registering LB policy factory for "xds_cluster_impl_experimental"
D0428 03:53:45.622382783   37444 lb_policy_registry.cc:49]   registering LB policy factory for "xds_cluster_resolver_experimental"
D0428 03:53:45.622390121   37444 lb_policy_registry.cc:49]   registering LB policy factory for "xds_cluster_manager_experimental"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[('grpc.max_send_message_length', -1), ('grpc.max_receive_message_length', -1), ('grpc.keepalive_time_ms', 9999), ('grpc.keepalive_timeout_ms', 4999), ('grpc.keepalive_permit_without_calls', True), ('grpc.http1.max_pings_without_data', 0), ('grpc.http1.min_time_between_pings_ms', 10000), ('grpc.http1.min_ping_interval_without_data_ms', 5000)]
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
None
setup server 0 
setup server 1
********dry_run
setup server 2
bind_address:
0.0.0.0:62381
3333333333333333333
I0428 03:53:45.624067856   37444 socket_utils_common_posix.cc:353] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
E0428 03:53:45.624077870   37444 socket_utils_common_posix.cc:377] Failed to set TCP_USER_TIMEOUT
INFO   gateway@37444 start server bound to 0.0.0.0:62381                                                                                                                               [04/28/23 03:53:45]
setup server 3 
jina.JinaRPC
jina.JinaSingleDataRequestRPC
jina.JinaDataRequestRPC
jina.JinaGatewayDryRunRPC
jina.JinaDiscoverEndpointsRPC
jina.JinaInfoRPC
grpc.reflection.v1alpha.ServerReflection
setup server 4 
E0428 03:53:45.628781836   37444 socket_utils_common_posix.cc:377] Failed to set TCP_USER_TIMEOUT
I0428 03:53:45.629129499   37446 subchannel.cc:956]          subchannel 0x26bb410 {address=ipv4:0.0.0.0:57938, args={grpc.client_channel_factory=0x357adf0, grpc.default_authority=0.0.0.0:57938, grpc.http1.max_pings_without_data=0, grpc.http1.min_ping_interval_without_data_ms=5000, grpc.http1.min_time_between_pings_ms=10000, grpc.internal.channel_credentials=0x3484a30, grpc.internal.security_connector=0x32b5d50, grpc.internal.subchannel_pool=0x2feb510, grpc.keepalive_permit_without_calls=1, grpc.keepalive_time_ms=9999, grpc.keepalive_timeout_ms=4999, grpc.max_receive_message_length=-1, grpc.max_send_message_length=-1, grpc.primary_user_agent=grpc-python-asyncio/1.47.5, grpc.resource_quota=0x2b27970, grpc.server_uri=dns:///0.0.0.0:57938}}: connect failed ({"created":"@1682654025.628874974","description":"Failed to connect to remote host: Connection refused","errno":111,"file":"src/core/lib/iomgr/tcp_client_posix.cc","file_line":201,"os_error":"Connection refused","syscall":"connect","target_address":"ipv4:0.0.0.0:57938"}), backing off for 999 ms
I0428 03:53:46.628010701   37448 subchannel.cc:902]          subchannel 0x26bb410 {address=ipv4:0.0.0.0:57938, args={grpc.client_channel_factory=0x357adf0, grpc.default_authority=0.0.0.0:57938, grpc.http1.max_pings_without_data=0, grpc.http1.min_ping_interval_without_data_ms=5000, grpc.http1.min_time_between_pings_ms=10000, grpc.internal.channel_credentials=0x3484a30, grpc.internal.security_connector=0x32b5d50, grpc.internal.subchannel_pool=0x2feb510, grpc.keepalive_permit_without_calls=1, grpc.keepalive_time_ms=9999, grpc.keepalive_timeout_ms=4999, grpc.max_receive_message_length=-1, grpc.max_send_message_length=-1, grpc.primary_user_agent=grpc-python-asyncio/1.47.5, grpc.resource_quota=0x2b27970, grpc.server_uri=dns:///0.0.0.0:57938}}: connection attempt requested while backoff timer was pending, retrying now
E0428 03:53:46.628165087   37448 socket_utils_common_posix.cc:377] Failed to set TCP_USER_TIMEOUT
I0428 03:53:46.628469801   37446 subchannel.cc:956]          subchannel 0x26bb410 {address=ipv4:0.0.0.0:57938, args={grpc.client_channel_factory=0x357adf0, grpc.default_authority=0.0.0.0:57938, grpc.http1.max_pings_without_data=0, grpc.http1.min_ping_interval_without_data_ms=5000, grpc.http1.min_time_between_pings_ms=10000, grpc.internal.channel_credentials=0x3484a30, grpc.internal.security_connector=0x32b5d50, grpc.internal.subchannel_pool=0x2feb510, grpc.keepalive_permit_without_calls=1, grpc.keepalive_time_ms=9999, grpc.keepalive_timeout_ms=4999, grpc.max_receive_message_length=-1, grpc.max_send_message_length=-1, grpc.primary_user_agent=grpc-python-asyncio/1.47.5, grpc.resource_quota=0x2b27970, grpc.server_uri=dns:///0.0.0.0:57938}}: connect failed ({"created":"@1682654026.628300495","description":"Failed to connect to remote host: Connection refused","errno":111,"file":"src/core/lib/iomgr/tcp_client_posix.cc","file_line":201,"os_error":"Connection refused","syscall":"connect","target_address":"ipv4:0.0.0.0:57938"}), backing off for 1302 ms
I0428 03:53:47.929946525   37460 subchannel.cc:902]          subchannel 0x26bb410 {address=ipv4:0.0.0.0:57938, args={grpc.client_channel_factory=0x357adf0, grpc.default_authority=0.0.0.0:57938, grpc.http1.max_pings_without_data=0, grpc.http1.min_ping_interval_without_data_ms=5000, grpc.http1.min_time_between_pings_ms=10000, grpc.internal.channel_credentials=0x3484a30, grpc.internal.security_connector=0x32b5d50, grpc.internal.subchannel_pool=0x2feb510, grpc.keepalive_permit_without_calls=1, grpc.keepalive_time_ms=9999, grpc.keepalive_timeout_ms=4999, grpc.max_receive_message_length=-1, grpc.max_send_message_length=-1, grpc.primary_user_agent=grpc-python-asyncio/1.47.5, grpc.resource_quota=0x2b27970, grpc.server_uri=dns:///0.0.0.0:57938}}: connection attempt requested while backoff timer was pending, retrying now
E0428 03:53:47.930090204   37460 socket_utils_common_posix.cc:377] Failed to set TCP_USER_TIMEOUT
I0428 03:53:47.930392711   37446 subchannel.cc:956]          subchannel 0x26bb410 {address=ipv4:0.0.0.0:57938, args={grpc.client_channel_factory=0x357adf0, grpc.default_authority=0.0.0.0:57938, grpc.http1.max_pings_without_data=0, grpc.http1.min_ping_interval_without_data_ms=5000, grpc.http1.min_time_between_pings_ms=10000, grpc.internal.channel_credentials=0x3484a30, grpc.internal.security_connector=0x32b5d50, grpc.internal.subchannel_pool=0x2feb510, grpc.keepalive_permit_without_calls=1, grpc.keepalive_time_ms=9999, grpc.keepalive_timeout_ms=4999, grpc.max_receive_message_length=-1, grpc.max_send_message_length=-1, grpc.primary_user_agent=grpc-python-asyncio/1.47.5, grpc.resource_quota=0x2b27970, grpc.server_uri=dns:///0.0.0.0:57938}}: connect failed ({"created":"@1682654027.930216066","description":"Failed to connect to remote host: Connection refused","errno":111,"file":"src/core/lib/iomgr/tcp_client_posix.cc","file_line":201,"os_error":"Connection refused","syscall":"connect","target_address":"ipv4:0.0.0.0:57938"}), backing off for 2838 ms
I0428 03:53:50.767919122   37460 subchannel.cc:902]          subchannel 0x26bb410 {address=ipv4:0.0.0.0:57938, args={grpc.client_channel_factory=0x357adf0, grpc.default_authority=0.0.0.0:57938, grpc.http1.max_pings_without_data=0, grpc.http1.min_ping_interval_without_data_ms=5000, grpc.http1.min_time_between_pings_ms=10000, grpc.internal.channel_credentials=0x3484a30, grpc.internal.security_connector=0x32b5d50, grpc.internal.subchannel_pool=0x2feb510, grpc.keepalive_permit_without_calls=1, grpc.keepalive_time_ms=9999, grpc.keepalive_timeout_ms=4999, grpc.max_receive_message_length=-1, grpc.max_send_message_length=-1, grpc.primary_user_agent=grpc-python-asyncio/1.47.5, grpc.resource_quota=0x2b27970, grpc.server_uri=dns:///0.0.0.0:57938}}: connection attempt requested while backoff timer was pending, retrying now
E0428 03:53:50.768040923   37460 socket_utils_common_posix.cc:377] Failed to set TCP_USER_TIMEOUT
I0428 03:53:50.768338278   37446 subchannel.cc:956]          subchannel 0x26bb410 {address=ipv4:0.0.0.0:57938, args={grpc.client_channel_factory=0x357adf0, grpc.default_authority=0.0.0.0:57938, grpc.http1.max_pings_without_data=0, grpc.http1.min_ping_interval_without_data_ms=5000, grpc.http1.min_time_between_pings_ms=10000, grpc.internal.channel_credentials=0x3484a30, grpc.internal.security_connector=0x32b5d50, grpc.internal.subchannel_pool=0x2feb510, grpc.keepalive_permit_without_calls=1, grpc.keepalive_time_ms=9999, grpc.keepalive_timeout_ms=4999, grpc.max_receive_message_length=-1, grpc.max_send_message_length=-1, grpc.primary_user_agent=grpc-python-asyncio/1.47.5, grpc.resource_quota=0x2b27970, grpc.server_uri=dns:///0.0.0.0:57938}}: connect failed ({"created":"@1682654030.768148332","description":"Failed to connect to remote host: Connection refused","errno":111,"file":"src/core/lib/iomgr/tcp_client_posix.cc","file_line":201,"os_error":"Connection refused","syscall":"connect","target_address":"ipv4:0.0.0.0:57938"}), backing off for 3608 ms
I0428 03:53:54.375923828   37448 subchannel.cc:902]          subchannel 0x26bb410 {address=ipv4:0.0.0.0:57938, args={grpc.client_channel_factory=0x357adf0, grpc.default_authority=0.0.0.0:57938, grpc.http1.max_pings_without_data=0, grpc.http1.min_ping_interval_without_data_ms=5000, grpc.http1.min_time_between_pings_ms=10000, grpc.internal.channel_credentials=0x3484a30, grpc.internal.security_connector=0x32b5d50, grpc.internal.subchannel_pool=0x2feb510, grpc.keepalive_permit_without_calls=1, grpc.keepalive_time_ms=9999, grpc.keepalive_timeout_ms=4999, grpc.max_receive_message_length=-1, grpc.max_send_message_length=-1, grpc.primary_user_agent=grpc-python-asyncio/1.47.5, grpc.resource_quota=0x2b27970, grpc.server_uri=dns:///0.0.0.0:57938}}: connection attempt requested while backoff timer was pending, retrying now
E0428 03:53:54.376054315   37448 socket_utils_common_posix.cc:377] Failed to set TCP_USER_TIMEOUT
I0428 03:53:54.376358907   37446 subchannel.cc:956]          subchannel 0x26bb410 {address=ipv4:0.0.0.0:57938, args={grpc.client_channel_factory=0x357adf0, grpc.default_authority=0.0.0.0:57938, grpc.http1.max_pings_without_data=0, grpc.http1.min_ping_interval_without_data_ms=5000, grpc.http1.min_time_between_pings_ms=10000, grpc.internal.channel_credentials=0x3484a30, grpc.internal.security_connector=0x32b5d50, grpc.internal.subchannel_pool=0x2feb510, grpc.keepalive_permit_without_calls=1, grpc.keepalive_time_ms=9999, grpc.keepalive_timeout_ms=4999, grpc.max_receive_message_length=-1, grpc.max_send_message_length=-1, grpc.primary_user_agent=grpc-python-asyncio/1.47.5, grpc.resource_quota=0x2b27970, grpc.server_uri=dns:///0.0.0.0:57938}}: connect failed ({"created":"@1682654034.376166292","description":"Failed to connect to remote host: Connection refused","errno":111,"file":"src/core/lib/iomgr/tcp_client_posix.cc","file_line":201,"os_error":"Connection refused","syscall":"connect","target_address":"ipv4:0.0.0.0:57938"}), backing off for 6243 ms
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[('grpc.max_send_message_length', -1), ('grpc.max_receive_message_length', -1), ('grpc.keepalive_time_ms', 9999), ('grpc.keepalive_timeout_ms', 4999), ('grpc.keepalive_permit_without_calls', True), ('grpc.http1.max_pings_without_data', 0), ('grpc.http1.min_time_between_pings_ms', 10000), ('grpc.http1.min_ping_interval_without_data_ms', 5000)]
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
None
setup server 0 
I0428 03:53:57.607954727   37445 ev_epoll1_linux.cc:121]     grpc epoll fd: 11
D0428 03:53:57.608033967   37445 ev_posix.cc:171]            Using polling engine: epoll1
D0428 03:53:57.608069875   37445 lb_policy_registry.cc:49]   registering LB policy factory for "grpclb"
D0428 03:53:57.608076362   37445 lb_policy_registry.cc:49]   registering LB policy factory for "rls_experimental"
D0428 03:53:57.608082212   37445 lb_policy_registry.cc:49]   registering LB policy factory for "priority_experimental"
D0428 03:53:57.608087714   37445 lb_policy_registry.cc:49]   registering LB policy factory for "weighted_target_experimental"
D0428 03:53:57.608091284   37445 lb_policy_registry.cc:49]   registering LB policy factory for "pick_first"
D0428 03:53:57.608095089   37445 lb_policy_registry.cc:49]   registering LB policy factory for "round_robin"
D0428 03:53:57.608098749   37445 lb_policy_registry.cc:49]   registering LB policy factory for "ring_hash_experimental"
D0428 03:53:57.608107638   37445 dns_resolver_ares.cc:545]   Using ares dns resolver
D0428 03:53:57.608133112   37445 certificate_provider_registry.cc:33] registering certificate provider factory for "file_watcher"
D0428 03:53:57.608137321   37445 lb_policy_registry.cc:49]   registering LB policy factory for "cds_experimental"
D0428 03:53:57.608142243   37445 lb_policy_registry.cc:49]   registering LB policy factory for "xds_cluster_impl_experimental"
D0428 03:53:57.608147993   37445 lb_policy_registry.cc:49]   registering LB policy factory for "xds_cluster_resolver_experimental"
D0428 03:53:57.608151774   37445 lb_policy_registry.cc:49]   registering LB policy factory for "xds_cluster_manager_experimental"
setup server 1
********endpoint_discovery
********process_data
setup server 2
bind_address:
0.0.0.0:57938
3333333333333333333
I0428 03:53:57.609336341   37445 socket_utils_common_posix.cc:353] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
E0428 03:53:57.609345622   37445 socket_utils_common_posix.cc:377] Failed to set TCP_USER_TIMEOUT
INFO   executor/rep-0@37445 start server bound to 0.0.0.0:57938                                                                                                                        [04/28/23 03:53:57]
setup server 3 
jina.JinaRPC
jina.JinaSingleDataRequestRPC
jina.JinaDataRequestRPC
jina.JinaGatewayDryRunRPC
jina.JinaDiscoverEndpointsRPC
jina.JinaInfoRPC
grpc.reflection.v1alpha.ServerReflection
setup server 4 
I0428 03:53:57.620342878   37378 ev_epoll1_linux.cc:121]     grpc epoll fd: 10
D0428 03:53:57.620476091   37378 ev_posix.cc:171]            Using polling engine: epoll1
D0428 03:53:57.620512790   37378 lb_policy_registry.cc:49]   registering LB policy factory for "grpclb"
D0428 03:53:57.620522522   37378 lb_policy_registry.cc:49]   registering LB policy factory for "rls_experimental"
D0428 03:53:57.620532080   37378 lb_policy_registry.cc:49]   registering LB policy factory for "priority_experimental"
D0428 03:53:57.620540774   37378 lb_policy_registry.cc:49]   registering LB policy factory for "weighted_target_experimental"
D0428 03:53:57.620547022   37378 lb_policy_registry.cc:49]   registering LB policy factory for "pick_first"
D0428 03:53:57.620553763   37378 lb_policy_registry.cc:49]   registering LB policy factory for "round_robin"
D0428 03:53:57.620559633   37378 lb_policy_registry.cc:49]   registering LB policy factory for "ring_hash_experimental"
D0428 03:53:57.620568473   37378 dns_resolver_ares.cc:545]   Using ares dns resolver
D0428 03:53:57.620600287   37378 certificate_provider_registry.cc:33] registering certificate provider factory for "file_watcher"
D0428 03:53:57.620606290   37378 lb_policy_registry.cc:49]   registering LB policy factory for "cds_experimental"
D0428 03:53:57.620613876   37378 lb_policy_registry.cc:49]   registering LB policy factory for "xds_cluster_impl_experimental"
D0428 03:53:57.620623912   37378 lb_policy_registry.cc:49]   registering LB policy factory for "xds_cluster_resolver_experimental"
D0428 03:53:57.620628935   37378 lb_policy_registry.cc:49]   registering LB policy factory for "xds_cluster_manager_experimental"
I0428 03:53:57.622092843   37378 socket_utils_common_posix.cc:353] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter
E0428 03:53:57.622108138   37378 socket_utils_common_posix.cc:377] Failed to set TCP_USER_TIMEOUT
D0428 03:53:57.622264198   37569 tcp_posix.cc:1834]          cannot set inq fd=14 errno=92
D0428 03:53:57.622271333   37564 tcp_posix.cc:1834]          cannot set inq fd=17 errno=92
D0428 03:53:57.631853185   37378 init.cc:220]                grpc_shutdown starts clean-up now
I0428 03:54:00.618930847   37448 subchannel.cc:902]          subchannel 0x26bb410 {address=ipv4:0.0.0.0:57938, args={grpc.client_channel_factory=0x357adf0, grpc.default_authority=0.0.0.0:57938, grpc.http1.max_pings_without_data=0, grpc.http1.min_ping_interval_without_data_ms=5000, grpc.http1.min_time_between_pings_ms=10000, grpc.internal.channel_credentials=0x3484a30, grpc.internal.security_connector=0x32b5d50, grpc.internal.subchannel_pool=0x2feb510, grpc.keepalive_permit_without_calls=1, grpc.keepalive_time_ms=9999, grpc.keepalive_timeout_ms=4999, grpc.max_receive_message_length=-1, grpc.max_send_message_length=-1, grpc.primary_user_agent=grpc-python-asyncio/1.47.5, grpc.resource_quota=0x2b27970, grpc.server_uri=dns:///0.0.0.0:57938}}: connection attempt requested while backoff timer was pending, retrying now
E0428 03:54:00.619063222   37448 socket_utils_common_posix.cc:377] Failed to set TCP_USER_TIMEOUT
D0428 03:54:00.619238024   37564 tcp_posix.cc:1834]          cannot set inq fd=18 errno=92
D0428 03:54:00.619265852   37449 tcp_posix.cc:1834]          cannot set inq fd=15 errno=92

Any thought about this?

JoanFM commented 1 year ago

did u use GRPC_TRACE=tcp

JoanFM commented 1 year ago

Analyzing the code, what I understand it does is:

Try to set timeout value for this socket option to one of these 3 values:

Then tries to read the value of that socket option into newval which loads successfully, and then sets the option successfully (returns 0) to timeout. And then there is a check to see if the old option (newval) is the same as the newly set (timeout).

I think it says failed, but it is not true, it just cnaged from a different value. I will try to open a PR trying to change the message.

JoanFM commented 1 year ago

I opened this PR on grpc https://github.com/grpc/grpc/pull/32965

JoanFM commented 1 year ago

They seem to accept the PR, however there is an issue on grpc blocking us from updating grpc version. However, it seems confirmed that it is not an error or missbehavior

SpacedDot commented 1 year ago

Curious why this only happens on ubuntu 18.04, not 20.04 ?

SpacedDot commented 1 year ago

They seem to accept the PR, however there is an issue on grpc blocking us from updating grpc version. However, it seems confirmed that it is not an error or missbehavior

Really appreciate for the update.

JoanFM commented 1 year ago

I guess thay may have a different default option for this IPPROTO_TCP socket option,