dongzeli95 opened 8 months ago
So upon load testing, I found that with 30 concurrent streaming recognize sessions, all threads get stuck. I have tried both gevent greenlets and a native thread pool; neither helps with this limit.
On the other hand, concurrency under 25 seems to work fine. I've also seen this post arguing about gRPC max-concurrency settings. I assume this speech recognition package uses gRPC underneath. Is there a way we can increase concurrency on the speech client?
Could someone help take a look? Thank you so much!
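One possible workaround, purely as an assumption rather than a confirmed fix: spread the sessions across a small pool of SpeechClient instances so each underlying gRPC channel carries fewer concurrent HTTP/2 streams. Note that gRPC may still share subchannels between clients created with identical arguments, so this is only a sketch.

```python
# Sketch of a possible workaround (an assumption, not a confirmed fix): round-robin
# streaming sessions across several SpeechClient instances so each gRPC channel
# carries fewer concurrent HTTP/2 streams.
import itertools
from google.cloud import speech

NUM_CLIENTS = 4  # assumption: keeps streams-per-channel well under typical limits
clients = [speech.SpeechClient() for _ in range(NUM_CLIENTS)]
_client_cycle = itertools.cycle(clients)

def next_client() -> speech.SpeechClient:
    """Pick the next client in round-robin order for a new streaming session."""
    return next(_client_cycle)
```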
Hi @dongzeli95,
Thanks for reporting this issue. This is potentially related to https://github.com/grpc/grpc/issues/36265, https://github.com/googleapis/python-bigtable/issues/949, and https://github.com/googleapis/google-cloud-python/issues/12423. To confirm, can you try downgrading to grpcio==1.58.0?
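A quick sanity check after pinning (a generic check, not specific to this client): confirm the running process actually picked up the downgraded grpcio.

```python
# Verify at runtime which grpcio the process is using after the downgrade.
import grpc
print(grpc.__version__)  # should print 1.58.0 if the pin took effect
```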
@parthea Just confirmed that changing the dependency to grpcio==1.58.0 didn't help with this issue. Hope that helps.
I have seen a similar issue a number of times on my development server, which we believe might be related to deadlocks that have occurred in production too. When I have the gevent monitor thread running, I get this:
+--- <Greenlet "Greenlet-1" at 0x7f468274ec00: spawn_greenlets>
: Parent: <Hub '' at 0x7f468eaa4720 epoll default pending=0 ref=3 fileno=8 resolver=<gevent.resolver.thread.Resolver at 0x7f46831ad5d0 pool=<ThreadPool at 0x7f468e827610 tasks=1 size=1 maxsize=10 hub=<Hub at 0x7f468eaa4720 thread_ident=0x7f469d874740>>> threadpool=<ThreadPool at 0x7f468e827610 tasks=1 size=1 maxsize=10 hub=<Hub at 0x7f468eaa4720 thread_ident=0x7f469d874740>> thread_ident=0x7f469d874740>
: Spawned at:
: File ".../app/util/telemetry/cloud_trace.py", line 85, in batch_write_spans
: name = self.client.common_project_path(app_id)
: File ".../lib/python3.10/functools.py", line 981, in __get__
: val = self.func(instance)
: File ".../app/util/telemetry/cloud_trace.py", line 61, in client
: return google.cloud.trace.TraceServiceClient(credentials=self.credentials)
: File "/home/embray/src/talque/talque/tools/talque3/lib/python3.10/site-packages/google/cloud/trace_v2/services/trace_service/client.py", line 640, in __init__
: self._transport = transport_init(
: File ".../lib/python3.10/site-packages/google/cloud/trace_v2/services/trace_service/transports/grpc.py", line 174, in __init__
: self._grpc_channel = channel_init(
: File ".../lib/python3.10/site-packages/google/cloud/trace_v2/services/trace_service/transports/grpc.py", line 229, in create_channel
: return grpc_helpers.create_channel(
: File ".../lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 386, in create_channel
: return grpc.secure_channel(
: File ".../lib/python3.10/site-packages/grpc/__init__.py", line 2146, in secure_channel
: return _channel.Channel(
: File ".../lib/python3.10/site-packages/grpc/_channel.py", line 2084, in __init__
: cygrpc.gevent_increment_channel_count()
: File ".../lib/python3.10/site-packages/gevent/pool.py", line 392, in spawn
: greenlet = self.greenlet_class(*args, **kwargs)
A common thread I think I've seen in most of the issues related to this is some Google API that uses gRPC for transport, called from a concurrent.futures.ThreadPoolExecutor (which, with gevent patching at least, is replaced with greenlets, though I've seen some similar reports that didn't mention gevent at all). In my case it happens to be coming from the Trace API, but others have reported a similar problem against other APIs, including this one.
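As a rough sketch of that pattern (not the actual production code; project ID and worker count are placeholders): a gRPC-backed Google client driven from a ThreadPoolExecutor while gevent's monkey-patching is active.

```python
# Rough shape of the pattern described above: gevent monkey-patching plus a
# ThreadPoolExecutor driving a gRPC-transport Google client (here the Trace API,
# matching the greenlet dump above).
from gevent import monkey
monkey.patch_all()  # after this, executor "threads" are backed by greenlets

from concurrent.futures import ThreadPoolExecutor
from google.cloud import trace_v2

def touch_trace_client(project_id: str) -> str:
    # Constructing the client opens a gRPC channel, which is where the
    # greenlet dump above shows things getting stuck.
    client = trace_v2.TraceServiceClient()
    return client.common_project_path(project_id)

executor = ThreadPoolExecutor(max_workers=8)
futures = [executor.submit(touch_trace_client, "my-project") for _ in range(8)]
print([f.result() for f in futures])
```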
I tried a reproduction similar to https://github.com/googleapis/python-bigtable/issues/949#issuecomment-2040332357, but so far I haven't been able to make the problem happen that way, even though it bears similarity to a simplified version of what's going on in the production server.
I have grpc==1.68.0 for what it's worth.
This is my code for running each recognize task in a separate gevent greenlet, and it causes the greenlet to get stuck:
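A minimal sketch of this kind of setup (not the reporter's exact code; it assumes the google-cloud-speech v1 helper API and uses a placeholder audio source):

```python
# Minimal sketch (not the reporter's exact code): one gevent greenlet per
# streaming_recognize session, using the google-cloud-speech v1 helper API
# and a placeholder audio source.
import gevent
from google.cloud import speech

client = speech.SpeechClient()

def fake_audio_chunks(num_chunks: int = 50):
    """Placeholder audio source: yields 100 ms chunks of 16 kHz 16-bit silence."""
    for _ in range(num_chunks):
        yield b"\x00" * 3200

def recognize_stream(audio_chunks):
    """Run one streaming recognition session to completion."""
    streaming_config = speech.StreamingRecognitionConfig(
        config=speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,
            language_code="en-US",
        ),
        interim_results=True,
    )
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk)
        for chunk in audio_chunks
    )
    # Iterating the response stream blocks; with ~30 of these greenlets
    # running at once, this is where everything hangs.
    for response in client.streaming_recognize(streaming_config, requests):
        for result in response.results:
            print(result.alternatives[0].transcript)

# Spawn ~30 concurrent sessions to hit the reported limit.
greenlets = [gevent.spawn(recognize_stream, fake_audio_chunks()) for _ in range(30)]
gevent.joinall(greenlets)
```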
I've also included the patch for my gevent application:
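The usual shape of such a patch, assuming the intent is to enable grpc's experimental gevent support, would be roughly:

```python
# Likely shape of the gevent setup (a sketch, not the exact patch from the report):
# monkey-patch the stdlib first, then enable grpc's gevent compatibility hooks.
from gevent import monkey
monkey.patch_all()

import grpc.experimental.gevent as grpc_gevent
grpc_gevent.init_gevent()
```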
So after the above greenlet gets stuck, my whole application basically hangs.
Note that if I do the stream recognition synchronously, without using a gevent greenlet, it works fine. However, I would still prefer to run it in a separate thread to improve latency. I wonder if this is another grpc incompatibility issue with gevent.
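For contrast, the synchronous variant that reportedly works fine would look roughly like this, reusing the hypothetical recognize_stream and fake_audio_chunks names from the sketch above:

```python
# Same session logic, called inline instead of via gevent.spawn; sessions run
# one after another, so latency suffers but nothing hangs.
for _ in range(30):
    recognize_stream(fake_audio_chunks())
```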