Open lixin-wei opened 2 years ago
cc @scv119 @iycheng (since we discussed backpressure as a part of scalability improvement)
Hey @lixin-wei how has the investigation gone so far?
Unfortunately, I have no ideas so far other than waiting for the gRPC community's response. Sad.
As a first workaround, we reduced the number of the calls that caused the OOM.
Should we remove the client-side back-pressure, given that it isn't working and adds too much complexity? @rkooo567 @scv119 ?
Someone answered my question on Stack Overflow; I'll try their suggestion next Monday. https://stackoverflow.com/questions/72424145/how-to-do-server-side-backpressure-in-grpc/73255069#73255069
UPDATE: Sadly it doesn't work.
I am down to remove the feature or turn it off by default. We can start discussing how to improve the mechanism after 2.0 (along with other stability improvements, like the gRPC config improvement). Please let us know if you have any proposals @wumuzi520 @lixin-wei
Hey @iycheng, let's remove this? That said, we definitely need backpressure for stability; could this be handled as part of the GCS scalability work?
Hi, I'm a bot from the Ray team :)
To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.
If there is no further activity within the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public slack channel.
We just found that the gRPC server's backpressure never worked.
This is because even if we don't ask for a new request on the server, the server still keeps reading data from the network in
cq_->Next(&tag, &ok)
when there are many requests pending, causing huge memory usage. https://github.com/ray-project/ray/blob/d95009a3ac44a9ee2844964b31fa25f38d083388/src/ray/rpc/grpc_server.cc#L148
I think this is actually a gRPC issue; I've submitted it here