zcin closed this 2 months ago.
Talked to @edoakes: no leads. @kevin85421 is going to dig into the Ray Serve source. This is interrupt-level important, and we don't want to revert the DAG PR for this either, so we have no choice but to investigate.
Suspicious PR: #45699.
Found that the slowness comes from `Router._resolve_deployment_responses`, which is basically `pickle.dump` and `pickle.load`. It's unclear how https://github.com/ray-project/ray/pull/45699 affects it, since if we just run `handle 1mb`, it's the same before and after that PR. It's only slower if we run `handle noop` before it.
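To look at the serializer cost on its own, here is a minimal sketch in plain Python (no Ray) that times the in-memory equivalents `pickle.dumps` and `pickle.loads` on a 1 MB bytes payload. The helper name `time_pickle_roundtrip` is hypothetical. A bare bytes object is only an approximation of what the router actually serializes, and this will not reproduce the `handle noop`-then-`handle 1mb` ordering effect.

```python
import pickle
import time
from typing import Tuple


def time_pickle_roundtrip(payload: bytes, repeats: int = 50) -> Tuple[float, float]:
    """Return mean seconds spent in pickle.dumps and pickle.loads for payload."""
    dump_total = 0.0
    load_total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        data = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
        dump_total += time.perf_counter() - start

        start = time.perf_counter()
        restored = pickle.loads(data)
        load_total += time.perf_counter() - start
        # Sanity check: the round trip must reproduce the original bytes.
        if restored != payload:
            raise RuntimeError("pickle round trip changed the payload")
    return dump_total / repeats, load_total / repeats


if __name__ == "__main__":
    one_mb = b"x" * (1024 * 1024)  # mirrors the 1 MB payload in the benchmark
    dump_s, load_s = time_pickle_roundtrip(one_mb)
    print(f"dumps: {dump_s * 1e3:.3f} ms, loads: {load_s * 1e3:.3f} ms")
```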
Instead of figuring out why `pickle.dump` is slower, we decided to remove the call to `Router._resolve_deployment_responses` altogether, since it turns out to contribute a large portion of the latency.
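One way a resolution step like this can end up costing a full `pickle.dump` is if it discovers responses nested in the request arguments by pickling the whole argument structure. That mechanism is an assumption here, not something the thread confirms. The sketch below uses a `pickle.Pickler` subclass with `reducer_override` to collect instances of a hypothetical `Placeholder` type, next to a hypothetical `shallow_scan` that only checks top-level args, to show why skipping the full pass helps with large payloads. None of these names are Ray APIs.

```python
import io
import pickle
from typing import Any, List


class Placeholder:
    """Hypothetical stand-in for a response object nested in request args."""

    def __init__(self, key: str) -> None:
        self.key = key


class _PlaceholderScanner(pickle.Pickler):
    """Pickler that records every Placeholder reachable from the dumped object."""

    def __init__(self, file: io.BytesIO) -> None:
        super().__init__(file, protocol=pickle.HIGHEST_PROTOCOL)
        self.found: List[Placeholder] = []

    def reducer_override(self, obj: Any):
        if isinstance(obj, Placeholder):
            self.found.append(obj)
            # Serialize the placeholder as its key so no class lookup is needed.
            return (str, (obj.key,))
        # NotImplemented tells pickle to use its normal handling for obj.
        return NotImplemented


def scan_by_pickling(args: Any) -> List[Placeholder]:
    """Find nested Placeholders by pickling the whole argument structure.

    Cost grows with total payload size: a 1 MB blob is copied into the
    buffer even though it cannot contain a Placeholder.
    """
    buf = io.BytesIO()
    scanner = _PlaceholderScanner(buf)
    scanner.dump(args)
    return scanner.found


def shallow_scan(args: tuple) -> List[Placeholder]:
    """Cheaper alternative: only check top-level positional args.

    Misses Placeholders nested inside lists or dicts.
    """
    return [a for a in args if isinstance(a, Placeholder)]
```

The trade-off is coverage: `scan_by_pickling` finds placeholders at any depth but touches every byte of the payload, while `shallow_scan` only looks at each top-level argument and misses nested ones.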
Assigning back to @zcin to track the removal of `Router._resolve_deployment_responses`.
What happened + What you expected to happen
Around 6/14, latency for sending a request with a 1MB payload through a Serve `DeploymentHandle` increased from ~3.4s to ~4.6s. From bisecting, https://github.com/ray-project/ray/commit/d729815c4b88232dcb20860ff5ee1e7f871111f4 seems to be the offending commit.
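For scale, going from ~3.4s to ~4.6s is roughly a 35% increase in latency:

$$
\frac{4.6 - 3.4}{3.4} = \frac{1.2}{3.4} \approx 0.35
$$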
Versions / Dependencies
n/a
Reproduction script
Run `python release/serve_tests/workloads/microbenchmarks.py`.
Issue Severity
None