Open martalist opened 1 year ago
Hi @martalist, do you mind giving master a try? We recently refactored the handle construction a little, so this may already be mitigated. Let me know if the issue still exists and I can take a deeper look. Thank you very much for posting this question.
Hi @sihanwang41, we will test the master branch and report back asap. Thank you for making this a priority on your end!
Using the Linux Python 3.10 (x86_64) build, there appears to be an exception raised when using ray timeline:
[2023-05-26 09:30:18] INFO ray.scripts.scripts::Connecting to Ray instance at 172.24.0.11:6379.
[2023-05-26 09:30:18] INFO ray._private.worker::Connecting to existing Ray cluster at address: 172.24.0.11:6379...
[2023-05-26 09:30:18] INFO ray._private.worker::Connected to Ray cluster. View the dashboard at 172.24.0.11:8053
Traceback (most recent call last):
  File "/repo/.venv/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/repo/.venv/lib/python3.10/site-packages/ray/scripts/scripts.py", line 2462, in main
    return cli()
  File "/repo/.venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/repo/.venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/repo/.venv/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/repo/.venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/repo/.venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/repo/.venv/lib/python3.10/site-packages/ray/scripts/scripts.py", line 1827, in timeline
    ray.timeline(filename=filename)
  File "/repo/.venv/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/repo/.venv/lib/python3.10/site-packages/ray/_private/state.py", line 850, in timeline
    return state.chrome_tracing_dump(filename=filename)
  File "/repo/.venv/lib/python3.10/site-packages/ray/_private/state.py", line 446, in chrome_tracing_dump
    profile_events = self.profile_events()
  File "/repo/.venv/lib/python3.10/site-packages/ray/_private/state.py", line 218, in profile_events
    event = common_pb2.TaskEvents.FromString(task_events[i])
AttributeError: module 'ray.core.generated.common_pb2' has no attribute 'TaskEvents'
Same result when using Ray wheels master/ce16a2e82feb475f09e069905da71933a3e90654.
Unfortunately, this blocks me from testing further.
@sihanwang41 I have tested with 2.6.3. I noticed that half of the sequential get_deployment_info calls have been replaced with parallel get_num_ongoing_requests calls. Overall, for the above example, latency remains roughly the same due to (what I presume is) added overhead from handle_request_streaming.
There does not seem to be any way to parallelise the remaining sequential get_deployment_info calls client side. Can this be optimised within Ray Serve?
What happened + What you expected to happen
Nested @serve.deployments incur sequential overhead when called by their parent; for each child, ServeController.get_deployment_info is called twice. This harms concurrency and overall latency for requests.
In my own application I have observed parent handle_request latency being up to 8 times the (concurrent) child latency (in dashboard/metrics). See the example timeline below, showing the sequential get_deployment_info calls:
Given the Router.infer method below is written for concurrency, I'd expect the Serve framework to handle background tasks concurrently, too. Having inference latency so much higher for the parent (than for the children) is a blocker for my application, particularly as adding more child workers equates to more preprocessing latency.
Versions / Dependencies
Ubuntu Jammy, ray 2.4.0
Reproduction script
Executed with RAY_PROFILING=1 serve run issue:router, requests made with your favourite HTTP lib, and timeline captured with ray timeline.
Issue Severity
High: It blocks me from completing my task.
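The latency pattern described under "What happened" can be illustrated without Ray. The following is a minimal asyncio sketch under stated assumptions. It is neither the reproduction script nor Ray Serve's implementation. `lookup_deployment_info` is a hypothetical stand-in for a ServeController.get_deployment_info call, `child_infer` is a hypothetical child deployment, and both delays are made-up numbers. It compares a parent whose per-child lookups run one after another with a parent whose lookups overlap across children.

```python
import asyncio
import time
from typing import Tuple

LOOKUP_DELAY_S = 0.02  # assumption: made-up cost of one controller lookup
CHILD_DELAY_S = 0.05   # assumption: made-up cost of one child inference call


async def lookup_deployment_info() -> None:
    """Hypothetical stand-in for one get_deployment_info round trip."""
    await asyncio.sleep(LOOKUP_DELAY_S)


async def child_infer(child_id: int) -> int:
    """Hypothetical child deployment: waits, then returns its own id."""
    await asyncio.sleep(CHILD_DELAY_S)
    return child_id


async def parent_sequential_setup(num_children: int) -> float:
    """Lookups run one after another; only the child calls overlap."""
    start = time.perf_counter()
    for _ in range(num_children):
        # Two lookups per child, matching the count reported in the issue.
        await lookup_deployment_info()
        await lookup_deployment_info()
    await asyncio.gather(*(child_infer(i) for i in range(num_children)))
    return time.perf_counter() - start


async def parent_concurrent_setup(num_children: int) -> float:
    """Each child's lookups and call form one task; tasks overlap."""

    async def one_child(child_id: int) -> int:
        await lookup_deployment_info()
        await lookup_deployment_info()
        return await child_infer(child_id)

    start = time.perf_counter()
    await asyncio.gather(*(one_child(i) for i in range(num_children)))
    return time.perf_counter() - start


def compare(num_children: int = 4) -> Tuple[float, float]:
    """Return (sequential_seconds, concurrent_seconds) for one parent request."""
    sequential = asyncio.run(parent_sequential_setup(num_children))
    concurrent = asyncio.run(parent_concurrent_setup(num_children))
    return sequential, concurrent


if __name__ == "__main__":
    seq, con = compare()
    print(f"sequential setup: {seq:.3f}s, concurrent setup: {con:.3f}s")
```

In this sketch the sequential variant waits for roughly num_children × 2 × LOOKUP_DELAY_S before any child call starts, so parent latency grows with the number of children, while the concurrent variant stays close to the latency of a single child.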