dioptre opened 9 months ago
I think this is related to a ticket I filed recently: https://github.com/ray-project/ray/issues/42392
To my mind, if Ray Serve supported any ASGI-compliant web server, that would open the door for many optimizations at the request-handling level.
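For context, an ASGI application is just an async callable taking `scope`, `receive` and `send`. That means the same app object can in principle be served by uvicorn, socketify's ASGI adapter, or any other ASGI-compliant server. Below is a minimal, framework-free sketch. The `call_app` helper is a hypothetical stand-in that plays the server's role in-process.

```python
import asyncio


async def app(scope, receive, send):
    """A minimal ASGI HTTP application, independent of any framework."""
    if scope["type"] != "http":
        # This sketch only handles plain HTTP requests (no lifespan/websocket).
        raise NotImplementedError(f"unsupported scope type: {scope['type']}")

    # Drain the request body; a server may deliver it in several chunks.
    body = b""
    more_body = True
    while more_body:
        message = await receive()
        body += message.get("body", b"")
        more_body = message.get("more_body", False)

    payload = b"hello from " + scope["path"].encode() + b"\n"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": payload})


async def call_app(path):
    """Drive the app with fake receive/send callables, as an ASGI server would."""
    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    return sent


if __name__ == "__main__":
    messages = asyncio.run(call_app("/ping"))
    print(messages[0]["status"], messages[1]["body"])
```

With uvicorn installed, `uvicorn module:app` should serve the same callable. The in-process driver only makes the server/app contract visible.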
I think it would be nice to have support for WSGI, ASGI and unique interfaces, like socketify does: https://docs.socketify.dev/cli.html
There are reasons not to use the event loop, so I'm not sure I'd push ASGI alone.
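The event-loop point is easiest to see by contrast. A WSGI application is an ordinary synchronous callable, so a WSGI server scales it with threads or processes rather than an asyncio loop. Below is a minimal stdlib-only sketch. The `call_app` driver is hypothetical and stands in for a server worker.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A minimal WSGI application: a plain synchronous callable, no event loop."""
    path = environ.get("PATH_INFO", "/")
    body = f"hello from {path}\n".encode()
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    # WSGI apps return an iterable of byte strings.
    return [body]


def call_app(path):
    """Call the app directly, the way a WSGI server's worker thread would."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fill in the other required CGI-style keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```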
Any updates on this? 🙏
@dioptre Nope, we do not have any near-term plan for offering this. But you are always welcome to contribute to the codebase🙂
I'm attempting to adapt socketify.py to Ray; let's see if it works first, then we can start working on the abstraction layer. In general, though, the speedup might not be what you expect: the bottleneck is not at the framework level.
Legend, let me know if you need a hand
I tried directly replacing uvicorn with socketify's ASGI class + FastAPI, and performance improved by ~10%. I guess most of the perf gain would come from replacing the actual top-level framework.
I think replacing uvicorn may boost the performance a little bit, but the bottleneck is actually inside the request handler.
For every request, the proxy makes a couple of RPC calls to the deployment replicas to check availability and how many requests are in the queue, for load balancing. See #46693. TL;DR: the proxy is at least an order of magnitude slower than uvicorn, so I wouldn't expect crazy gains from replacing that part (maybe I'm wrong...).
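To make the per-request cost concrete, here is a minimal sketch that assumes a power-of-two-choices style policy. That policy is an assumption for illustration, not code taken from Ray. Every request first awaits queue-length probes to two random replicas, so the proxy pays at least one RPC round trip before forwarding anything. `Replica`, `get_queue_len`, `choose_replica` and `route` are hypothetical names.

```python
import asyncio
import random


class Replica:
    """Hypothetical stand-in for a deployment replica with a request queue."""

    def __init__(self, name, queue_len=0, probe_latency_s=0.001):
        self.name = name
        self.queue_len = queue_len
        self.probe_latency_s = probe_latency_s

    async def get_queue_len(self):
        # Simulates the RPC round trip the proxy pays on every request.
        await asyncio.sleep(self.probe_latency_s)
        return self.queue_len


async def choose_replica(replicas, rng):
    """Power-of-two-choices: probe two random replicas, pick the shorter queue."""
    a, b = rng.sample(replicas, 2)
    # Both probes run concurrently, but the request still waits for them.
    len_a, len_b = await asyncio.gather(a.get_queue_len(), b.get_queue_len())
    return a if len_a <= len_b else b


async def route(replicas, n_requests, seed=0):
    """Route n_requests one after another; return final queue lengths."""
    rng = random.Random(seed)
    for _ in range(n_requests):
        replica = await choose_replica(replicas, rng)
        replica.queue_len += 1  # the request is now queued on the chosen replica
    return {r.name: r.queue_len for r in replicas}
```

For example, `asyncio.run(route([Replica("a"), Replica("b"), Replica("busy", queue_len=100)], 30))` never picks the busy replica, because its 100 queued requests always lose the comparison. Each of the 30 requests still waits on a simulated probe first.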
I wonder whether an intelligent cache at the front, tracking which queries end up where, would fix this and allow for more throughput. If https://github.com/ray-project/ray/issues/46693 is indeed true, that means Ray Serve is 500x slower than socketify. That's horrifying.
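The cache idea could look roughly like the sketch below. It assumes a TTL-based cache of per-replica queue lengths kept at the proxy. `QueueLenCache`, `probe_fn` and the TTL value are hypothetical and are not part of Ray Serve. Within the TTL, the proxy routes on possibly stale queue lengths and skips the probe RPCs entirely.

```python
import time


class QueueLenCache:
    """Hypothetical front-side cache of replica queue lengths with a TTL.

    Within `ttl_s` seconds the cached value is reused instead of issuing a
    fresh probe; older entries are re-probed via `probe_fn`.
    """

    def __init__(self, probe_fn, ttl_s=0.1, clock=time.monotonic):
        self.probe_fn = probe_fn  # callable(replica_id) -> queue length (the "RPC")
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries = {}  # replica_id -> (queue_len, probe_timestamp)

    def get(self, replica_id):
        now = self.clock()
        entry = self._entries.get(replica_id)
        if entry is not None and now - entry[1] < self.ttl_s:
            return entry[0]  # cache hit: no RPC
        queue_len = self.probe_fn(replica_id)  # miss or stale: probe the replica
        self._entries[replica_id] = (queue_len, now)
        return queue_len

    def record_assignment(self, replica_id):
        # Optimistically bump the cached length when a request is routed, so
        # back-to-back requests within the TTL don't all pile onto one replica.
        if replica_id in self._entries:
            queue_len, ts = self._entries[replica_id]
            self._entries[replica_id] = (queue_len + 1, ts)

    def choose(self, replica_ids):
        """Route to the replica with the smallest (possibly cached) queue."""
        best = min(replica_ids, key=self.get)
        self.record_assignment(best)
        return best
```

The design trade-off is staleness versus RPC count. A longer TTL saves more probes but lets routing drift further from the replicas' real queues. The optimistic bump in `record_assignment` partly compensates for that drift.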
Description
Serve is hardwired to FastAPI.
FastAPI is 50x (not 50%) slower than other implementations like Socketify. https://github.com/cirospaciari/socketify.py
It's only a matter of time before FastAPI gets replaced by faster frameworks. Please abstract the proxy away so we can replace it.
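One way to read "abstract the proxy away" is a small backend interface plus a registry, so that the HTTP server becomes a configuration choice. This is a hypothetical sketch. `ServerBackend`, `register_backend`, `get_backend` and `DryRunBackend` are not Ray Serve APIs, and only the uvicorn backend calls a real library (`uvicorn.run`).

```python
from typing import Callable, Dict, Protocol


class ServerBackend(Protocol):
    """Hypothetical interface a pluggable Serve HTTP backend might implement."""

    name: str

    def serve(self, app: Callable, host: str, port: int) -> None:
        ...


_BACKENDS: Dict[str, Callable[[], ServerBackend]] = {}


def register_backend(name: str):
    """Class decorator that adds a backend class to the registry under `name`."""
    def decorator(cls):
        _BACKENDS[name] = cls
        return cls
    return decorator


def get_backend(name: str) -> ServerBackend:
    """Instantiate the backend registered under `name`."""
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend {name!r}; known: {sorted(_BACKENDS)}") from None


@register_backend("uvicorn")
class UvicornBackend:
    name = "uvicorn"

    def serve(self, app, host, port):
        import uvicorn  # imported lazily so other backends don't require it
        uvicorn.run(app, host=host, port=port)


@register_backend("dry-run")
class DryRunBackend:
    """Records the call instead of starting a server (useful for tests)."""

    name = "dry-run"

    def __init__(self):
        self.calls = []

    def serve(self, app, host, port):
        self.calls.append((app, host, port))
```

A socketify backend would register the same way. Its `serve` body is left out here rather than guessing socketify's API.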
Use case
Any non-trivial site under heavy load would benefit from this, saving electricity and $ on machines, and helping the environment. The polar bears will love you.