ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[SERVE] Provide an abstraction for proxy #42882

Open dioptre opened 9 months ago

dioptre commented 9 months ago

Description

Serve is hardwired to FastAPI.

FastAPI is 50x (not 50%) slower than other implementations like Socketify. https://github.com/cirospaciari/socketify.py

It's only a matter of time before FastAPI is replaced by faster frameworks. Please abstract the proxy away so we can replace it.

Use case

Any non-trivial sites that take a lot of load will appreciate this, saving more electricity and $ on machines, and saving the environment. The polar bears will love you.

antoniomdk commented 9 months ago

I think this is related to a ticket I filed recently: https://github.com/ray-project/ray/issues/42392

To my mind, if Ray Serve supported any ASGI-compliant web server, that would open the door for many optimizations at the request-handling level.
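To make the "any ASGI-compliant server" point concrete, here is a minimal sketch of the ASGI callable interface itself, with no framework involved. This is not Ray Serve code; it just illustrates the contract that any ASGI server (uvicorn, hypercorn, socketify's ASGI class, etc.) would need to drive, so a proxy abstraction targeting ASGI would not be tied to FastAPI specifically.

```python
# Minimal raw ASGI application: an async callable taking (scope, receive, send).
# Any ASGI-compliant server can host this; no framework is required.

async def app(scope, receive, send):
    # `scope` describes the connection; this sketch only handles HTTP.
    assert scope["type"] == "http"
    # Response headers are sent first as an http.response.start message...
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    # ...followed by one or more http.response.body messages.
    await send({
        "type": "http.response.body",
        "body": b"hello from any ASGI server",
    })
```

Because the app is just an async callable exchanging dict messages, swapping the server underneath it is transparent to the application, which is exactly the decoupling an abstracted proxy would buy.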

dioptre commented 9 months ago

I think it would be nice to support WSGI, ASGI, and native interfaces like the one socketify exposes: https://docs.socketify.dev/cli.html

There are reasons not to use the event loop, so I'm not sure I'd push for ASGI alone.

dioptre commented 6 months ago

Any updates on this? 🙏

GeneDer commented 6 months ago

@dioptre Nope, we do not have any near-term plan for offering this. But you are always welcome to contribute to the codebase🙂

Superskyyy commented 1 week ago

I'm attempting to adapt socketify.py to Ray; let's see if it works first, then we can start working on the abstraction layer. In general, though, the speedup might not be what you expect: the bottleneck is not at the framework level.

dioptre commented 1 week ago

Legend, let me know if you need a hand

Superskyyy commented 6 days ago

I tried directly replacing uvicorn with socketify's ASGI class plus FastAPI, and performance improved by ~10%. I suspect the majority of the gain would come from replacing the top-level framework itself.

antoniomdk commented 6 days ago

I think replacing uvicorn may boost the performance a little bit, but the bottleneck is actually inside the request handler.

For every request, the proxy makes a couple of RPC calls to the deployment replicas to check availability and queue depth for load balancing. See #46693. TL;DR: the proxy is at least an order of magnitude slower than uvicorn, so I wouldn't expect dramatic gains from replacing that part. (Maybe I'm wrong...)

dioptre commented 6 days ago

I wonder if an intelligent cache at the front, tracking which queries end up where, would fix this and allow more throughput. If https://github.com/ray-project/ray/issues/46693 is indeed accurate, that means Ray Serve is 500x slower than socketify. That's horrifying.
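The caching idea above could be sketched as a small TTL cache that remembers which replica recently served a route, so the proxy could skip the per-request availability RPCs while an entry is still fresh. This is purely illustrative: `ReplicaRouteCache` and its methods are hypothetical names, not part of Ray Serve's API, and a real implementation would need invalidation on replica failure and backpressure signals.

```python
import time
from typing import Optional

class ReplicaRouteCache:
    """Hypothetical TTL cache for proxy routing decisions (NOT a Ray Serve
    API): remember which replica recently served a route so the proxy can
    skip the per-request availability RPCs while the entry is fresh."""

    def __init__(self, ttl_s: float = 1.0):
        self.ttl_s = ttl_s
        # route -> (replica id, timestamp the entry was stored)
        self._entries: dict = {}

    def get(self, route: str, now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        entry = self._entries.get(route)
        if entry is None:
            return None
        replica, stored_at = entry
        if now - stored_at > self.ttl_s:
            # Entry is stale: evict it and fall back to the RPC path.
            del self._entries[route]
            return None
        return replica

    def put(self, route: str, replica: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._entries[route] = (replica, now)
```

The TTL bounds how stale a routing decision can get; within that window the proxy would serve cache hits without any RPCs, which is where the extra throughput would come from if the per-request RPCs really are the bottleneck.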