pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/

Websocket Support #3252

Closed: tiefucai closed this issue 1 month ago

tiefucai commented 1 month ago

🚀 The feature

I saw issue #2447 asking for websocket support, and that issue has been tagged as an enhancement. I'm wondering: does TorchServe support websockets now?

Motivation, pitch

Using Torch models for ASR (as an example) requires sending many HTTP requests with chunks of audio for real-time streaming. With websockets, the connection would be persistent: chunks would be processed as they arrive and replies sent back to the client over the same connection.

Alternatives

No response

Additional context

No response

agunapal commented 1 month ago

Hi @tiefucai, we do have a streaming API, which we have tested only with LLMs: https://github.com/pytorch/serve/blob/96450b9d0ab2a7290221f0e07aea5fda8a83efaf/docs/inference_api.md#curl-example-1

Could you please try this with ASR, and if it works, send a PR?
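
For anyone finding this thread later: the linked doc shows the pattern. The handler pushes partial results back with `send_intermediate_predict_response` while it is still processing, and the client reads the chunked HTTP response as it arrives. A minimal sketch adapted from that doc; the `asr_model` name and the audio payload are placeholders:

```python
# Server side: custom handler that streams intermediate results
# (adapted from docs/inference_api.md; real ASR logic omitted).
from ts.protocol.otf_message_handler import send_intermediate_predict_response

def handle(data, context):
    if type(data) is list:
        for i in range(3):
            # Send a partial result to the client before the final return.
            send_intermediate_predict_response(
                [f"intermediate result {i}"],
                context.request_ids,
                "Intermediate Prediction success",
                200,
                context,
            )
        return ["final result"]
```

```python
# Client side: consume the chunked response as it streams in.
import requests

response = requests.post(
    "http://localhost:8080/predictions/asr_model",  # placeholder model name
    data=open("audio.wav", "rb"),                   # placeholder payload
    stream=True,
)
for chunk in response.iter_content(chunk_size=None):
    if chunk:
        print(chunk.decode("utf-8"))
```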

mreso commented 1 month ago

Hi @tiefucai, for your use case the streaming API with stateful inference should be applicable, since you want to send a stream of chunks to the same model. See this example for details, and let me know if you need any help with it. The recent SAM2 demo on meta.ai used this for its video segmentation.
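
To make the suggestion concrete, here is a rough sketch of what the client side of a stateful session could look like over HTTP. It assumes a model registered with sequence batching enabled in its model-config.yaml, and that the server returns the sequence id in a `ts_request_sequence_id` response header which the client echoes back on later requests, as in the stateful example; the model name and payloads are placeholders, so check the example for the exact contract.

```python
# Rough client sketch for a stateful sequence over HTTP.
import requests

url = "http://localhost:8080/predictions/stateful_model"  # placeholder model name
seq_header = "ts_request_sequence_id"  # sequence id header used by the stateful example

# The first request opens the sequence; the server assigns it an id.
first = requests.post(url, data=b"chunk-0")
sequence_id = first.headers.get(seq_header)

# Later chunks carry the same sequence id, so they are routed to the
# same worker and the handler can keep per-sequence state between calls.
for payload in (b"chunk-1", b"chunk-2"):
    resp = requests.post(url, data=payload, headers={seq_header: sequence_id})
    print(resp.text)
```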

mreso commented 1 month ago

Closing this issue, as we support streaming input data to the same model. Feel free to reopen or join our Slack channel for further questions.