pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

[Docs] More information regarding text generation & LLM inference #2564

Status: Open · opened by jaywonchung 1 year ago

jaywonchung commented 1 year ago

📚 The doc issue

I am new to TorchServe and was looking for some features I would need in order to consider using TorchServe for LLM text generation.

Today, there are a couple of dedicated inference serving solutions out there, including text-generation-inference and vLLM. It would be great if the documentation could mention how TorchServe compares with these at the moment. For instance,

Suggest a potential alternative/fix

A dedicated page for text generation and LLM inference could make sense, given that many people are likely to be interested in this.

agunapal commented 1 year ago

Thanks for your questions, @jaywonchung. We are working on extending and improving our documentation for LLMs.

You can find a Llama 2 example here: https://github.com/pytorch/serve/tree/master/examples/large_models/Huggingface_accelerate/llama2
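For context, once a model is registered with TorchServe, clients send prompts to its REST inference endpoint (`POST /predictions/{model_name}`, port 8080 by default). A minimal sketch of building such a request is below; the model name `llama2-7b` and the `{"prompt": ...}` payload shape are assumptions for illustration, since the exact input format depends on the handler used.

```python
# Hedged sketch: building a request against TorchServe's REST inference API.
# Assumes TorchServe is running locally on the default inference port (8080)
# and that a model (hypothetically named "llama2-7b") is already registered
# with a handler that accepts a JSON body of the form {"prompt": ...}.
import json
import urllib.request


def build_request(model_name: str, prompt: str,
                  host: str = "localhost", port: int = 8080) -> urllib.request.Request:
    """Build a POST request for TorchServe's prediction endpoint."""
    url = f"http://{host}:{port}/predictions/{model_name}"
    data = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )


req = build_request("llama2-7b", "What is TorchServe?")
print(req.full_url)  # http://localhost:8080/predictions/llama2-7b
# Sending it (requires a running server):
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```

The actual llama2 example linked above may expect a different payload (e.g. raw text rather than JSON), so check its handler before adapting this.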