Open jaywonchung opened 1 year ago
📚 The doc issue

I am new to TorchServe and was looking for certain features I would need in order to consider TorchServe for LLM text generation.

Today, there are several inference serving solutions available, including text-generation-inference and vLLM. It would be great if the documentation mentioned how TorchServe currently compares with these. For instance,

Suggest a potential alternative/fix

A dedicated page for text generation and LLM inference could make sense, given that many people would be interested in this.

Thanks for your questions @jaywonchung. We are working on extending and improving our documentation for LLMs.
You can find an example with llama2 here: https://github.com/pytorch/serve/tree/master/examples/large_models/Huggingface_accelerate/llama2
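For anyone evaluating TorchServe for text generation in the meantime, a minimal sketch of calling a deployed model through TorchServe's inference API (`POST /predictions/{model_name}`) might look like the following. The model name `llama2-7b`, the host/port, and the JSON payload schema are assumptions here — the actual input format depends on the custom handler used to register the model.

```python
import json
from urllib import request

def build_inference_request(host: str, model_name: str, prompt: str) -> request.Request:
    """Build a request against TorchServe's inference API endpoint.

    TorchServe serves predictions at /predictions/{model_name} on the
    inference port (8080 by default).
    """
    url = f"{host}/predictions/{model_name}"
    # Payload schema is handler-specific; a JSON body with a "prompt"
    # field is only an illustrative assumption.
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

req = build_inference_request("http://localhost:8080", "llama2-7b", "Hello, world")
# With a running TorchServe instance, the generated text would be read via:
#   response = request.urlopen(req).read()
```

Sending the request only works against a running TorchServe instance with the model registered (e.g. via the llama2 example linked above).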