pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

[Docs] More information regarding text generation & LLM inference #2564

Status: Open · opened by jaywonchung 1 year ago

jaywonchung commented 1 year ago

📚 The doc issue

I am new to TorchServe and was looking for some features I would need in order to consider using TorchServe for LLM text generation.

Today, there are a couple of dedicated inference serving solutions out there, including text-generation-inference and vLLM. It would be great if the documentation could mention how TorchServe compares with these at the moment. For instance,

Suggest a potential alternative/fix

A dedicated page for text generation and LLM inference could make sense, given that many people are likely to be interested in this.

agunapal commented 1 year ago

Thanks for your questions, @jaywonchung. We are working on extending and improving our documentation for LLMs.

You can find a Llama 2 example here: https://github.com/pytorch/serve/tree/master/examples/large_models/Huggingface_accelerate/llama2
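For context, once a model is registered with TorchServe, clients send prompts to its REST inference endpoint (`POST /predictions/{model_name}`, port 8080 by default). A minimal sketch of building such a request is below; the model name `llama2-7b` and the `{"prompt": ...}` payload shape are assumptions for illustration, since the exact input format depends on the handler used.

```python
# Hedged sketch: building a request against TorchServe's REST inference API.
# Assumes TorchServe is running locally on the default inference port (8080)
# and that a model (hypothetically named "llama2-7b") is already registered
# with a handler that accepts a JSON body of the form {"prompt": ...}.
import json
import urllib.request


def build_request(model_name: str, prompt: str,
                  host: str = "localhost", port: int = 8080) -> urllib.request.Request:
    """Build a POST request for TorchServe's prediction endpoint."""
    url = f"http://{host}:{port}/predictions/{model_name}"
    data = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )


req = build_request("llama2-7b", "What is TorchServe?")
print(req.full_url)  # http://localhost:8080/predictions/llama2-7b
# Sending it (requires a running server):
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```

The actual llama2 example linked above may expect a different payload (e.g. raw text rather than JSON), so check its handler before adapting this.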