runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
MIT License
213 stars 81 forks source link

Update documentation to note support for extra parameters #69

Open bryankruman opened 3 months ago

bryankruman commented 3 months ago

Greetings!

I just wanted to make a quick note that the documentation for worker-vllm and RunPod both don't seem to mention anything about vLLM supporting guided generation via Json schemas or Regex/grammar patterns, but it DOES in fact support it as vLLM itself supports it.

It's a great feature and more people should consider using it for sure. In case you're not familiar, check out the vLLM docs for details about the "extra" parameters on the OpenAI completions endpoints:

https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters-for-chat-api

nerdylive123 commented 1 month ago

Yeah worth documenting this on the usage examples maybe