🚀 The feature, motivation and pitch
When starting the OpenAI-compatible server, allow the user to supply a set of default sampling parameters that override vLLM's built-in defaults.
Some model publishers (e.g., the Qwen2.5 family) recommend sampling parameters tuned for their models. Setting these parameters manually in the client for every request is cumbersome, and forgetting to do so can produce incoherent outputs.
Would the vLLM maintainers consider it worthwhile to allow registering such parameters alongside the model, so that they override the default sampling parameters?
If so, I am happy to develop this feature and submit a pull request.
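To make the pain point concrete, here is a minimal sketch of the status quo: every client has to restate the publisher-recommended parameters on every request. The base URL and API key are placeholders, and the sampling values are the ones Qwen2.5-7B-Instruct publishes in its generation_config.json (shown here for illustration):

```python
from openai import OpenAI

# Placeholder endpoint for a locally running vLLM OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    # Standard OpenAI parameters recommended by the model publisher.
    temperature=0.7,
    top_p=0.8,
    # vLLM-specific sampling parameters are passed via extra_body.
    extra_body={"top_k": 20, "repetition_penalty": 1.05},
)
print(response.choices[0].message.content)
```

With the proposed feature, these defaults would be registered once at server startup (for example, via a hypothetical CLI flag or by reading the model's generation_config.json), and a bare `client.chat.completions.create(model=..., messages=...)` call would pick them up automatically; the exact mechanism is open for discussion.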
Alternatives
No response
Additional context
https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/generation_config.json
Note that publishing these parameters in generation_config.json appears to be a non-standard convention, so parsing that file for specific sampling parameters may not be practical in general.
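For reference, the sampling-related fields in that file look roughly like this (excerpted and possibly out of date; see the link above for the authoritative contents):

```json
{
  "do_sample": true,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
```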
Before submitting a new issue...
[X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.