🚀 The feature, motivation and pitch
When starting the OpenAI-compatible server, allow the user to supply a set of default sampling parameters that override vLLM's built-in defaults.
Some model publishers (e.g., the Qwen2.5 family) recommend sampling parameters tuned for their models. Setting these parameters manually in the client for every request is cumbersome, and forgetting to do so can produce incoherent outputs.
Would the vLLM maintainers consider it worthwhile to allow registering such parameters alongside the model, so that they override the default sampling parameters?
If so, I am happy to develop this feature and submit a pull request.
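To make the pain point concrete, here is a minimal sketch of the status quo: every client has to restate the publisher-recommended parameters on every request. The base URL and API key are placeholders, and the sampling values are the ones Qwen2.5-7B-Instruct publishes in its generation_config.json (shown here for illustration):

```python
from openai import OpenAI

# Placeholder endpoint for a locally running vLLM OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    # Standard OpenAI parameters recommended by the model publisher.
    temperature=0.7,
    top_p=0.8,
    # vLLM-specific sampling parameters are passed via extra_body.
    extra_body={"top_k": 20, "repetition_penalty": 1.05},
)
print(response.choices[0].message.content)
```

With the proposed feature, these defaults would be registered once at server startup (for example, via a hypothetical CLI flag or by reading the model's generation_config.json), and a bare `client.chat.completions.create(model=..., messages=...)` call would pick them up automatically; the exact mechanism is open for discussion.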
Alternatives
No response
Additional context
https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/generation_config.json
Note that publishing these parameters in generation_config.json appears to be a non-standard convention, so parsing that file for specific sampling parameters may not be practical in general.
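For reference, the sampling-related fields in that file look roughly like this (excerpted and possibly out of date; see the link above for the authoritative contents):

```json
{
  "do_sample": true,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
```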
Before submitting a new issue...
[X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.