Your current environment
How would you like to use vllm
I am currently running Qwen2.5-72B-Instruct on a DGX (PCIe) server with vLLM as the inference engine. Inspired by Noam Brown's ideas on how scaling inference-time compute led to o1, I have been wondering whether it is possible to manually increase the inference-time compute of the Qwen2.5 model I'm running on my server. Thanks in advance to all the kind community members of vLLM. :D
To infinite(y) and beyond compute!
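To make the question concrete, here is a rough sketch of the kind of thing I mean, assuming vLLM's offline `LLM` / `SamplingParams` API and an 8-GPU box. The majority-vote step and the arithmetic prompt are just illustrative placeholders I made up, not anything vLLM provides out of the box:

```python
# Rough sketch: "manually" spending more inference-time compute via best-of-n
# sampling with vLLM's offline API. tensor_parallel_size=8 assumes an 8-GPU
# DGX-class machine; the prompt and the voting heuristic are placeholders.
from collections import Counter

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-72B-Instruct", tensor_parallel_size=8)

# Spend more compute per request: longer generations and n parallel samples.
params = SamplingParams(n=16, temperature=0.8, top_p=0.95, max_tokens=4096)

# For a real run the chat template should be applied (e.g. via the tokenizer);
# a raw prompt keeps the sketch short.
prompt = "Q: What is 17 * 24? Think step by step, then give the final answer.\nA:"
outputs = llm.generate([prompt], params)

# Naive self-consistency: majority vote over the final line of each sample.
candidates = [(o.text.strip().splitlines() or [""])[-1] for o in outputs[0].outputs]
answer, votes = Counter(candidates).most_common(1)[0]
print(f"{votes}/{len(candidates)} samples agree on: {answer}")
```

Is something along these lines the intended way to trade latency for quality with vLLM, or is there a better-supported mechanism?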
Before submitting a new issue...
[X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.