Open · reedest7 opened this issue 3 weeks ago
I use two 48 GB GPUs (my device setup) to run the service (sh reason/llm_service/create_service_math_shepherd.sh), with the RM and LM models on separate GPUs.
Note that even the 1.5B model by itself fills an entire A100 80G (the inference service launches 4 lm_workers across 4 GPUs).
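That is expected behavior with vLLM: by default the engine pre-allocates roughly 90% of the card for weights plus KV cache (gpu_memory_utilization=0.9), so even a 1.5B policy model claims the whole GPU and leaves nothing for a 7B reward model. Below is a minimal sketch of capping that budget so both could share one 24 GB card; the 0.3 figure and the context length are illustrative assumptions to tune, and how the OpenR worker forwards these arguments is not confirmed here:

```python
from vllm import LLM, SamplingParams

# Cap vLLM's pre-allocation: ~30% of 24 GB (~7 GB) for the 1.5B policy
# model, leaving the rest of the card for the 7B PRM. Both the 0.3 budget
# and the context length below are assumptions, not values taken from the
# OpenR scripts.
policy = LLM(
    model="Qwen/Qwen2.5-Math-1.5B-Instruct",
    dtype="half",                 # fp16 weights: ~3 GB for 1.5B params
    gpu_memory_utilization=0.3,   # vLLM default is 0.9, which fills the card
    max_model_len=4096,           # shorter context -> smaller KV cache
)

out = policy.generate(["1 + 1 ="], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```

If the service's vllm_worker forwards vLLM engine arguments, the same cap can usually be set from the launch script via --gpu-memory-utilization without touching Python code.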
System Info
Single RTX 3090, 24 GB VRAM
Who can help?
No response
Information
Tasks
Reproduction
POLICY_MODEL=Qwen2.5-Math-1.5B-Instruct; VALUE_MODEL_NAME=math-shepherd-mistral-7b-prm. When running sh reason/llm_service/create_service_math_shepherd.sh, only the vllm_worker starts; the reward model fails to start and an OOM error is reported.
Expected behavior
How much VRAM does a single card need to run both models?
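For a rough lower bound: fp16/bf16 weights take 2 bytes per parameter, so the two models alone need about 16 GB of the 24 GB card before any KV cache, activations, or CUDA context. A back-of-envelope sketch (pure arithmetic, no measured numbers):

```python
def fp16_weights_gb(params_billions: float) -> float:
    """Weight memory in GB at fp16/bf16: 2 bytes per parameter."""
    return params_billions * 1e9 * 2 / 1024**3

policy = fp16_weights_gb(1.5)  # Qwen2.5-Math-1.5B-Instruct -> ~2.8 GB
prm = fp16_weights_gb(7.0)     # math-shepherd-mistral-7b-prm -> ~13.0 GB
print(f"policy weights: {policy:.1f} GB, PRM weights: {prm:.1f} GB")
print(f"weights total:  {policy + prm:.1f} GB of a 24 GB card")

# The remaining ~8 GB must cover KV cache, activations, and CUDA context,
# which is only workable if vLLM's pre-allocation is capped (see the
# sketch above): with the default gpu_memory_utilization=0.9, the policy
# worker alone claims ~0.9 * 24 = 21.6 GB and the 7B PRM OOMs.
print(f"default vLLM pre-allocation: {0.9 * 24:.1f} GB")
```

So 24 GB is tight but plausibly enough for 1.5B + 7B at fp16 with a reduced memory budget and a short context; it is not enough with vLLM's defaults, which matches the OOM you see.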