Raeps opened this issue 7 months ago (Open)
Hi team, I tried to run the example code on an A100 (40 GB), but it shows an error. How can I fix it?

Hi, thanks for running our code. The default vLLM setting uses 95% of GPU memory when creating the model (I changed it to 98% for efficiency). Once the model is created, it should not use more GPU memory, so I suspect the error you got was raised while vLLM was creating the model. You can pass gpu_memory_utilization=0.8 when creating the LocalVLLM class, adjusting the value to your GPU usage.
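The suggested fix from the reply can be sketched as below. Note that `LocalVLLM` is this repository's wrapper class, and the assumption that it forwards `gpu_memory_utilization` to the underlying vLLM engine is based on the reply, not on the class's documented signature; check the class definition for the exact argument name.

```python
# Hedged sketch: reserve less GPU memory for the vLLM engine so model
# creation fits on a 40 GB A100. gpu_memory_utilization is the fraction
# of total GPU memory vLLM pre-allocates for weights and the KV cache.
engine_kwargs = {
    # The repo's default is 0.98; 0.8 leaves headroom on smaller GPUs.
    "gpu_memory_utilization": 0.8,
}

# Hypothetical usage -- LocalVLLM and its constructor arguments are
# assumptions; substitute your actual model name:
# model = LocalVLLM("your-model-name", **engine_kwargs)
```

If 0.8 still fails, lower the value further; the trade-off is a smaller KV cache and thus fewer concurrent sequences.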