li-plus / chatglm.cpp

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
MIT License

n_gpu_layers parameter? #99

Open endink opened 1 year ago

endink commented 1 year ago

Is there no n_gpu_layers parameter to control how many layers get loaded onto the GPU? In multi-instance deployments where inference speed is not critical, loading even 4~5 fewer layers per instance would save a lot of GPU memory.

Tokix commented 1 year ago

In my case it was n-gpu-layers instead of n_gpu_layers that got https://github.com/oobabooga/text-generation-webui to start; maybe this helps. I'm running the 70B 4-bit quantization.

endink commented 1 year ago

@Tokix Thanks, but C++ is important for me 😄

CHNtentes commented 1 year ago

I have this need too. My laptop's 3060 is just barely short of enough VRAM to run the q4_0 chatglm2-6B.

wdjwxh commented 10 months ago

Strongly needed.