I am not a maintainer and I do not know the answer.
One way to test your hypothesis would be to set the `--gpu-memory-utilization` parameter so that GPU memory usage stays just below 16 GB. If that prevents the problem from happening, it would imply you're right.
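If it helps, here is a minimal sketch of what that looks like with vLLM's Python API; the same knob is exposed as `--gpu-memory-utilization` on the server entrypoints. The model name, `tensor_parallel_size`, and the 0.65 value (roughly 15.6 GB of a 24 GB card) are assumptions on my part, not taken from the original report.

```python
# Sketch: cap vLLM's GPU memory fraction so usage stays under ~16 GB
# on a 24 GB GPU (16 / 24 ≈ 0.66). Values below are assumptions.
from vllm import LLM

llm = LLM(
    model="lmsys/vicuna-13b-v1.3",   # assumed checkpoint
    tensor_parallel_size=2,          # assumed: shard across the two GPUs
    gpu_memory_utilization=0.65,     # ~15.6 GB of each 24 GB GPU
)

outputs = llm.generate(["Hello, how are you?"])
print(outputs[0].outputs[0].text)
```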
Closing, as I found the error no longer appears with the newest master code.
I have 2 hosts, each with 16 GB of CPU memory and 24 GB of GPU memory. When I tried to load vicuna-13b, it got an OOM error. Here's the error message:
The config is:
Does each host's CPU memory need to be greater than the model size, even when using distributed inference?
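For context, a rough back-of-envelope estimate of the weight footprint, assuming an fp16 checkpoint with roughly 13B parameters (the exact numbers are my assumptions, not measurements):

```python
# Rough estimate only; assumes fp16 weights and ~13B parameters.
params = 13e9
bytes_per_param = 2  # fp16
weights_gib = params * bytes_per_param / 1024**3

print(f"full checkpoint: ~{weights_gib:.1f} GiB")                        # ~24 GiB, above the 16 GB host RAM
print(f"per GPU at tensor_parallel_size=2: ~{weights_gib / 2:.1f} GiB")  # ~12 GiB, fits a 24 GB GPU
```

In other words, each shard fits comfortably on a 24 GB GPU, but the full checkpoint is larger than one host's 16 GB of CPU memory, which is exactly why the question of whether the whole model must pass through host RAM matters here.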