Hi @shahizat, as the error message suggests, under 3-bit quantization (group size 40) the 409 groups cannot be split evenly across 2 GPUs, so this configuration is not supported.
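To make the arithmetic behind the error concrete, here is a minimal sketch of the divisibility check it describes. The function name `groups_split_evenly` is hypothetical and this is not MLC-LLM's actual implementation; it assumes the group count is the floor of the dimension divided by the group size, which matches the 409 reported in the message.

```python
def groups_split_evenly(linear_dim: int, group_size: int, num_gpus: int) -> bool:
    """Return True if the quantization groups can be divided evenly across GPUs.

    Hypothetical helper for illustration only; the real check inside
    MLC-LLM may be stricter or structured differently.
    """
    # Floor division reproduces the group count in the error: 16384 // 40 == 409.
    num_groups = linear_dim // group_size
    # Each GPU must receive the same whole number of groups.
    return num_groups % num_gpus == 0


# The failing case from the error message: 409 groups is odd, so 2 GPUs fail.
print(groups_split_evenly(16384, 40, 2))

# The two suggested workarounds: fewer GPUs, or a group size that divides
# the dimension into an even number of groups (32 is only an illustrative value).
print(groups_split_evenly(16384, 40, 1))
print(groups_split_evenly(16384, 32, 2))
```

With a single GPU the check passes trivially, and a group size such as 32 yields 512 groups, which split evenly across 2 GPUs. Whether a given group size is available depends on the quantization modes the library actually provides.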
Greetings to all,
🐛 Bug
To Reproduce
Steps to reproduce the behavior:
Output error: ValueError: The linear dimension 16384 has 409 groups under group size 40. The groups cannot be evenly distributed on 2 GPUs. Possible solutions: reduce the number of GPUs, or use quantization with a smaller group size.
Is it possible to run a 3-bit version of the MLC-LLM model using multiple GPUs?
Thanks in advance!