Open TNT3530 opened 2 months ago
@TNT3530 Thank you for reporting this. It looks like q_norm/k_norm are not yet supported.
@tlopex Hi Shushi, I wonder if you have bandwidth to look into this? We already have the Cohere model in https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/model/cohere/cohere_model.py, but it doesn't support the q_norm and k_norm in attention when the config value use_qk_norm is true. So likely it's only a matter of supporting this.
Reference:
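For context, QK normalization applies a normalization (a LayerNorm in Cohere's case) to the query and key projections, per attention head, before the attention scores are computed. A minimal NumPy sketch of the idea follows — the function and weight names are illustrative only, not mlc-llm's actual API:

```python
import numpy as np

def layer_norm(x, weight, eps=1e-5):
    # Normalize over the last axis (head_dim); no bias, matching a
    # bias-free LayerNorm. `weight` is a learned per-feature scale.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * weight

def apply_qk_norm(q, k, q_norm_weight, k_norm_weight, use_qk_norm=True):
    # q: (batch, seq, num_q_heads, head_dim)
    # k: (batch, seq, num_kv_heads, head_dim)
    # When use_qk_norm is false (base Command-R), q and k pass through
    # unchanged; when true, each head's q/k vectors are normalized.
    if use_qk_norm:
        q = layer_norm(q, q_norm_weight)
        k = layer_norm(k, k_norm_weight)
    return q, k
```

In a real implementation the q_norm/k_norm weights would be loaded from the checkpoint (which is presumably why unsupported weights trigger a warning during quantization).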
@MasterJH5574 Okay! I'll try to support this later.
🐛 Bug
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Should load and quantize the weights like normal
Environment
- How you installed MLC-LLM (conda, source): pip
- How you installed TVM-Unity (pip, source): pip
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):

Additional context
Normal Command-R works; only Plus produces the warning.
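This behavior would be consistent with the model configs: Command R+ enables QK normalization while base Command-R does not, so only Plus ships q_norm/k_norm weights. A rough, illustrative config fragment (field values assumed from the report, not copied from an actual checkpoint):

```json
{
  "model_type": "cohere",
  "use_qk_norm": true
}
```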