Open TNT3530 opened 2 months ago
@TNT3530 Thank you for reporting this. It looks like q_norm/k_norm are not yet supported.
@tlopex Hi Shushi, I wonder if you have bandwidth to look into this? We already have the Cohere model in https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/model/cohere/cohere_model.py, but it doesn't support the q_norm and k_norm in attention when the config value use_qk_norm is true. So likely it's only a matter of supporting this.
Reference:
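For context, QK normalization applies a normalization (a LayerNorm in Cohere's case) to the query and key projections, per attention head, before the attention scores are computed. A minimal NumPy sketch of the idea follows — the function and weight names are illustrative only, not mlc-llm's actual API:

```python
import numpy as np

def layer_norm(x, weight, eps=1e-5):
    # Normalize over the last axis (head_dim); no bias, matching a
    # bias-free LayerNorm. `weight` is a learned per-feature scale.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * weight

def apply_qk_norm(q, k, q_norm_weight, k_norm_weight, use_qk_norm=True):
    # q: (batch, seq, num_q_heads, head_dim)
    # k: (batch, seq, num_kv_heads, head_dim)
    # When use_qk_norm is false (base Command-R), q and k pass through
    # unchanged; when true, each head's q/k vectors are normalized.
    if use_qk_norm:
        q = layer_norm(q, q_norm_weight)
        k = layer_norm(k, k_norm_weight)
    return q, k
```

In a real implementation the q_norm/k_norm weights would be loaded from the checkpoint (which is presumably why unsupported weights trigger a warning during quantization).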
@MasterJH5574 Okay! I'll try to support this later.
🐛 Bug
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Should load and quantize the weights like normal
Environment
- How you installed MLC-LLM (conda, source): pip
- How you installed TVM-Unity (pip, source): pip
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):

Additional context
Normal Command-R works; only Plus produces the warning.
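This behavior would be consistent with the model configs: Command R+ enables QK normalization while base Command-R does not, so only Plus ships q_norm/k_norm weights. A rough, illustrative config fragment (field values assumed from the report, not copied from an actual checkpoint):

```json
{
  "model_type": "cohere",
  "use_qk_norm": true
}
```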