mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] Any Model with the Suffix _1 Crashes Android #2894

Open Melgark opened 1 month ago

Melgark commented 1 month ago

🐛 Bug

I tried this on both the S23 Ultra and the S24.

To Reproduce

1. Load any model with an `_1` suffix, such as Qwen2_1_5B_q4f16_1, and try to send a prompt.

I've tested many models, and it seems to be the models with the `_1` suffix that cause the issue. Can someone explain what `_1` actually does compared to `_0` or even `_2`? I know it's mentioned here https://llm.mlc.ai/docs/compilation/configure_quantization.html#quantization-mode but I am new to this.

This doesn't happen every time with very short prompts such as "Hi".

[screenshot of the crash]

Melgark commented 1 month ago

Looking into it, I came across this https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_llm/quantization/group_quantization.py#L37

I'm curious how the runtime takes these .bin files and knows how to transpose them back into a usable model. That way I can check whether it's maybe an unsupported operator. Any pointers would be helpful.
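For anyone else trying to build intuition here: the linked `GroupQuantize` config describes weights stored as 4-bit values packed into uint32 words, with one fp16 scale per group. Below is a minimal NumPy sketch of symmetric 4-bit group quantization in that spirit. All names, the group size, and the packing order are assumptions for illustration only, not the actual mlc_llm implementation (which generates TVM kernels rather than NumPy code).

```python
# Illustrative sketch of symmetric 4-bit group quantization with fp16
# scales. This is NOT mlc_llm's code; it only shows the general idea of
# how packed integer weights plus per-group scales can be turned back
# into usable floating-point tensors at load time.
import numpy as np

GROUP_SIZE = 32              # elements sharing one scale (assumed)
BITS = 4                     # bits per quantized element
ELEMS_PER_U32 = 32 // BITS   # 8 packed values per uint32 word

def quantize_group(weights: np.ndarray):
    """Quantize a 1-D float array whose length is a multiple of GROUP_SIZE."""
    groups = weights.reshape(-1, GROUP_SIZE).astype(np.float32)
    # One scale per group: map max |w| onto the signed 4-bit range [-7, 7].
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0
    # Round to integers, then shift to unsigned storage 0..14.
    q = np.clip(np.round(groups / scales), -7, 7).astype(np.int32) + 7
    # Pack 8 four-bit values into each uint32, lowest nibble first.
    q = q.reshape(-1, ELEMS_PER_U32)
    packed = np.zeros(q.shape[0], dtype=np.uint32)
    for i in range(ELEMS_PER_U32):
        packed |= (q[:, i].astype(np.uint32) & 0xF) << (BITS * i)
    return packed, scales.astype(np.float16)

def dequantize(packed: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Invert the packing: unpack nibbles, recenter, apply the group scale."""
    q = np.stack([(packed >> (BITS * i)) & 0xF
                  for i in range(ELEMS_PER_U32)], axis=1)
    vals = (q.astype(np.float32) - 7.0).reshape(-1, GROUP_SIZE)
    return (vals * scales.astype(np.float32)).reshape(-1)

w = np.random.randn(64).astype(np.float32)
packed, scales = quantize_group(w)
w_hat = dequantize(packed, scales)
# Reconstruction error is bounded by half a quantization step per group.
assert np.max(np.abs(w - w_hat)) <= np.max(np.abs(w)) / 7 + 1e-3
```

If the crash really is tied to the `_1` variant, comparing the generated dequantize kernels for `_0` vs `_1` (rather than the `.bin` files themselves, which are just the packed buffers) is probably the more direct place to look.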