Closed: bil-ash closed this issue 4 months ago
cc. @CharlieFRuan
@CharlieFRuan Please take a look at this.
@bil-ash We will work on this. Meanwhile, please feel free to use MLC-LLM, which supports quantized versions of qwen2-0.5b, and connect WebLLM Chat to its serve API as a temporary workaround.
Instruction: https://github.com/mlc-ai/web-llm-chat/?tab=readme-ov-file#use-custom-models
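For reference, the serve API is OpenAI-compatible, so once MLC-LLM is serving the quantized model locally, any OpenAI-style client can reach the same endpoint that WebLLM Chat would use. A minimal sketch, assuming a local server on port 8000 and a hypothetical model id (check the MLC-LLM docs for the exact invocation and ids on your setup):

```typescript
// Minimal sketch, assuming MLC-LLM is serving a quantized Qwen2-0.5B build
// locally and listening on http://127.0.0.1:8000. The port, model id, and
// endpoint path are assumptions for illustration, not prescribed values.
async function queryLocalMlcServer(prompt: string): Promise<string> {
  const response = await fetch("http://127.0.0.1:8000/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "Qwen2-0.5B-Instruct-q4f16_1-MLC", // assumed model id
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content;
}

queryLocalMlcServer("Hello!").then(console.log);
```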
Created a PR to solve the issue. Please have a look.
The model is available on WebLLM Chat now. https://chat.webllm.ai/#/chat
Thanks for the contribution!
Problem Description
My Android phone has limited RAM, so it can only run the TinyLlama model. However, TinyLlama produces inferior results compared to Qwen2-0.5B Instruct (tested on desktop). Although Qwen2-0.5B has fewer parameters, I am unable to run it on my phone because web-llm-chat only offers the unquantized version of Qwen2-0.5B, while it does offer a quantized version of TinyLlama.
Solution Description
Please add the Qwen2-0.5B quantized versions (q4f16 and q4f32) to the list of supported models in web-llm-chat. Both are already available on Hugging Face.
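For illustration, the request amounts to adding entries along these lines to the model list; a rough sketch assuming web-llm's ModelRecord shape, with placeholder URLs and ids rather than the exact values that would eventually ship:

```typescript
// Hypothetical sketch of the requested additions: q4f16 and q4f32 builds of
// Qwen2-0.5B-Instruct. Field names follow web-llm's ModelRecord shape; the
// URLs, ids, and the model_lib placeholders are assumptions for illustration.
const qwen2QuantizedEntries = [
  {
    model: "https://huggingface.co/mlc-ai/Qwen2-0.5B-Instruct-q4f16_1-MLC",
    model_id: "Qwen2-0.5B-Instruct-q4f16_1-MLC",
    model_lib: "<URL to the matching WebGPU model library (.wasm)>",
    low_resource_required: true, // small quantized model, suitable for phones
  },
  {
    model: "https://huggingface.co/mlc-ai/Qwen2-0.5B-Instruct-q4f32_1-MLC",
    model_id: "Qwen2-0.5B-Instruct-q4f32_1-MLC",
    model_lib: "<URL to the matching WebGPU model library (.wasm)>",
    low_resource_required: true,
  },
];
```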
Alternatives Considered
No response
Additional Context
No response