hmm no.. in the example there is a list of models... I wish to be able to upload a model from a local directory without downloading it from the internet.
For example, let's say I wish to have Mistral Instruct v0.3 quantized as f16 for the output and embed tensors and q6_k for the other tensors. How should I proceed?
@0wwafa I understand the need here. Let me explain.
First, the prerequisite for custom models to run on WebLLM Chat is that the models must be compiled to MLC format. For more details, check the instructions of MLC LLM here.
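For illustration, a rough sketch of that compilation flow (the model paths, quantization choice, and conversation-template name below are assumptions; the MLC LLM docs have the authoritative flags):

```bash
# Convert the weights to MLC format with a chosen quantization
# (paths and quantization are placeholders)
mlc_llm convert_weight ./Mistral-7B-Instruct-v0.3 \
    --quantization q4f16_1 \
    -o ./Mistral-7B-Instruct-v0.3-q4f16_1-MLC

# Generate the chat config; the conv-template name is model-specific
mlc_llm gen_config ./Mistral-7B-Instruct-v0.3 \
    --quantization q4f16_1 \
    --conv-template mistral_default \
    -o ./Mistral-7B-Instruct-v0.3-q4f16_1-MLC

# Compile a WebGPU model library so the browser can run it
mlc_llm compile ./Mistral-7B-Instruct-v0.3-q4f16_1-MLC/mlc-chat-config.json \
    --device webgpu \
    -o ./Mistral-7B-Instruct-v0.3-q4f16_1-MLC/mistral-webgpu.wasm
```

Note that MLC LLM applies its own quantization schemes (such as q4f16_1), so a GGUF-style mixed quantization does not carry over directly.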
Once you have the MLC-format models on your local machine, the proposal here is to allow one of the following three ways to use them in the web app:
These are planned to be released in the coming months. Does any of these fulfill what you need?
Well, I just wish to see how Mistral works in the web browser using one of my quantizations, specifically f16/q6, f16/q5, q8/q6, and q8/q5. https://huggingface.co/ZeroWw/Test
In other words, I quantized the output and embed tensors to f16 (or q8) and the other tensors to q6 or q5. This keeps the "understanding" and "expressing" tensors at an almost lossless quantization (f16) while quantizing the other tensors in a "good" way. The results of my tests confirm that the model degrades less this way and works almost like the original; I could not see any difference during interactive inference.
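For context, this kind of mixed quantization is typically produced with llama.cpp's quantize tool; a sketch under the assumption of a recent llama.cpp build that has these per-tensor-type flags (filenames are placeholders):

```bash
# Start from an f16 GGUF export, keep output and token-embedding
# tensors at f16, and quantize the remaining tensors to q6_k
./llama-quantize \
    --output-tensor-type f16 \
    --token-embedding-type f16 \
    Mistral-7B-Instruct-v0.3-f16.gguf \
    Mistral-7B-Instruct-v0.3-f16-q6_k.gguf \
    q6_k
```

GGUF files produced this way are not MLC-format models, so WebLLM cannot load them directly; they would still need the MLC compilation flow described above.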
The app has been updated to support custom models through the MLC-LLM REST API by switching the model type in Settings.
https://github.com/mlc-ai/web-llm-chat/commit/2fb025c3f999cf90c1b2cd38452f0e6fc5e49e63
@0wwafa Could you let me know whether the update above fulfills your use case by hosting your models through the mlc_llm serve command of MLC LLM?
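For reference, a minimal sketch of that flow, assuming a compiled MLC model directory and a default port (both placeholders):

```bash
# Host the compiled MLC model locally
mlc_llm serve ./Mistral-7B-Instruct-v0.3-q4f16_1-MLC --port 8000

# The server exposes an OpenAI-compatible API; point WebLLM Chat's
# custom-model setting at it, or test it directly:
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "./Mistral-7B-Instruct-v0.3-q4f16_1-MLC",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```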
My models are available here. I still don't understand how to use them with mlc_llm.
Problem Description
https://github.com/mlc-ai/web-llm/issues/421
Users want to be able to upload their own models from their local machine.
Solution Description
The WebLLM Engine is capable of loading any MLC-format model.
https://github.com/mlc-ai/web-llm/tree/main/examples/simple-chat-upload is an example of supporting local models in the app.
We want to do something similar to allow uploading.
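As a sketch of what this could look like on the engine side, an app can register a custom model through web-llm's appConfig. The URLs and model ID below are placeholders, and the exact ModelRecord field names vary slightly across web-llm versions:

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Hypothetical model record pointing at locally hosted MLC artifacts:
// `model` is the URL of the weights directory, `model_lib` the compiled
// WebGPU library (older web-llm versions call these model_url/model_lib_url).
const appConfig = {
  model_list: [
    {
      model: "http://localhost:8080/Mistral-7B-Instruct-v0.3-q4f16_1-MLC/",
      model_id: "Mistral-7B-Instruct-v0.3-q4f16_1-MLC",
      model_lib: "http://localhost:8080/libs/mistral-webgpu.wasm",
    },
  ],
};

// Load the custom model and run one chat turn
const engine = await CreateMLCEngine("Mistral-7B-Instruct-v0.3-q4f16_1-MLC", {
  appConfig,
  initProgressCallback: (report) => console.log(report.text),
});

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(reply.choices[0].message.content);
```

Uploading from a local directory would amount to constructing such a record from user-supplied files instead of remote URLs, as the simple-chat-upload example does.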