mlc-ai / web-llm

High-performance In-browser LLM Inference Engine
https://webllm.mlc.ai
Apache License 2.0

Inconsistent and unreliable outputs on mobile compared to PC/laptop for -1k models #485

Open JohnReginaldShutler opened 1 month ago

JohnReginaldShutler commented 1 month ago

Hello! I would like to know if there are any unseen errors or limitations when prompting a model on mobile compared to a PC/laptop.

Specifically, we are testing a RAG system where we provide the model with context and ask it to generate a response based on that context. Our goal is to list profiles of selected lawyers and a user-supplied legal issue, then ask the model to justify why these lawyers are suitable for the user's legal problem.

We tested several smaller models optimized for mobile (e.g., Phi 1.5/2/3, RedPajama, TinyLlama). These models work well with simple prompts (e.g., "List three states in the USA"). However, with more complex prompts like the one described above, PCs/laptops produce coherent responses while mobile devices produce gibberish, even with identical prompts and settings. Could this be related to the requested maxStorageBufferBindingSize exceeding the 128MB limit reported on mobile?
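If the 128MB limit is the suspect, it can be confirmed from JavaScript, since the WebGPU adapter exposes its limits directly. Below is a minimal sketch; only `navigator.gpu.requestAdapter()` and `adapter.limits` are standard WebGPU, while the helper function and logging are illustrative:

```typescript
// Sketch: check the WebGPU adapter limits that gate larger models on
// mobile. The helper and logging below are illustrative, not from web-llm.
const MIB = 1024 * 1024;

function exceedsBindingLimit(
  requestedBytes: number,
  deviceLimitBytes: number,
): boolean {
  // True when a requested storage-buffer binding is larger than the
  // device allows (the situation the error message describes).
  return requestedBytes > deviceLimitBytes;
}

async function reportLimits(): Promise<void> {
  // Browser-only: navigator.gpu is absent in Node, so bail out there.
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return;
  const adapter = await gpu.requestAdapter();
  if (!adapter) return;
  // Many phones report 128 MiB here; desktop GPUs usually report far more.
  console.log(
    "maxStorageBufferBindingSize (MiB):",
    adapter.limits.maxStorageBufferBindingSize / MIB,
  );
  console.log("maxBufferSize (MiB):", adapter.limits.maxBufferSize / MIB);
}

reportLimits();
```

Comparing these numbers between the desktop and the phone would show whether the phone's adapter is simply reporting a smaller binding limit than the model's weight buffers require.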

Below is an example using the RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC-1k model, showing the model's response on different platforms to the same prompt, as well as the prompt and code itself. As you can see, the response on PC/laptop is much better and is what we are hoping to achieve.

I would appreciate any advice on improving the output on mobile! Thank you :)

  1. Output on Windows

image

  2. Output on Android

image

  3. Screenshot of code and prompt

image
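Since the screenshots above do not render in text form, here is a sketch of the kind of call being described: a hypothetical prompt builder for the RAG flow (the `LawyerProfile` and `buildPrompt` names are illustrative, not from the issue), followed by web-llm's OpenAI-style API in comments:

```typescript
// Hypothetical prompt builder for the RAG flow described above.
interface LawyerProfile {
  name: string;
  specialty: string;
}

function buildPrompt(profiles: LawyerProfile[], issue: string): string {
  // Flatten the retrieved profiles into a numbered context block.
  const context = profiles
    .map((p, i) => `${i + 1}. ${p.name} (${p.specialty})`)
    .join("\n");
  return (
    `Context:\n${context}\n\n` +
    `Legal issue: ${issue}\n\n` +
    `Explain why each lawyer listed above is suitable for this legal issue.`
  );
}

// In the browser the prompt would then go through web-llm's
// OpenAI-style API (sketch, not verified against the screenshots):
//
//   import { CreateMLCEngine } from "@mlc-ai/web-llm";
//   const engine = await CreateMLCEngine(
//     "RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC-1k",
//   );
//   const reply = await engine.chat.completions.create({
//     messages: [{ role: "user", content: buildPrompt(profiles, issue) }],
//   });
```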

@customautosys

JohnReginaldShutler commented 3 weeks ago

Hello, just pinging the above issue for a response! :)