mlc-ai / web-llm

High-performance In-browser LLM Inference Engine
https://webllm.mlc.ai
Apache License 2.0

Inconsistent and unreliable outputs on mobile compared to PC/laptop for -1k models #485

Open JohnReginaldShutler opened 1 month ago

JohnReginaldShutler commented 1 month ago

Hello! I would like to know if there are any unseen errors or limitations when prompting a model on mobile compared to a PC/laptop.

Specifically, we are testing a RAG system where we provide the model with context and ask it to generate a response based on that context. Our goal is to list profiles of selected lawyers and a user-supplied legal issue, then ask the model to justify why these lawyers are suitable for the user's legal problem.

We tested several smaller models optimized for mobile (e.g., Phi 1.5/2/3, RedPajama, TinyLlama). These models work well with simple prompts (e.g., "List three states in the USA"). However, with more complex prompts like the one described above, PCs/laptops produce coherent responses while mobile devices produce gibberish, even with identical prompts and settings. Could this be related to the requested maxStorageBufferBindingSize exceeding the 128MB limit reported on mobile?
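If the 128MB limit is the suspect, it can be confirmed from JavaScript, since the WebGPU adapter exposes its limits directly. Below is a minimal sketch; only `navigator.gpu.requestAdapter()` and `adapter.limits` are standard WebGPU, while the helper function and logging are illustrative:

```typescript
// Sketch: check the WebGPU adapter limits that gate larger models on
// mobile. The helper and logging below are illustrative, not from web-llm.
const MIB = 1024 * 1024;

function exceedsBindingLimit(
  requestedBytes: number,
  deviceLimitBytes: number,
): boolean {
  // True when a requested storage-buffer binding is larger than the
  // device allows (the situation the error message describes).
  return requestedBytes > deviceLimitBytes;
}

async function reportLimits(): Promise<void> {
  // Browser-only: navigator.gpu is absent in Node, so bail out there.
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return;
  const adapter = await gpu.requestAdapter();
  if (!adapter) return;
  // Many phones report 128 MiB here; desktop GPUs usually report far more.
  console.log(
    "maxStorageBufferBindingSize (MiB):",
    adapter.limits.maxStorageBufferBindingSize / MIB,
  );
  console.log("maxBufferSize (MiB):", adapter.limits.maxBufferSize / MIB);
}

reportLimits();
```

Comparing these numbers between the desktop and the phone would show whether the phone's adapter is simply reporting a smaller binding limit than the model's weight buffers require.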

Below is an example using the RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC-1k model, showing the model's response on different platforms to the same prompt, as well as the prompt and code itself. As you can see, the response on PC/laptop is much better and is what we are hoping to achieve.

I would appreciate any advice on improving the output on mobile! Thank you :)

  1. Output on Windows

image

  2. Output on Android

image

  3. Screenshot of code and prompt

image
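Since the screenshots above do not render in text form, here is a sketch of the kind of call being described: a hypothetical prompt builder for the RAG flow (the `LawyerProfile` and `buildPrompt` names are illustrative, not from the issue), followed by web-llm's OpenAI-style API in comments:

```typescript
// Hypothetical prompt builder for the RAG flow described above.
interface LawyerProfile {
  name: string;
  specialty: string;
}

function buildPrompt(profiles: LawyerProfile[], issue: string): string {
  // Flatten the retrieved profiles into a numbered context block.
  const context = profiles
    .map((p, i) => `${i + 1}. ${p.name} (${p.specialty})`)
    .join("\n");
  return (
    `Context:\n${context}\n\n` +
    `Legal issue: ${issue}\n\n` +
    `Explain why each lawyer listed above is suitable for this legal issue.`
  );
}

// In the browser the prompt would then go through web-llm's
// OpenAI-style API (sketch, not verified against the screenshots):
//
//   import { CreateMLCEngine } from "@mlc-ai/web-llm";
//   const engine = await CreateMLCEngine(
//     "RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC-1k",
//   );
//   const reply = await engine.chat.completions.create({
//     messages: [{ role: "user", content: buildPrompt(profiles, issue) }],
//   });
```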

@customautosys

JohnReginaldShutler commented 3 weeks ago

Hello, just pinging the above issue for a response! :)