mlc-ai / web-llm

High-performance In-browser LLM Inference Engine
https://webllm.mlc.ai
Apache License 2.0

Model request: Phi 3 mini 128K #432

Open flatsiedatsie opened 1 month ago

flatsiedatsie commented 1 month ago

Seems like a good match for WebLLM, as it was practically designed to run in the browser.

From this reddit thread: https://www.reddit.com/r/LocalLLaMA/comments/1d2o445/comment/l63cvxk/

CharlieFRuan commented 1 month ago

Phi3-mini, StableLM 1.6B, Qwen 1.8B were just added to the prebuilt list here: https://github.com/mlc-ai/web-llm/pull/433

Will bump the version to 0.2.39 soon.

Note that the Phi-3 we added is the 4k variant, not 128K.

If I understand correctly, to support a 128K context length we need to allocate a KV cache that is 128K long on the sequence dimension, which works out to head_dim * num_layer * num_kv_heads * {k,v} * size(f16) * 128K bytes, i.e. 96 * 32 * 32 * 2 * 2 * 128000 ≈ 46 GB, as opposed to roughly 1.5 GB for a 4k context length.
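
For reference, here is that same arithmetic as a small TypeScript snippet (the dimensions are just the ones quoted above; the ~1.5 GB and ~46 GB figures correspond to the binary GiB values):

```ts
// KV cache size ≈ head_dim * num_layers * num_kv_heads * {K,V} * sizeof(f16) * context_len
const headDim = 96;    // Phi-3-mini head dimension (assumed from the numbers above)
const numLayers = 32;
const numKvHeads = 32;
const kAndV = 2;       // one K and one V tensor per layer
const bytesF16 = 2;

const bytesPerToken = headDim * numLayers * numKvHeads * kAndV * bytesF16; // 393,216 bytes

const kvCacheGiB = (contextLen: number) =>
  (bytesPerToken * contextLen) / 2 ** 30;

console.log(kvCacheGiB(4_096).toFixed(1), "GiB");   // ~1.5 GiB for 4k
console.log(kvCacheGiB(128_000).toFixed(1), "GiB"); // ~46.9 GiB for 128K
```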

CharlieFRuan commented 1 month ago

Just published 0.2.39; those models are now included in the prebuilt app config!
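
For anyone landing here, a minimal sketch of loading one of the newly added prebuilt models from the browser (the exact model id string is an assumption; check the prebuilt app config shipped with the release for the precise ids):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Model id below is assumed; consult prebuiltAppConfig for the exact string.
const engine = await CreateMLCEngine("Phi-3-mini-4k-instruct-q4f16_1", {
  initProgressCallback: (progress) => console.log(progress.text),
});

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello from the browser!" }],
});
console.log(reply.choices[0].message.content);
```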

flatsiedatsie commented 1 month ago

Very nice.

If you don't mind, I'll keep this open for now? I think the 128K-context version would still offer something valuable to WebLLM.