ngxson / wllama

WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
https://huggingface.co/spaces/ngxson/wllama
MIT License

Error "llama_model_load: error loading model: illegal split file: <number>, model must be loaded with the first split" #135

Open · felladrin opened this issue 5 hours ago

felladrin commented 5 hours ago

While setting up v2.0, I've noticed it's not able to load this model:

hfRepoId: "Felladrin/gguf-Q8_0-SmolLM2-135M-Instruct",
hfFilePath: "model.shard-00001-of-00005.gguf",

[In fact, the same error happens with every model I split, all of which worked on v1.]
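For context, the load call is roughly the following (a minimal sketch of my setup: the asset paths are abbreviated, and building the Hugging Face URL from `hfRepoId`/`hfFilePath` is my own wrapper, not a wllama API):

```ts
import { Wllama } from "@wllama/wllama";

// Paths to the wllama runtime assets (abbreviated; see the wllama README
// for the full list expected by your version/bundler).
const CONFIG_PATHS = {
  "single-thread/wllama.wasm": "/wllama/single-thread/wllama.wasm",
  "multi-thread/wllama.wasm": "/wllama/multi-thread/wllama.wasm",
};

const hfRepoId = "Felladrin/gguf-Q8_0-SmolLM2-135M-Instruct";
const hfFilePath = "model.shard-00001-of-00005.gguf";

const wllama = new Wllama(CONFIG_PATHS);

// Point the loader at the first shard; the remaining shards should be
// discovered and downloaded automatically.
await wllama.loadModelFromUrl(
  `https://huggingface.co/${hfRepoId}/resolve/main/${hfFilePath}`
);
```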

It downloads the model files correctly, but triggers the following error when loading the model:

llama_model_load: error loading model: illegal split file: <number>, model must be loaded with the first split
[screenshot of the error]

Any clues?
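One data point that might help: as far as I can tell, that message comes from llama.cpp's model loader. Every shard produced by `gguf-split` carries a `split.no` key (0-based) in its GGUF metadata, and the loader throws exactly this error when the file it starts from has `split.no != 0`. So it looks like v2 is handing llama.cpp a non-first shard, or the shards out of order. A quick way to check what a shard actually claims to be is to scan its GGUF header; here is a rough diagnostic sketch (my own code, not part of wllama; the shard URL is a placeholder):

```ts
// Scan a GGUF header for the "split.no" metadata key written by
// llama.cpp's gguf-split tool. Only the metadata block is parsed,
// so fetching the first few MB of the file is enough.

const FIXED_SIZES: Record<number, number> = {
  0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8,
};

function readString(view: DataView, off: number): [string, number] {
  const len = Number(view.getBigUint64(off, true));
  const bytes = new Uint8Array(view.buffer, view.byteOffset + off + 8, len);
  return [new TextDecoder().decode(bytes), off + 8 + len];
}

// Advance past a metadata value of the given GGUF type.
function skipValue(view: DataView, off: number, type: number): number {
  if (type in FIXED_SIZES) return off + FIXED_SIZES[type];
  if (type === 8) return readString(view, off)[1]; // string
  if (type === 9) {                                // array
    const elemType = view.getUint32(off, true);
    let count = Number(view.getBigUint64(off + 4, true));
    let cur = off + 12;
    while (count-- > 0) cur = skipValue(view, cur, elemType);
    return cur;
  }
  throw new Error(`unknown GGUF value type ${type}`);
}

function findSplitNo(buf: ArrayBuffer): number | undefined {
  const view = new DataView(buf);
  if (view.getUint32(0, true) !== 0x46554747) throw new Error("not a GGUF file");
  // Header: magic (4) + version u32 (4) + tensor_count u64 (8) + kv_count u64 (8).
  const kvCount = Number(view.getBigUint64(16, true));
  let off = 24;
  for (let i = 0; i < kvCount; i++) {
    const [key, afterKey] = readString(view, off);
    const type = view.getUint32(afterKey, true);
    if (key === "split.no") {
      if (type === 2) return view.getUint16(afterKey + 4, true); // u16 (what gguf-split writes)
      if (type === 4) return view.getUint32(afterKey + 4, true); // u32, just in case
      throw new Error(`unexpected type ${type} for split.no`);
    }
    off = skipValue(view, afterKey + 4, type);
  }
  return undefined; // no split.no key: not a split file
}

// Placeholder URL; widen the byte range if the parse runs past the buffer.
const shardUrl = "https://example.com/model.shard-00001-of-00005.gguf";
const res = await fetch(shardUrl, { headers: { Range: "bytes=0-8388607" } });
console.log("split.no =", findSplitNo(await res.arrayBuffer()));
```

The first shard should report `split.no = 0`; if the loader ever sees anything else first, it fails with exactly this message.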

Device info

OS: macOS
Browser: Tested on Brave, Chromium, and Safari

How to reproduce

Load any split GGUF model (e.g. the sharded models from the Demo) with wllama v2.0.

felladrin commented 5 hours ago

Update:

I've just noticed it's actually happening with any split GGUF model, including the ones from the Demo.

I've tried loading the qwen2-1_5b-instruct-q4_k_m-(shards).gguf and got the same error.

It downloads fine:

[screenshot: the shards downloading successfully]

But fails when trying to load it:

[screenshot: the same "illegal split file" error]
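And a minimal reproduction outside the demo UI, for completeness (a sketch: the shard URLs and count are placeholders for the demo's actual ones, and I'm assuming `loadModelFromUrl` still accepts an array of shard URLs as it did in v1):

```ts
import { Wllama } from "@wllama/wllama";

const wllama = new Wllama(CONFIG_PATHS); // same CONFIG_PATHS as in the sketch above

// Placeholder URLs standing in for the demo's qwen2 shard locations.
const shardUrls = [
  "https://example.com/qwen2-1_5b-instruct-q4_k_m-00001-of-00004.gguf",
  "https://example.com/qwen2-1_5b-instruct-q4_k_m-00002-of-00004.gguf",
  "https://example.com/qwen2-1_5b-instruct-q4_k_m-00003-of-00004.gguf",
  "https://example.com/qwen2-1_5b-instruct-q4_k_m-00004-of-00004.gguf",
];

// The download completes, then llama_model_load rejects the model with
// "illegal split file: <number>, model must be loaded with the first split".
await wllama.loadModelFromUrl(shardUrls);
```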
felladrin commented 5 hours ago

Update: I've confirmed the issue in all three browsers (Brave, Chromium, and Safari) on macOS, in the Demo app.