ngxson / wllama

WebAssembly binding for llama.cpp - Enabling in-browser LLM inference
https://huggingface.co/spaces/ngxson/wllama
MIT License

Allow loading a model using relative path #64

Closed · felladrin closed this 3 months ago

felladrin commented 3 months ago

This PR makes Wllama correctly load models from relative paths.
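Not part of the diff, just an illustration of what "relative" means here: the browser resolves a relative model path against the current page URL, the same way fetch() does, so all three relative forms below point at the same file when the page lives next to the models/ directory (the base URL is hypothetical).

```ts
// Illustration only (hypothetical base URL): how the relative forms resolve
// against a page served at https://example.com/app/index.html
const base = "https://example.com/app/index.html";

new URL("models/stories15M-q4_0.gguf", base).href;
// -> https://example.com/app/models/stories15M-q4_0.gguf
new URL("./models/stories15M-q4_0.gguf", base).href;
// -> https://example.com/app/models/stories15M-q4_0.gguf
new URL("/models/stories15M-q4_0.gguf", base).href;
// -> https://example.com/models/stories15M-q4_0.gguf
```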

Tests

The following 4 ways of loading a model were tested:

wllama.loadModelFromUrl("models/Qwen1.5-0.5B-Chat.Q4_k_m.shard-00001-of-00003.gguf")

wllama.loadModelFromUrl("/models/Qwen1.5-0.5B-Chat.Q4_k_m.shard-00001-of-00003.gguf")

wllama.loadModelFromUrl("./models/Qwen1.5-0.5B-Chat.Q4_k_m.shard-00001-of-00003.gguf")

wllama.loadModelFromUrl("https://huggingface.co/Felladrin/gguf-sharded-Qwen1.5-0.5B-Chat/resolve/main/Qwen1.5-0.5B-Chat.Q4_k_m.shard-00001-of-00003.gguf")

It works for both sharded and non-sharded models. For example, this non-sharded model was also tested (a fuller setup sketch follows these examples):

wllama.loadModelFromUrl("models/stories15M-q4_0.gguf")

wllama.loadModelFromUrl("/models/stories15M-q4_0.gguf")

wllama.loadModelFromUrl("./models/stories15M-q4_0.gguf")

wllama.loadModelFromUrl("https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories15M-q4_0.gguf")

Screenshots

[Four screenshots attached in the original PR.]

About typings

The RequestInfo | URL type matches the first argument of the browser's fetch().
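A hedged sketch of the idea (not wllama's actual internals): typing the parameter as RequestInfo | URL means anything fetch() accepts, including plain relative strings and URL objects, type-checks.

```ts
// Sketch only: downloadModel is a hypothetical helper, not a wllama API.
async function downloadModel(modelUrl: RequestInfo | URL): Promise<ArrayBuffer> {
  // fetch() resolves relative strings against the current page/worker URL
  const res = await fetch(modelUrl);
  if (!res.ok) throw new Error(`HTTP ${res.status} while fetching model`);
  return await res.arrayBuffer();
}

// All of these type-check, mirroring the tested loadModelFromUrl calls:
await downloadModel("models/stories15M-q4_0.gguf");
await downloadModel(new URL("/models/stories15M-q4_0.gguf", location.href));
```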

ngxson commented 3 months ago

LGTM. Thank you!