ngxson / wllama

WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
https://huggingface.co/spaces/ngxson/wllama
MIT License

T5 and Flan-T5 models support (llama_encode) #86

Closed · felladrin closed 4 months ago

felladrin commented 4 months ago

T5 and Flan-T5 models support has just been merged into llama.cpp 🎉

I tried updating llama.cpp in a local copy of wllama, compiling it, and loading https://huggingface.co/Felladrin/gguf-LaMini-Flan-T5-248M/resolve/main/LaMini-Flan-T5-248M.Q6_K.gguf. The model loads fine, but when I click to run it, an error is thrown:

(screenshot: error thrown when running the model)

I also tried editing example/index.html directly so it loads from the HF URL, but it ran into the same problem.

(screenshot: same error when loading from the HF URL)

Sharing it here in case someone has any ideas.

ngxson commented 4 months ago

Currently wllama only uses llama_decode, but T5 introduces a new API, llama_encode, which we haven't implemented yet (because T5 is an encoder-decoder architecture). This will be added in the next version.
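
For context, here is a minimal sketch (not wllama code) of how an encoder-decoder model like T5 is driven through llama.cpp's C API: the prompt first goes through llama_encode, and only then can llama_decode generate tokens. The model path and prompt are illustrative, and the function signatures follow llama.h around the time T5 support was merged; they may differ in newer releases.

```cpp
#include "llama.h"
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    // Illustrative model path; any T5-family GGUF would do.
    llama_model * model = llama_load_model_from_file(
        "LaMini-Flan-T5-248M.Q6_K.gguf", llama_model_default_params());
    llama_context * ctx = llama_new_context_with_model(
        model, llama_context_default_params());

    // Tokenize the prompt for the encoder.
    std::string prompt = "translate English to German: Hello, how are you?";
    std::vector<llama_token> tokens(prompt.size() + 8);
    int n = llama_tokenize(model, prompt.c_str(), (int32_t) prompt.size(),
                           tokens.data(), (int32_t) tokens.size(),
                           /*add_special=*/true, /*parse_special=*/false);
    tokens.resize(n);

    // 1) Run the encoder over the whole prompt. This is the call that
    //    wllama does not make yet for T5-style models.
    llama_encode(ctx, llama_batch_get_one(tokens.data(), (int32_t) tokens.size(), 0, 0));

    // 2) Seed the decoder with the model's decoder-start token, then decode
    //    autoregressively as usual (sampling loop omitted for brevity).
    llama_token tok = llama_model_decoder_start_token(model);
    llama_decode(ctx, llama_batch_get_one(&tok, 1, 0, 0));

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

Decoder-only models skip step 1 entirely, which is why the existing wllama glue code, built around llama_decode alone, fails on T5 GGUFs.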