Closed yukiarimo closed 3 months ago
How web browsers handle RAM management
It could be a limitation of WASM, because I tried some llama-cpp builds (the current one), but my maximum was around 300 MB.
Would be nice if you could help me with building it, or maybe with some WebGPU solution.
hmmm might be a limitation of wasm itself... I could help
quick thought:
- use WebGPU to load the model into RAM (the base model only)
- somehow get llama-cpp-web to use it as the model (maybe split it up)
- make a virtual interface to get offline to work with it (send prompt to llama-cpp-wasm)
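The "split it up" part could look roughly like this — a minimal sketch of chunking a model buffer and reassembling it, assuming we'd hand pieces to the WASM side in bounded allocations. All names here are illustrative; this is not llama-cpp-wasm's actual API:

```javascript
// Split an ArrayBuffer into fixed-size chunks so each piece stays well
// under the 32-bit WASM address-space ceiling (hypothetical helper).
function splitBuffer(buffer, chunkSize) {
  const chunks = [];
  for (let offset = 0; offset < buffer.byteLength; offset += chunkSize) {
    chunks.push(buffer.slice(offset, Math.min(offset + chunkSize, buffer.byteLength)));
  }
  return chunks;
}

// Reassemble the chunks into one contiguous buffer on the consuming side.
function joinChunks(chunks) {
  const total = chunks.reduce((n, c) => n + c.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(new Uint8Array(c), offset);
    offset += c.byteLength;
  }
  return out.buffer;
}
```

Whether llama-cpp-wasm can actually consume a model in pieces is the open question — this only shows the plumbing.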
Somehow get llama-cpp-web to use it as the model
Are you referring to the llama-cpp-wasm? What do you mean by split it up?
Make a virtual interface
Yeah, I'm already working on it (check the offline.js file).
Use WebGPU to load the model into RAM
How? All the build files included (and simplified) in Yuna are from this repo: https://github.com/tangledgroup/llama-cpp-wasm. It's not using WebGPU, so I have no idea. (There's also a llm.js, but I was unable to extract its core.)
Note:
- If you're using Safari like me, enable WebGPU in the settings in the Feature Flags developer section
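Either way, the page should check for WebGPU before trying to use it. A minimal feature-detection sketch (my own illustrative code; `navigator.gpu` is the standard WebGPU entry point and is simply undefined when the flag is off):

```javascript
// Returns true only if the browser exposes WebGPU AND an adapter is
// actually available (the flag can be on with no usable GPU).
async function hasWebGPU() {
  if (typeof navigator === "undefined" || !navigator.gpu) return false;
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}
```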
I'm using Firefox, since Chromium doesn't work
and I just realized the base model is 4.5 GB 🤦♂️ that rules out loading the model into RAM... we're stuck. We could make it so users download the model themselves, add it to the website, and then somehow get llama to read it. That would be annoying, though it would allow an HTML version to be released for serverless use. But how we do that when llama-cpp for WASM doesn't work properly is my question.
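If users do supply the model themselves, the page could at least sanity-check the file before handing it to llama-cpp-wasm. A hedged sketch, relying only on the fact that GGUF files begin with the ASCII magic "GGUF" (the helper name and wiring are my own, not from the repo):

```javascript
// Check the 4-byte magic at the start of a (supposed) GGUF model file.
function looksLikeGGUF(buffer) {
  if (buffer.byteLength < 4) return false;
  const magic = new Uint8Array(buffer, 0, 4);
  return String.fromCharCode(...magic) === "GGUF";
}

// In the page it would be wired up roughly like this (browser-only,
// illustrative — `fileInput` is a hypothetical <input type="file">):
// fileInput.addEventListener("change", async (e) => {
//   const buf = await e.target.files[0].arrayBuffer();
//   if (!looksLikeGGUF(buf)) alert("That doesn't look like a GGUF model");
// });
```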
I just realized the base model is 4.5 GB
I could shrink it to ~3.2 GB to fit the WASM limit of 4 GB!
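The arithmetic behind that (my own back-of-envelope, not from the repo): a quantized model's file size is roughly parameters × bits-per-weight ÷ 8, which is why re-quantizing to fewer bits per weight pulls a 4.5 GB file under the 4 GB ceiling of 32-bit WASM address space:

```javascript
// Rough size estimate for a quantized model, in decimal GB.
// paramsBillions: parameter count in billions; bitsPerWeight: e.g. ~4 for
// Q4-class quants, ~3.4 for q3_k_m (the exact k-quant overheads vary).
function estimateSizeGB(paramsBillions, bitsPerWeight) {
  const bytes = (paramsBillions * 1e9 * bitsPerWeight) / 8;
  return bytes / 1e9;
}
```

For example, a hypothetical 7B model at ~4 bits per weight comes out around 3.5 GB, which is consistent with the ~3.2 GB target mentioned above once you account for quant-format details.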
it might work then... I'll try it
Sure! Let me know if you figure it out. I'll try to modify stuff on the HF today, and you can probably get a more quantized model before tomorrow morning 👍🏻
sure... I'll try getting one
Updated model link: https://huggingface.co/yukiarimo/yuna-ai-v1
I'll get back to you once I get home... I'm at school rn and they blocked huggingface
Sure thing! Lol, is ChatGPT or Perplexity also blocked in your school?
yep... classified as ai
I've been home for a while, so let me get the model... I'll test it
Sure, you can grab any of them (ideally q5) from the HF repo above. By the way, I'm also starting to train V2 in a few days, so stay tuned (150k+ tokens)!
ok
im using the light model... but i do want to see if the heavy version works too
Are you doing a light model in WASM? Where? Which model?
Are you doing a light model in WASM? Where? Which model?
the model is "yuna-ai-v1-q3_k_m.gguf" in the yuna folder
Is it working in WASM? How exactly did you try?
I tried enabling the setting... and then nothing happened (probably because I was using the Pi to try to chat)
I'll try my phone today
Sure. And don’t forget to check the console for logs! (I was too lazy to implement popup errors.)
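Until popup errors exist, a stopgap is to mirror `console.error` into something the page can display. A minimal sketch, assuming nothing about the app's structure (the helper name is hypothetical):

```javascript
// Wrap console.error so every message is also pushed into `sink`
// (e.g. an array later rendered into an on-page error box).
// Returns a function that restores the original console.error.
function captureErrors(sink) {
  const original = console.error;
  console.error = (...args) => {
    sink.push(args.map(String).join(" "));
    original.apply(console, args);
  };
  return () => {
    console.error = original;
  };
}
```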
Everyone! I think I'll convert this issue into a discussion!
might be something to do with how web browsers handle RAM management