Closed yukiarimo closed 3 months ago
How web browsers handle RAM management
It could be a limitation of WASM, because I tried some llama-cpp builds (the current one), but my maximum was around 300 MB.
Would be nice if you could help me with building it, or maybe with some WebGPU solution.
hmmm might be a limitation of wasm itself... I could help
quick thought:
- use WebGPU to load the model into RAM (the base model only)
- somehow get llama-cpp-web to use it as the model (maybe split it up)
- make a virtual interface to get offline to work with it (send prompt to llama-cpp-wasm)
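The "split it up" part could look roughly like this — a minimal sketch of chunking a model buffer and reassembling it, assuming we'd hand pieces to the WASM side in bounded allocations. All names here are illustrative; this is not llama-cpp-wasm's actual API:

```javascript
// Split an ArrayBuffer into fixed-size chunks so each piece stays well
// under the 32-bit WASM address-space ceiling (hypothetical helper).
function splitBuffer(buffer, chunkSize) {
  const chunks = [];
  for (let offset = 0; offset < buffer.byteLength; offset += chunkSize) {
    chunks.push(buffer.slice(offset, Math.min(offset + chunkSize, buffer.byteLength)));
  }
  return chunks;
}

// Reassemble the chunks into one contiguous buffer on the consuming side.
function joinChunks(chunks) {
  const total = chunks.reduce((n, c) => n + c.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(new Uint8Array(c), offset);
    offset += c.byteLength;
  }
  return out.buffer;
}
```

Whether llama-cpp-wasm can actually consume a model in pieces is the open question — this only shows the plumbing.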
Somehow get llama-cpp-web to use it as the model
Are you referring to the llama-cpp-wasm? What do you mean by split it up?
Make a virtual interface
Yeah, I'm already working on it (check the offline.js file).
Use WebGPU to load the model into RAM
How? All the build files included (and simplified) in Yuna are from this repo: https://github.com/tangledgroup/llama-cpp-wasm. It's not using WebGPU, so I have no idea. (There's also a llm.js, but I was unable to extract its core.)
Note:
- If you're using Safari like me, enable WebGPU in the settings in the Feature Flags developer section
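Either way, the page should check for WebGPU before trying to use it. A minimal feature-detection sketch (my own illustrative code; `navigator.gpu` is the standard WebGPU entry point and is simply undefined when the flag is off):

```javascript
// Returns true only if the browser exposes WebGPU AND an adapter is
// actually available (the flag can be on with no usable GPU).
async function hasWebGPU() {
  if (typeof navigator === "undefined" || !navigator.gpu) return false;
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;
}
```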
I'm using Firefox, since Chromium doesn't work
and I just realized the base model is 4.5 GB 🤦♂️ that rules out loading the model into RAM... we're stuck. We could make it so users download the model themselves, add it to the website, and then somehow get llama to read it. That would be annoying, though it would allow an HTML version to be released for serverless use. But how we do that when llama-cpp for WASM doesn't work properly is my question.
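If users do supply the model themselves, the page could at least sanity-check the file before handing it to llama-cpp-wasm. A hedged sketch, relying only on the fact that GGUF files begin with the ASCII magic "GGUF" (the helper name and wiring are my own, not from the repo):

```javascript
// Check the 4-byte magic at the start of a (supposed) GGUF model file.
function looksLikeGGUF(buffer) {
  if (buffer.byteLength < 4) return false;
  const magic = new Uint8Array(buffer, 0, 4);
  return String.fromCharCode(...magic) === "GGUF";
}

// In the page it would be wired up roughly like this (browser-only,
// illustrative — `fileInput` is a hypothetical <input type="file">):
// fileInput.addEventListener("change", async (e) => {
//   const buf = await e.target.files[0].arrayBuffer();
//   if (!looksLikeGGUF(buf)) alert("That doesn't look like a GGUF model");
// });
```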
I just realized the base model is 4.5 GB
I could shrink it to ~3.2 GB to fit the WASM limit of 4 GB!
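The arithmetic behind that (my own back-of-envelope, not from the repo): a quantized model's file size is roughly parameters × bits-per-weight ÷ 8, which is why re-quantizing to fewer bits per weight pulls a 4.5 GB file under the 4 GB ceiling of 32-bit WASM address space:

```javascript
// Rough size estimate for a quantized model, in decimal GB.
// paramsBillions: parameter count in billions; bitsPerWeight: e.g. ~4 for
// Q4-class quants, ~3.4 for q3_k_m (the exact k-quant overheads vary).
function estimateSizeGB(paramsBillions, bitsPerWeight) {
  const bytes = (paramsBillions * 1e9 * bitsPerWeight) / 8;
  return bytes / 1e9;
}
```

For example, a hypothetical 7B model at ~4 bits per weight comes out around 3.5 GB, which is consistent with the ~3.2 GB target mentioned above once you account for quant-format details.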
it might work then... I'll try it
Sure! Let me know if you figure it out. I'll try to modify stuff on the HF today, and you can probably get a more quantized model before tomorrow morning 👍🏻
sure... I'll try getting one
Updated model link: https://huggingface.co/yukiarimo/yuna-ai-v1
I'll get back to you once I get home... I'm at school rn and they blocked huggingface
Sure thing! Lol, is ChatGPT or Perplexity also blocked in your school?
yep... classified as ai
I've been home for a while, so let me get the model... I'll test it
Sure, you can grab any of them (ideally q5) from the HF repo above. By the way, I'm also starting to train V2 in a few days, so stay tuned (150k+ tokens)!
ok
im using the light model... but i do want to see if the heavy version works too
Are you doing a light model in WASM? Where? Which model?
Are you doing a light model in WASM? Where? Which model?
the model is "yuna-ai-v1-q3_k_m.gguf" in the yuna folder
Is it working in WASM? How exactly did you try?
I tried enabling the setting... and then nothing happened (probably because I was using the Pi to try to chat)
I'll try my phone today
Sure. And don’t forget to check the console for logs! (I was too lazy to implement popup errors.)
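Until popup errors exist, a stopgap is to mirror `console.error` into something the page can display. A minimal sketch, assuming nothing about the app's structure (the helper name is hypothetical):

```javascript
// Wrap console.error so every message is also pushed into `sink`
// (e.g. an array later rendered into an on-page error box).
// Returns a function that restores the original console.error.
function captureErrors(sink) {
  const original = console.error;
  console.error = (...args) => {
    sink.push(args.map(String).join(" "));
    original.apply(console, args);
  };
  return () => {
    console.error = original;
  };
}
```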
Everyone! I think I'll convert this issue into a discussion!
might be something to do with how web browsers handle RAM management