ngxson / wllama

WebAssembly binding for llama.cpp - Enabling in-browser LLM inference
https://huggingface.co/spaces/ngxson/wllama
MIT License

add Blob support + OPFS + load from local file(s) #52

Closed · ngxson closed this 5 months ago

ngxson commented 6 months ago

Resolves #42 Resolves #43

flatsiedatsie commented 6 months ago

Super cool.

I'm curious: how does this deal with memory/chunking? Can I feed it a list of files, for example? Or is this for single .gguf files only?

ngxson commented 6 months ago

@flatsiedatsie Yes, this is made so that we can feed in a list of gguf shards. Even better, Blob should allow us to load a big file from the local hard drive without splitting it into shards.

A Blob in JS works a bit like a file descriptor in low-level languages: it allows the program to read the file chunk by chunk (so it uses less RAM).
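To illustrate the idea (a minimal sketch using plain web APIs, not the actual wllama code; the chunk size and callback below are made up):

```ts
// Read a Blob in fixed-size chunks so the whole file never has to sit in RAM.
// CHUNK_SIZE and onChunk are illustrative, not part of the wllama API.
const CHUNK_SIZE = 16 * 1024 * 1024; // 16 MiB per read

async function readBlobInChunks(
  blob: Blob,
  onChunk: (chunk: Uint8Array, offset: number) => void | Promise<void>,
): Promise<void> {
  for (let offset = 0; offset < blob.size; offset += CHUNK_SIZE) {
    // slice() is lazy: no bytes are read from disk until arrayBuffer() is awaited
    const buf = await blob.slice(offset, offset + CHUNK_SIZE).arrayBuffer();
    await onChunk(new Uint8Array(buf), offset);
  }
}
```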

In theory, we could do the same with a remote file (i.e. loadModelFromUrl), but that requires some major changes and will be quite messy. For now, let's just focus on loading a local file.
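Just to sketch that idea too (standard fetch + ReadableStream only, not something this PR implements):

```ts
// Stream a remote file chunk by chunk via fetch; the full file is never
// buffered in memory. The consume() callback is a placeholder.
async function streamRemoteFile(
  url: string,
  consume: (chunk: Uint8Array) => void | Promise<void>,
): Promise<void> {
  const res = await fetch(url);
  if (!res.ok || !res.body) {
    throw new Error(`Failed to fetch ${url}: ${res.status}`);
  }
  const reader = res.body.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    await consume(value); // e.g. write to an OPFS file or hand off to the wasm side
  }
}
```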

felladrin commented 6 months ago

read the file chunk by chunk (so it uses less RAM)

In theory, we could do the same with a remote file (i.e. loadModelFromUrl)

Would love to see it in Wllama! That's exactly how the loading of GGUF files was implemented in https://github.com/huggingface/ratchet

flatsiedatsie commented 6 months ago

In theory, we could do the same with a remote file

Would love to see it in Wllama!

That would be very nice indeed

flatsiedatsie commented 6 months ago

This post on Reddit revived an old idea:

https://www.reddit.com/r/LocalLLaMA/comments/1cwjc4n/hugging_face_adds_an_option_to_directly_launch/

[screenshot]

The project I'm working on currently has an option to share a URL with someone else. That URL contains two things:

[screenshot]

This makes it easy to, for example, share a fun prompt for generating recipes, images, or music. (And I'm sure it will never be abused..)

My project also allows users to use their own custom AI by providing the URL of a .gguf (or an array of .gguf shards). Wllama then runs that.

[screenshot]

It would be very easy to make it so that the shareable link also contains the URL of a .gguf file. In fact, for the custom AIs that's the only thing that makes sense.

Then people could create and share links that would instantly let them try .gguf files + prompts. HuggingFace could even integrate such a feature into their website. "Just one click to try the model on your own device, privacy protected". Though they would probably prefer to use Transformers.js. But you get my drift.

Currently this would only be easy to do for models below 2GB in size, as those only require a user to pick a single .gguf on HuggingFace.
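A rough sketch of what that shareable link could look like (the `model` and `prompt` parameter names are just examples, not something wllama or my project defines):

```ts
// Encode a model URL and a prompt into a shareable link, and read them back.
// The 'model' and 'prompt' parameter names are arbitrary examples.
function buildShareLink(appUrl: string, modelUrl: string, prompt: string): string {
  const url = new URL(appUrl);
  url.searchParams.set('model', modelUrl);
  url.searchParams.set('prompt', prompt);
  return url.toString();
}

function parseShareLink(href: string): { modelUrl: string | null; prompt: string | null } {
  const params = new URL(href).searchParams;
  return { modelUrl: params.get('model'), prompt: params.get('prompt') };
}
```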

ngxson commented 6 months ago

Then people could create and share links that would instantly let them try .gguf files + prompts. HuggingFace could even integrate such a feature into their website. "Just one click to try the model on your own device, privacy protected". Though they would probably prefer to use Transformers.js. But you get my drift.

Great idea. I'm imagining an extension of that: a UI where you can select a prompt from a dropdown list; maybe the list can be saved inside a config file in the HF repo. This will probably be interesting for @julien-c
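As a sketch of what I have in mind (the prompts.json file name and its shape are purely hypothetical, not an existing HF convention):

```ts
// Hypothetical prompt-list config fetched from a model repo.
// The file name (prompts.json) and its shape are assumptions, not an HF convention.
interface PromptPreset {
  label: string;  // shown in the dropdown
  prompt: string; // the actual prompt text
}

async function fetchPromptPresets(repoId: string): Promise<PromptPreset[]> {
  // Hugging Face serves raw repo files under /resolve/<revision>/<path>
  const url = `https://huggingface.co/${repoId}/resolve/main/prompts.json`;
  const res = await fetch(url);
  if (!res.ok) return []; // the repo simply doesn't ship a prompt list
  return (await res.json()) as PromptPreset[];
}
```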

ngxson commented 6 months ago

@felladrin @flatsiedatsie Small update on this PR (and on the project globally): I currently have quite a lot going on in my life, so progress has slowed down a bit. But luckily, this PR is making substantial progress. The only thing missing is support for the OPFS cache on Safari, due to a WebKit bug.

And a bit of a spoiler: I'm leaving cybersecurity and reorienting myself to focus on machine learning. By September, when I move to the new job, I'll have more time to dedicate to this project.

flatsiedatsie commented 6 months ago

It's all good! Wllama is now in a very usable state anyway. It's a core pillar of the project I will release this month.

Good luck on the switch! You definitely have the skills for it :-)

felladrin commented 5 months ago

That's great news!! You've done a lot and Wllama is in great shape! Take your time and all the best with the career move!

ngxson commented 5 months ago

Good news, everyone! This PR is now ready for testing. I've been able to add a small demo under examples/basic that allows the user to pick a local gguf file (or multiple files).

[screenshot]
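The gist of the demo is a plain file input handing the picked File objects (File extends Blob) to the loader; the sketch below uses a placeholder loading function, so check examples/basic for the actual API call:

```ts
// Sketch of the demo's flow: let the user pick one or more local GGUF shards
// and hand them over for loading. loadFromBlobs() is a placeholder name,
// not the actual wllama method; see examples/basic for the real call.
declare function loadFromBlobs(blobs: Blob[]): Promise<void>;

const input = document.querySelector<HTMLInputElement>('#gguf-files')!;

input.addEventListener('change', async () => {
  const files = Array.from(input.files ?? []); // File extends Blob
  if (files.length === 0) return;
  // Keep shards in order (e.g. a name ending in -00001-of-00003.gguf comes first)
  files.sort((a, b) => a.name.localeCompare(b.name));
  await loadFromBlobs(files);
});
```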

It is, however, quite annoying that we still can't load files bigger than 2GB, because ftell/fseek are limited to LONG_MAX, which is 2³¹-1 bytes. Basically this is even worse than the old FAT32 days, since the biggest file FAT32 can handle is 4GB (i.e. ULONG_MAX bytes).

That aside, I'll merge this PR in the next few days, so feel free to test it out. Thank you!

felladrin commented 5 months ago

Just tested it! And it's all working fine, except for one thing:

It's not caching the models anymore on iOS browsers (tested with Mobile Safari and Mobile Brave)

On both, I'm getting the message NOT using cache for (...).

[screenshot]

So every time the page is refreshed, the model is fully downloaded again.
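In case it helps with debugging, this is roughly how I'd expect OPFS availability to be probed (plain navigator.storage calls, nothing wllama-specific):

```ts
// Quick probe for OPFS support: check that the API exists and that
// getDirectory() actually resolves, since some browsers expose the method
// but reject the call (e.g. in private browsing).
async function isOpfsAvailable(): Promise<boolean> {
  if (typeof navigator.storage?.getDirectory !== 'function') return false;
  try {
    await navigator.storage.getDirectory();
    return true;
  } catch {
    return false;
  }
}
```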

Notes:

ngxson commented 5 months ago

@felladrin I pushed a small fix for Safari iOS. It kinda works for me, but it's a bit unstable (as usual, I think).

I didn't test with models bigger than 60MB, so please give the git version a try. Thank you!

felladrin commented 5 months ago

Excellent 🙌 On iOS, it's now loading from the cache correctly!

[screenshot]