ngxson / wllama

WebAssembly binding for llama.cpp - Enabling in-browser LLM inference
https://huggingface.co/spaces/ngxson/wllama
MIT License

add Blob support + OPFS + load from local file(s) #52

Closed · ngxson closed this 5 months ago

ngxson commented 6 months ago

Resolves #42 Resolves #43

flatsiedatsie commented 6 months ago

Super cool.

I'm curious: how does this deal with memory/chunking? Can I feed it a list of files, for example? Or is this for single .gguf files only?

ngxson commented 6 months ago

@flatsiedatsie Yes, this is made so that we can feed in a list of gguf shards. Even better, Blob should allow us to load a big file from the local hard drive without splitting it into shards.

A Blob in JS works a bit like a file descriptor in low-level languages: it allows the program to read the file chunk by chunk (so it uses less RAM).
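To illustrate the idea (a minimal sketch using plain web APIs, not the actual wllama code; the chunk size and callback below are made up):

```ts
// Read a Blob in fixed-size chunks so the whole file never has to sit in RAM.
// CHUNK_SIZE and onChunk are illustrative, not part of the wllama API.
const CHUNK_SIZE = 16 * 1024 * 1024; // 16 MiB per read

async function readBlobInChunks(
  blob: Blob,
  onChunk: (chunk: Uint8Array, offset: number) => void | Promise<void>,
): Promise<void> {
  for (let offset = 0; offset < blob.size; offset += CHUNK_SIZE) {
    // slice() is lazy: no bytes are read from disk until arrayBuffer() is awaited
    const buf = await blob.slice(offset, offset + CHUNK_SIZE).arrayBuffer();
    await onChunk(new Uint8Array(buf), offset);
  }
}
```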

In theory, we could do the same with a remote file (i.e. loadModelFromUrl), but that requires some major changes and will be quite messy. For now, let's just focus on loading a local file.
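Just to sketch that idea too (standard fetch + ReadableStream only, not something this PR implements):

```ts
// Stream a remote file chunk by chunk via fetch; the full file is never
// buffered in memory. The consume() callback is a placeholder.
async function streamRemoteFile(
  url: string,
  consume: (chunk: Uint8Array) => void | Promise<void>,
): Promise<void> {
  const res = await fetch(url);
  if (!res.ok || !res.body) {
    throw new Error(`Failed to fetch ${url}: ${res.status}`);
  }
  const reader = res.body.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    await consume(value); // e.g. write to an OPFS file or hand off to the wasm side
  }
}
```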

felladrin commented 6 months ago

read the file chunk by chunk (so it uses less RAM)

In theory, we could do the same with a remote file (i.e. loadModelFromUrl)

Would love to see it in Wllama! That's exactly how the loading of GGUF files was implemented in https://github.com/huggingface/ratchet

flatsiedatsie commented 6 months ago

In theory, we could do the same with a remote file

Would love to see it in Wllama!

That would be very nice indeed

flatsiedatsie commented 6 months ago

This post on Reddit revived an old idea:

https://www.reddit.com/r/LocalLLaMA/comments/1cwjc4n/hugging_face_adds_an_option_to_directly_launch/

[screenshot]

The project I'm working on currently has an option to share a URL with someone else. That URL contains two things:

[screenshot]

This makes it easy to, for example, share a fun prompt for generating recipes, images, or music. (And I'm sure it will never be abused..)

My project also allows users to use their own custom AI by providing the URL of a .gguf (or an array of .gguf shards). Wllama then runs that.

[screenshot]

It would be very easy to make it so that the shareable link also contains the URL of a .gguf file. In fact, for the custom AIs that's the only thing that makes sense.

Then people could create and share links that would instantly let them try .gguf files + prompts. HuggingFace could even integrate such a feature into their website. "Just one click to try the model on your own device, privacy protected". Though they would probably prefer to use Transformers.js. But you get my drift.

Currently this would only be easy to do for models below 2GB in size, as those only require a user to pick a single .gguf on HuggingFace.
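A rough sketch of what that shareable link could look like (the `model` and `prompt` parameter names are just examples, not something wllama or my project defines):

```ts
// Encode a model URL and a prompt into a shareable link, and read them back.
// The 'model' and 'prompt' parameter names are arbitrary examples.
function buildShareLink(appUrl: string, modelUrl: string, prompt: string): string {
  const url = new URL(appUrl);
  url.searchParams.set('model', modelUrl);
  url.searchParams.set('prompt', prompt);
  return url.toString();
}

function parseShareLink(href: string): { modelUrl: string | null; prompt: string | null } {
  const params = new URL(href).searchParams;
  return { modelUrl: params.get('model'), prompt: params.get('prompt') };
}
```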

ngxson commented 6 months ago

Then people could create and share links that would instantly let them try .gguf files + prompts. HuggingFace could even integrate such a feature into their website. "Just one click to try the model on your own device, privacy protected". Though they would probably prefer to use Transformers.js. But you get my drift.

Great idea. I'm imagining an extension of that: a UI where you can select a prompt from a dropdown list; maybe the list can be saved inside a config file in the HF repo. This will probably be interesting for @julien-c
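As a sketch of what I have in mind (the prompts.json file name and its shape are purely hypothetical, not an existing HF convention):

```ts
// Hypothetical prompt-list config fetched from a model repo.
// The file name (prompts.json) and its shape are assumptions, not an HF convention.
interface PromptPreset {
  label: string;  // shown in the dropdown
  prompt: string; // the actual prompt text
}

async function fetchPromptPresets(repoId: string): Promise<PromptPreset[]> {
  // Hugging Face serves raw repo files under /resolve/<revision>/<path>
  const url = `https://huggingface.co/${repoId}/resolve/main/prompts.json`;
  const res = await fetch(url);
  if (!res.ok) return []; // the repo simply doesn't ship a prompt list
  return (await res.json()) as PromptPreset[];
}
```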

ngxson commented 6 months ago

@felladrin @flatsiedatsie Small update on this PR (and on the project globally): I currently have quite a lot going on in my life, so progress has slowed down a bit. But luckily, this PR is making substantial progress. The only thing missing is support for the OPFS cache on Safari, due to a WebKit bug.

And a bit of a spoiler: I'm leaving cybersecurity and reorienting myself to focus on machine learning. By September, when I move to the new job, I'll have more time to dedicate to this project.

flatsiedatsie commented 6 months ago

It's all good! Wllama is now in a very usable state anyway. It's a core pillar of the project I will release this month.

Good luck on the switch! You definitely have the skills for it :-)

felladrin commented 5 months ago

That's great news!! You've done a lot and Wllama is in great shape! Take your time and all the best with the career move!

ngxson commented 5 months ago

Good news, everyone! This PR is now ready for testing. I've been able to add a small demo under examples/basic that allows the user to pick a local gguf file (or multiple files).

[screenshot]
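The gist of the demo is a plain file input handing the picked File objects (File extends Blob) to the loader; the sketch below uses a placeholder loading function, so check examples/basic for the actual API call:

```ts
// Sketch of the demo's flow: let the user pick one or more local GGUF shards
// and hand them over for loading. loadFromBlobs() is a placeholder name,
// not the actual wllama method; see examples/basic for the real call.
declare function loadFromBlobs(blobs: Blob[]): Promise<void>;

const input = document.querySelector<HTMLInputElement>('#gguf-files')!;

input.addEventListener('change', async () => {
  const files = Array.from(input.files ?? []); // File extends Blob
  if (files.length === 0) return;
  // Keep shards in order (e.g. a name ending in -00001-of-00003.gguf comes first)
  files.sort((a, b) => a.name.localeCompare(b.name));
  await loadFromBlobs(files);
});
```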

It is, however, quite annoying that we still can't load files bigger than 2GB, because ftell/fseek are limited to LONG_MAX, which is 2³¹-1 bytes. Basically this is even worse than the old FAT32 days, since the biggest file FAT32 can handle is 4GB (i.e. ULONG_MAX bytes).

That aside, I'll merge this PR in the next few days, so feel free to test it out. Thank you!

felladrin commented 5 months ago

Just tested it! And it's all working fine, except for one thing:

It's not caching the models anymore on iOS browsers (tested with Mobile Safari and Mobile Brave)

On both, I'm getting the message NOT using cache for (...).

[screenshot]

So every time the page is refreshed, the model is fully downloaded again.
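In case it helps with debugging, this is roughly how I'd expect OPFS availability to be probed (plain navigator.storage calls, nothing wllama-specific):

```ts
// Quick probe for OPFS support: check that the API exists and that
// getDirectory() actually resolves, since some browsers expose the method
// but reject the call (e.g. in private browsing).
async function isOpfsAvailable(): Promise<boolean> {
  if (typeof navigator.storage?.getDirectory !== 'function') return false;
  try {
    await navigator.storage.getDirectory();
    return true;
  } catch {
    return false;
  }
}
```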

Notes:

ngxson commented 5 months ago

@felladrin I pushed a small fix for Safari iOS. It kinda works for me, but it's a bit unstable (as usual, I think).

I didn't test with models bigger than 60MB, so please give the git version a try. Thank you!

felladrin commented 5 months ago

Excellent 🙌 On iOS, it's now loading from the cache correctly!

[screenshot]