Super cool.
I'm curious: how does this deal with memory/chunking? Can I feed it a files list for example? Or is this for single .gguf files only?
@flatsiedatsie Yes, this is made so that we can feed in a list of gguf shards. Even better, `Blob` should allow us to load a big file from the local hard drive without splitting it into shards.

A `Blob` in JS works a bit like a file descriptor in low-level languages: it allows the program to read the file chunk by chunk (so it uses less RAM).
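For illustration, here is a minimal sketch (not wllama's actual code) of reading a local `File`/`Blob` chunk by chunk with standard web APIs, so only one chunk is held in RAM at a time:

```ts
// Minimal sketch: consume a local File/Blob in fixed-size chunks.
// Only the current chunk (here 16 MiB) is kept in memory at once.
async function readInChunks(
  blob: Blob,
  onChunk: (chunk: Uint8Array, offset: number) => void | Promise<void>,
  chunkSize = 16 * 1024 * 1024,
): Promise<void> {
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    // slice() is lazy: bytes are only read when arrayBuffer() is awaited
    const end = Math.min(offset + chunkSize, blob.size);
    const buf = new Uint8Array(await blob.slice(offset, end).arrayBuffer());
    await onChunk(buf, offset);
  }
}
```

The same pattern applies per shard when a list of gguf files is passed in.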
In theory, we could do the same with a remote file (i.e. `loadModelFromUrl`), but that would require some major changes and would be quite messy. For now, let's just focus on loading a local file.
> read the file chunk by chunk (so it uses less RAM)
>
> In theory, we could do the same with a remote file (i.e. `loadModelFromUrl`)
Would love to see it in Wllama! That's exactly how the loading of GGUF files was implemented in https://github.com/huggingface/ratchet
> In theory, we could do the same with a remote file
>
> Would love to see it in Wllama!
That would be very nice indeed
This post on Reddit revived an old idea:
https://www.reddit.com/r/LocalLLaMA/comments/1cwjc4n/hugging_face_adds_an_option_to_directly_launch/
The project I'm working on currently has an option to share a URL with someone else. That URL contains two things:
This makes it easy to, for example, share a fun prompt for generating recipes, images, or music. (And I'm sure it will never be abused..)
My project also allows users to use their own custom AI by providing the URL of a .gguf (or an array of .gguf shards). Wllama then runs that.
It would be very easy to make it so that the shareable link also contains the URL of a .gguf file. In fact, for the custom AIs that's the only thing that makes sense.
Then people could create and share links that would instantly let them try .gguf files + prompts. HuggingFace could even integrate such a feature into their website. "Just one click to try the model on your own device, privacy protected". Though they would probably prefer to use Transformers.js. But you get my drift.
Currently this would only be easy to do for models below 2GB in size, as those only require a user to pick a single .gguf on HuggingFace.
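To make the idea concrete, here is a rough sketch of how such a shareable link could be handled on page load. The query-parameter names, the `WASM_PATHS` keys, and the `createCompletion` options are assumptions for illustration, not wllama's confirmed API:

```ts
import { Wllama } from '@wllama/wllama';

// Hypothetical shareable link: https://my.app/?model=<gguf-url>&prompt=<text>
// Parameter names and the app URL are made up for illustration.
const params = new URLSearchParams(window.location.search);
const modelUrl = params.get('model');             // single .gguf, or the first shard of a split model
const sharedPrompt = params.get('prompt') ?? 'Hello!';

// Assumption: paths to the wasm builds served by the app; exact keys depend on the wllama version.
const WASM_PATHS: Record<string, string> = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

if (modelUrl) {
  const wllama = new Wllama(WASM_PATHS);
  await wllama.loadModelFromUrl(modelUrl);        // model URL taken straight from the shared link
  const output = await wllama.createCompletion(sharedPrompt, { nPredict: 128 });
  console.log(output);
}
```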
> Then people could create and share links that would instantly let them try .gguf files + prompts. HuggingFace could even integrate such a feature into their website. "Just one click to try the model on your own device, privacy protected". Though they would probably prefer to use Transformers.js. But you get my drift.
Great idea. I'm imagining an extension of that: a UI with a dropdown where you can select from a list of prompts; maybe the list could be saved inside a config file in the HF repo. This will probably be interesting for @julien-c
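As a rough illustration (the file name and schema below are hypothetical, not an existing HF convention):

```ts
// Hypothetical prompt-list config stored in the model repo, e.g.
// https://huggingface.co/<user>/<repo>/resolve/main/wllama-prompts.json
// { "prompts": [ { "label": "Recipe generator", "text": "Write a recipe using ..." } ] }
interface PromptEntry {
  label: string; // shown in the dropdown
  text: string;  // the actual prompt
}

async function fetchPromptList(repo: string): Promise<PromptEntry[]> {
  const url = `https://huggingface.co/${repo}/resolve/main/wllama-prompts.json`;
  const res = await fetch(url);
  if (!res.ok) return []; // the repo has no prompt list
  const data = await res.json();
  return Array.isArray(data.prompts) ? data.prompts : [];
}
```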
@felladrin @flatsiedatsie Small update on this PR (and on the project globally): I currently have quite a lot of things to do in my life, so progress has slowed down a bit. But luckily, this PR is making some substantial progress. The only thing missing is support for the OPFS cache on Safari, due to a WebKit bug.
And a bit of a spoiler: I'm leaving cybersecurity and reorienting myself to focus on machine learning. By September, when I move to the new job, I'll have more time to dedicate to this project.
It's all good! Wllama is now in a very usable state anyway. It's a core pillar of the project I will release this month.
Good luck on the switch! You definitely have the skills for it :-)
That's great news!! You've done a lot and Wllama is in great shape! Take your time and all the best with the career move!
Good news everyone! This PR is now ready for testing. I've been able to add a small demo under `examples/basic` that allows the user to pick a local gguf file (or multiple files).
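For anyone who wants to try the same thing outside the demo, the picked `File` objects can be handed to `loadModel()` roughly like this (a sketch based on this PR's description; the element ID, the `WASM_PATHS` keys, and whether `loadModel()` takes an array of shards are assumptions):

```ts
import { Wllama } from '@wllama/wllama';

// Assumption: paths to the wasm builds served by the app; exact keys depend on the wllama version.
const WASM_PATHS: Record<string, string> = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

// Sketch: let the user pick one gguf file (or all shards of a split model)
// via <input type="file" id="gguf-picker" multiple> and pass the File objects to loadModel().
const input = document.getElementById('gguf-picker') as HTMLInputElement;

input.addEventListener('change', async () => {
  const files = input.files ? Array.from(input.files) : [];
  if (files.length === 0) return;

  const wllama = new Wllama(WASM_PATHS);
  await wllama.loadModel(files); // File extends Blob, so data is read chunk by chunk, not all at once
});
```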
It is, however, quite annoying to me that we still can't load files bigger than 2GB, because ftell/fseek are limited to `LONG_MAX`, which is 2³¹-1 bytes here. Basically this is even worse than old-day FAT32, since the biggest file FAT32 can handle is 4GB (which corresponds to `ULONG_MAX`).
That aside, I'll merge this PR in the next few days, so feel free to test it out. Thank you!
Just tested it! And it's all working fine, except for one thing:
It's not caching the models anymore on iOS browsers (tested with Mobile Safari and Mobile Brave).
On both, I'm getting the message `NOT using cache for (...)`.
So every time the page is refreshed, the model is fully downloaded again.
Notes:
@felladrin I added a small fix for Safari iOS. It kinda works for me, but it's a bit unstable (as usual, I think).
I didn't test with a model bigger than 60MB, so please give it a try. Thank you!
Excellent 🙌 On iOS, it's now loading from the cache correctly!
Resolves #42 Resolves #43

- `loadModel()` now also accepts `Blob` or `File`
- Added `GGUFRemoteBlob`, which can stream a Blob from a remote URL
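A hedged usage sketch of the remote path; I'm assuming `GGUFRemoteBlob` is exported from the package and exposes a static `create(url)` factory, as its name and the line above suggest, so the actual signature may differ:

```ts
import { Wllama, GGUFRemoteBlob } from '@wllama/wllama'; // export path is an assumption

// Assumption: paths to the wasm builds served by the app; exact keys depend on the wllama version.
const WASM_PATHS: Record<string, string> = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

// Wrap a remote gguf in a Blob-like object that streams on demand,
// then load it through the same Blob/File code path as a local file.
const remoteBlob = await GGUFRemoteBlob.create(
  'https://huggingface.co/some-user/some-model/resolve/main/model.Q4_K_M.gguf',
);

const wllama = new Wllama(WASM_PATHS);
await wllama.loadModel([remoteBlob]);
```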