Vaibhavs10 opened this issue 1 month ago
@Vaibhavs10 yes, I actually loved that `--hf-repo` feature, and it was the approach I was using early on, but I eventually had to comment it out (see: https://github.com/pinokiocomputer/llamanet/blob/main/llamacpp.js#L63-L65) and download manually instead.
The reason was exactly what you mentioned: the prebuilt binaries on the releases page are not compiled with the `LLAMA_CURL=1` option, so to make this work via `--hf-repo` I would have to build the binaries myself on the fly, and make that work on every platform.
I did try this approach with my past project Dalai (https://github.com/cocktailpeanut/dalai), where I did everything programmatically, from running quantization to running cmake. It worked in most cases, but the problem is always the edge cases, where the cmake commands would fail for some reason. It was too messy to do all of this through the library, which is why this time around I wanted to avoid it as much as possible, and why I'm downloading the prebuilt binaries from the releases page instead.
So at the moment I don't really have a way to use `--hf-repo`, because I will probably avoid running the cmake command within the library.
That said, I was made aware of huggingface.js yesterday and I'm looking into it. Using that would give us the same benefits, right?
EDIT: Just looked through the huggingface.js docs and source, and it doesn't do what I thought it does; it seems to be designed to work in the browser, with no access to the file system.
Aha! That makes sense, and it's good feedback too! Let me see what can be done about this! 🤗
Perhaps we can upstream this change to llama.cpp.
Hi @cocktailpeanut,
Big fan of your work, and I love all that you're doing to democratise ML. Congratulations on `llamanet`, it looks rad!

I saw that you are creating your own cache, `llamanet`, and persisting models there (correct me if I'm wrong). We recently upstreamed changes to llama.cpp which allow one to directly download and cache models from the Hugging Face Hub (note: for this you'd need to compile the server with `LLAMA_CURL=1`).

With the curl support, all you'd need to do is pass `--hf-repo` & `--hf-file`, and the model checkpoint would automatically be downloaded and cached in `LLAMA_CACHE` (ref). This would make it easier for people to use already-cached model checkpoints, and it should also benefit from any improvements we make to the overall caching system.
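A rough sketch of that flow, assuming a curl-enabled build (the build target and binary name vary across llama.cpp versions, and the repo/file values below are just example checkpoints):

```shell
# Build llama.cpp's server with curl support; the prebuilt release
# binaries are NOT compiled with this flag.
make LLAMA_CURL=1 server

# Optionally point the model cache somewhere persistent.
export LLAMA_CACHE=~/.cache/llamanet

# The checkpoint is downloaded from the Hub on first run and
# reused from LLAMA_CACHE on subsequent runs.
./server \
  --hf-repo TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
  --hf-file mistral-7b-instruct-v0.2.Q4_K_M.gguf
```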
AFAICT, you should be able to benefit from this directly by changing this line: https://github.com/pinokiocomputer/llamanet/blob/16fc9521f97549c657f80ce51c5c7a787eac4e8d/llamacpp.js#L20
Let me know what you think! VB