marcom / Llama.jl

Julia interface to llama.cpp, a C/C++ library for running language models

Simplify Initial Setup for LLM Newcomers Using llama.jl Package #2

Open svilupp opened 7 months ago

svilupp commented 7 months ago

Thank you for creating this wrapper! I was about to do it myself -- glad I noticed the link in the past jll PRs :)

I wanted to test an idea with you.

People can use llama.cpp directly if they want low-level control, but that requires knowledge of prompt templates, compiling the source code, etc.

What if this package served as a Julia-only entry point for running an LLM on your laptop? I.e., no need to install anything else; we'd provide a turnkey solution to get started.

Ollama is awesome and super user-friendly, but it's a separate application to download and has its own limitations (e.g., some performance issues, ngl defaults, etc.). There are many others (ooba, ..), but they are all separate tools to install...

What do you think? I'm happy to draft a PR.

Objective: Enhance the onboarding experience for first-time LLM users with Llama.jl by simplifying the initial setup process. Just: using Llama; run_server()
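
For illustration, this is roughly the whole user journey I have in mind (the function names and the model URL below are just a sketch of the proposal, not a final API):

```julia
using Llama

# One-off: fetch a small GGUF model
# (helper name and URL are illustrative placeholders)
model = download_model(
    "https://huggingface.co/<org>/<repo>/resolve/main/<model>.gguf")

# Start a local llama.cpp server with sensible defaults and point
# your favourite client (e.g. PromptingTools.jl) at it
run_server(; model)
```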

Proposal:

Benefits:

Feedback and suggestions are welcome.

Disclaimer: I'm an author of https://github.com/svilupp/PromptingTools.jl, so I'd leverage the API from there and deepen the integration.

EDIT: I have other goals/aspirations for the llama.cpp JLL (e.g., data extraction with the grammar), but I think we should first simplify the setup for users.

marcom commented 7 months ago

Hi @svilupp , many thanks for your PR. I am super excited that someone else is interested in this package!

This package is in a bit of a neglected state, and I have been busy with other things (thesis writing...). It is a bit late here tonight, so I'll read your comments and give you a longer answer tomorrow.

It looks like you have made lots of interesting packages and real progress toward making LLMs available in Julia. Maybe you want to be a co-maintainer of this package as well? If so, I'll try to figure out how to enable this tomorrow.

I have bumped the version of Llama.jl to 0.2.0 with your PR included. Unfortunately there were some errors when regenerating the bindings; the unfinished PR is in #3 for now. I'll have a look at it tomorrow.

svilupp commented 7 months ago

Thank you. Yeah, happy to join forces!

My thoughts for llama.jl would be:

In terms of llama.cpp features that I’m keen on, it’s

I can see that the open-source models tend to struggle with Julia and every once in a while try to squeeze some Python in, so it would be nice to add some Julia-specific fine-tuning. I'm naively hoping that within the next year we will manage to jointly build a dataset to do that (that's why I started the LLM benchmark).

In general, I’d love to build some tooling in Julia to boost documentation search with RAG, have agents for writing your unit tests and docstrings, etc.

The smaller the community, the more we should look for ways to automate the non-core activities.

Any thoughts? (In particular on the immediate steps with llama.jl)

marcom commented 7 months ago

Thanks for all the PRs!

I think this sounds like a good plan. Here are some of my thoughts:

Server interface vs C interface, exports

As you said, the server interface is probably the easiest way to interact with llama.cpp now, and it hopefully won't undergo as many changes as the C interface has. The low-level C interface here in Llama.jl currently doesn't work anymore, as too many things have changed. I don't have the bandwidth to fix this right now, so it might stay that way for a while longer; perhaps llama.cpp's C interface will eventually stabilize. So I agree that it's probably best not to export this low-level functionality for the time being.
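
For completeness, here is roughly what talking to the server from Julia looks like, using HTTP.jl and JSON3.jl (this assumes a llama.cpp server is already running locally on its default port 8080 and exposes the /completion endpoint):

```julia
using HTTP, JSON3

# Assumes a llama.cpp server is already running on localhost:8080.
body = JSON3.write((prompt = "Write a hello-world function in Julia.",
                    n_predict = 128))

resp = HTTP.post("http://localhost:8080/completion",
                 ["Content-Type" => "application/json"],
                 body)

# The server returns JSON with the generated text in the `content` field.
println(JSON3.read(resp.body).content)
```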

Model downloads

Making the initial model download as easy as possible is a great idea. AFAIK it's unfortunately not possible at the moment to declare single-file artifacts in Julia, see https://github.com/JuliaLang/Pkg.jl/issues/2764. I had a longish discussion on Slack with the Pkg.jl devs, but there seemed to be little enthusiasm for single-file artifact support. The alternative of creating a new tarball for each model is unrealistic, as you said. I would try to avoid shipping models via artifacts as much as possible and simply rely on HuggingFace; the models tend to become outdated very fast at the moment.

Maybe these packages can be helpful, though I haven't tried them:

- https://github.com/FluxML/HuggingFaceApi.jl
- https://github.com/cjdoris/HuggingFaceHub.jl
- https://github.com/oxinabox/DataDeps.jl

My feeling is that I would avoid hardcoding any default models in run_server, as the models change quite quickly and the default would then become part of the interface. Running download_model once shouldn't be too bad, and the README or docs can suggest some good models that the user can simply copy and paste.
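
For what it's worth, a plain Downloads.jl-based helper might be all that's needed under the hood. A rough sketch (the cache directory and the model URL below are placeholders, not an agreed design):

```julia
using Downloads

# Rough sketch of what a download_model helper could do internally.
# The cache location and URL are placeholders, not an agreed design.
function download_model(url::AbstractString;
                        dir = joinpath(homedir(), ".cache", "llama_jl"))
    mkpath(dir)
    path = joinpath(dir, basename(url))
    # Download once, reuse on subsequent calls.
    isfile(path) || Downloads.download(url, path)
    return path
end

model_path = download_model(
    "https://huggingface.co/<org>/<repo>/resolve/main/<model>.gguf")
```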

Applications

Integration of LLMs into the REPL, VS Code, grammar-guided sampling, etc. sound like great ideas. A Julia-specific LoRA or RLHF (or similar) model would be a great step forward for using these models effectively with Julia. Maybe there could be a feedback mechanism to rate answers, such as in https://chat.lmsys.org/, in your Julia LLM leaderboard, or the REPL/VS Code integration could have a thumbs-up/thumbs-down feedback mechanism to gather data for fine-tuning.