simonw / llm

Access large language models from the command-line
https://llm.datasette.io

`brew install llm` + `llm install llm-llama-cpp` + `llm install llama-cpp-python` Issues on M2 MBP #250

Open pypeaday opened 1 year ago

pypeaday commented 1 year ago

I wanted to try a self-hosted, offline LLM to summarize some notes, and this looked like the most straightforward thing to try! I've attached a screenshot with hopefully relevant information.

Tutorial/Blog: https://simonwillison.net/2023/Aug/1/llama-2-mac/

[screenshot]
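
In case the screenshot doesn't come through, the setup I followed from the tutorial was roughly the following (the model URL and alias name are placeholders here, not the exact values I used):

```
# Install llm and the llama.cpp plugin, per the tutorial
brew install llm
llm install llm-llama-cpp
llm install llama-cpp-python

# Download a Llama 2 Chat GGML model and try to register an alias for it
# (<MODEL_URL> and <ALIAS> stand in for the actual values from the tutorial)
llm llama-cpp download-model <MODEL_URL> --alias <ALIAS>
```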

It's worth noting that `llm` itself isn't broken:

`llm -m llama-2-7b-chat 'Five outrageous names for a pet pelican' --system "you are funny"` works just fine, so my first thought was that --alias wasn't working, even though the alias shows up in the `llm models` output. [screenshot of `llm models` output]

I also went the route of setting up a venv with Python 3.11.4 and installing llm with pip. When I did that I could not use `llm llama-cpp download-model ...` with the --alias flag: `Error: No such option: --alias`
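
If the --alias option really isn't available in the pip-installed version, one workaround I haven't fully verified would be llm's separate aliases command, assuming the model ID is whatever `llm models` reports for the downloaded file:

```
# Possible workaround: register the alias after downloading.
# "llama-2-7b-chat.ggmlv3.q8_0" is a stand-in for the actual model ID from `llm models`.
llm aliases set l2c llama-2-7b-chat.ggmlv3.q8_0
llm -m l2c 'Five outrageous names for a pet pelican'
```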

pypeaday commented 1 year ago

Ah, looks like the gpt4all models actually work. In my example I didn't notice this, but it's using the `gpt4all: llama-2-7b-chat` model... so it looks like my issues are specifically with the models prefixed with `LlamaModel:`

vividfog commented 1 year ago

The underlying llama-cpp-python library, and llama.cpp underneath it, are a constantly moving target. Both worked for me a month ago, but right now the latest llama-cpp-python build from GitHub (when Metal is requested) gives a cmake build error, and a PyPI version gives truncated responses, even after manually tweaking `n_ctx` (= max_tokens) in the source code.
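
For context, the Metal build I'm attempting is the usual CMAKE_ARGS route documented by llama-cpp-python; the exact flags below are from memory and may need adjusting:

```
# Build llama-cpp-python from source with Metal enabled (Apple Silicon);
# this is the invocation that currently fails with a cmake error for me.
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 \
  pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python

# Falling back to the plain PyPI wheel installs fine, but responses come back truncated.
pip install llama-cpp-python
```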

I don't think llm can provide a productive llama-cpp module before the underlying library can be built reliably, or ships a wheel that works with both Python 3.10 and 3.11, on both arm and x86, without the output being truncated for all or many models.

What llm could do is expose the errors, if any. The thing is, llama.cpp produces a ton of output by default, so the wrapper code would need rules or options to decide whether a given line is a "real" error or just debug output.
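
As a rough sketch of what I mean (not something llm does today): capture the chatty stderr to a log and only surface lines that match an error pattern:

```
# Send llama.cpp's verbose output to a log file...
llm -m llama-2-7b-chat 'test prompt' 2> llama-debug.log

# ...then surface only the lines that look like real errors (the pattern is a guess).
grep -iE 'error|failed|traceback' llama-debug.log
```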

A project called open-interpreter includes a detailed and automatic llama-cpp-python build process. It too truncates answers with Code Llama models of all sizes. But once that project runs well with Code Llama, llm will also be in a good position to run with the same options. I'm following both projects: https://github.com/KillianLucas/open-interpreter

sfc-gh-jroes commented 1 year ago

I believe this issue is related: https://github.com/simonw/llm/issues/189