Closed - Vesyrak closed this 9 months ago
Hi, thanks for checking it out! Did you skip the llama.cpp build step? The starter assumes a working llama.cpp `./main`, so you should already be able to run `./build/bin/Release/main [--opts]` (or wherever you've set the `main_dir` to be) in the llama.cpp directory. I can make this more explicit in the readme or starter comments if that's the missing step. That said, I'd recommend using a REST API server on top of llama.cpp, since the start-up time for each request can take a good while, unless you're specifically intending to experiment with different flags to `./main` via llm.nvim prompts.
llama.cpp has built-in server support now (since like 2 months ago?):
./server -m ./models/codellama-13b-instruct.Q5_K_S.gguf -ngl 80
`-ngl` is for running the model partially on the GPU (it's very fast - 10 lines of code in 1-2 secs!). And the above model is supposed to be GPT-3.5 quality ;)
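For reference, you can sanity-check the server from the command line before wiring up llm.nvim. A minimal sketch, assuming the server started by the command above is listening on its default port 8080 (the prompt string is the same example that appears commented out in the configs below):

```shell
# Hypothetical smoke test against a running llama.cpp server (default port 8080).
# With "stream": true the server replies with SSE chunks, which is what the
# curl.stream handler in the configs below consumes.
curl http://127.0.0.1:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128, "stream": true}'
```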
I tried to make llm.nvim work with:
{
  "gsuuon/llm.nvim",
  config = function()
    local llm = require('llm')
    local curl = require('llm.curl')
    local util = require('llm.util')
    local provider_util = require('llm.providers.util')

    local M = {}

    ---@param handlers StreamHandlers
    ---@param params? any Additional params for request
    ---@param options? { model?: string }
    function M.request_completion(handlers, params, options)
      local model = (options or {}).model or 'bigscience/bloom'
      -- TODO handle non-streaming calls
      return curl.stream(
        {
          -- url = 'https://api-inference.huggingface.co/models/', --.. model,
          url = 'http://127.0.0.1:8080/completion',
          method = 'POST',
          body = vim.tbl_extend('force', { stream = true }, params),
          headers = {
            -- Authorization = 'Bearer ' .. util.env_memo('HUGGINGFACE_API_KEY'),
            ['Content-Type'] = 'application/json',
            -- ['data'] = '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}',
          }
        },
        function(raw)
          provider_util.iter_sse_items(raw, function(item)
            local data = util.json.decode(item)

            if data == nil then
              handlers.on_error(item, 'json parse error')
              return
            end

            if data.token == nil then
              if data[1] ~= nil and data[1].generated_text ~= nil then
                -- non-streaming
                handlers.on_finish(data[1].generated_text, 'stop')
                return
              end

              handlers.on_error(data, 'missing token')
              return
            end

            local partial = data.token.text
            handlers.on_partial(partial)

            -- We get the completed text including input unless parameters.return_full_text is set to false
            if data.generated_text ~= nil and #data.generated_text > 0 then
              handlers.on_finish(data.generated_text, 'stop')
            end
          end)
        end,
        function(error)
          handlers.on_error(error)
        end
      )
    end

    require('llm').setup({
      hl_group = 'Substitute',
      prompts = util.module.autoload('prompt_library'),
      default_prompt = {
        provider = M,
        options = {
          -- model = 'bigscience/bloom'
        },
        params = {
          return_full_text = false
        },
        builder = function(input)
          return { inputs = input }
        end
      },
    })
  end,
},
based on 'Adding your own provider' - https://github.com/gsuuon/llm.nvim/blob/main/lua/llm/providers/huggingface.lua
But now when I try to run :Llm, it throws an error:
Configuring llm.nvim is a bit hard, not sure what I did wrong. Do I have to write my own prompts for it? I know there is an OpenAI compatibility server script for llama.cpp (so you have to run 2 servers: [llamacpp server] -> [OAI compatibility server] -> [nvim gpt plugin]), so that we can use OpenAI plugins but with llama doing the work. But I would rather make it work directly with the llama server.
Hi Jose! You know, I think it'd probably be better to simply remove the llamacpp cli provider and switch it to targeting the llamacpp server directly. Outside of playing around with llamacpp flags, the cli provider won't be very useful. I assumed most people would just use an openai compat server, but that does add another dependency and setup step.
I think I'm targeting the llama server directly - with `curl.stream()` - is that what you meant?
In any case I managed to make some progress:
https://github.com/gsuuon/llm.nvim/assets/13521338/ef8abbee-bc54-4f51-b9c0-01d36dc4229e
As you can see, Code Llama is very fast (thanks to compiling llama.cpp with cuBLAS - CUDA support).
EDIT:
Ok, found out that indeed we can override on_finish() in the setup section.
Still without formatting, though - it just outputs everything into the buffer:
{
  "gsuuon/llm.nvim",
  config = function()
    local llm = require "llm"
    local curl = require "llm.curl"
    local util = require "llm.util"
    local provider_util = require "llm.providers.util"
    local llamacpp = require "llm.providers.llamacpp"

    local M = {}

    ---@param handlers StreamHandlers
    ---@param params? any Additional params for request
    ---@param options? { model?: string }
    function M.request_completion(handlers, params, options)
      local model = (options or {}).model or "bigscience/bloom"
      -- vim.print(params)
      -- TODO handle non-streaming calls
      return curl.stream({
        -- url = 'https://api-inference.huggingface.co/models/', --.. model,
        url = "http://127.0.0.1:8080/completion",
        method = "POST",
        body = vim.tbl_extend("force", { stream = true }, params),
        headers = {
          -- Authorization = 'Bearer ' .. util.env_memo('HUGGINGFACE_API_KEY'),
          ["Content-Type"] = "application/json",
          -- ['data'] = '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}',
        },
      }, function(raw)
        provider_util.iter_sse_items(raw, function(item)
          local data = util.json.decode(item)

          if data == nil then
            handlers.on_error(item, "json parse error")
            return
          end

          if data.token == nil then
            if data ~= nil and data.content ~= nil then
              -- non-streaming
              -- write data.content into active buffer
              -- vim.api.nvim_put({ data.content }, "c", true, true) -- prints output
              handlers.on_finish(data.content, "stop")
              return
            end

            handlers.on_error(data, "missing token")
            return
          end

          local partial = data.token.text
          handlers.on_partial(partial)

          -- We get the completed text including input unless parameters.return_full_text is set to false
          if data.generated_text ~= nil and #data.generated_text > 0 then
            handlers.on_finish(data.generated_text, "stop")
          end
        end)
      end, function(error)
        handlers.on_error(error)
      end)
    end

    local segment = require "llm.segment"

    require("llm").setup {
      hl_group = "Substitute",
      -- prompts = util.module.autoload "prompt_library",
      default_prompt = {
        provider = M,
        options = {
          -- model = 'bigscience/bloom'
        },
        params = {
          return_full_text = false,
        },
        builder = function(input)
          return {
            prompt = llamacpp.llama_2_format {
              messages = {
                input,
              },
            },
          }
        end,
        mode = {
          on_finish = function(final) -- somehow contains partial result... for llamacpp
            -- vim.notify('final: ' .. final)
            vim.api.nvim_put({ final }, "c", true, true) -- prints output
          end,
          on_partial = function(partial)
            vim.notify(partial)
          end,
          on_error = function(msg)
            vim.notify('error: ' .. msg)
          end,
        },
      },
      prompts = {
        -- ask = {
        --   provider = M,
        --   hl_group = "SpecialComment",
        --   params = {
        --     return_full_text = false,
        --   },
        --   builder = function(input)
        --     return { inputs = input } -- will output gibberish
        --   end
        -- },
      },
    }
  end,
},
I meant that I should remove the current llamacpp provider which uses the CLI and just have it talk to the server instead.
You're very close! Just call `handlers.on_partial` with `data.content` instead of `on_finish`. Calling `on_finish` is just necessary for post-completion transformers that are on the prompt (like extracting markdown). Do you want to open a PR to change the llamacpp provider to the server? Otherwise, I'll likely do that sometime today.
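Concretely, the SSE handler in the config could drop the huggingface-style `data.token` handling and use the llama.cpp server's fields instead. A sketch (assuming the server's streamed items carry a `content` text delta and mark the final item with a `stop` flag):

```lua
-- Sketch: accumulate llama.cpp server chunks, streaming each as a partial.
local completion = ''

provider_util.iter_sse_items(raw, function(item)
  local data = util.json.decode(item)

  if data == nil then
    handlers.on_error(item, 'json parse error')
    return
  end

  if data.content ~= nil then
    completion = completion .. data.content
    handlers.on_partial(data.content)
  end

  -- the last streamed item is marked with stop = true
  if data.stop then
    handlers.on_finish(completion, 'stop')
  end
end)
```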
Re: your edit -- `mode` is not necessary for implementing a new provider, it's just there if none of the default modes work for your use-case. You can hook into each part of the provider and do your own thing (e.g., I don't have chat in a sidebar implemented but you could add that as a new mode by overriding these). This is something you would use on the prompt side, not the provider side. This is helpful feedback, I'll clarify the design in the readme - it's not a great explainer at the moment.
@gsuuon I'm slowly getting there, but now, with or without mode, the response is automatically removed after the last server response:
https://github.com/gsuuon/llm.nvim/assets/13521338/37e1ba26-7081-4fd3-85a7-49093eaf5cbc
I mean, I can just undo to bring it back, but it is not optimal. Also I noticed undo removes it word by word, rather than the whole accumulated server response. I guess it would be cool if you could make it so that undo removes the whole server reply, rather than chunks. Edit: Yes, I can make a PR when this works at least somehow. My target would be to make it so that llm.nvim takes:
That's the purpose of the `:LlmDelete` command - I couldn't figure out a nice way to do that with the nvim api (wouldn't mind a PR here as well, since undo is more intuitive). `:LlmDelete` will put back any text that was replaced.
The `input` argument given to the prompt builder gets the entire buffer if you haven't selected anything; it'll only be the selected text if there's a selection. This part should be left to the prompt to handle. I'm not sure what you mean by input box -- do you mean the command arguments? Like `:Llm myprompt additional instructions here`? That's exposed as `context.args`.
Optional selected lines would be a nice feature but should be a separate PR from updating the llamacpp provider if you do tackle it.
@JoseConseco Oh btw, you can apply this patch to prevent the completion from disappearing - it's because `on_finish` is being called with an empty string, but we can just move that higher up and let the provider abstraction handle it with a sane default.
diff --git a/lua/llm/provider.lua b/lua/llm/provider.lua
index 8e5af0f..a0ebc10 100644
--- a/lua/llm/provider.lua
+++ b/lua/llm/provider.lua
@@ -30,7 +30,7 @@ M.mode = {
 
 ---@class StreamHandlers
 ---@field on_partial (fun(partial_text: string): nil) Partial response of just the diff
----@field on_finish (fun(complete_text: string, finish_reason?: string): nil) Complete response with finish reason
+---@field on_finish (fun(complete_text?: string, finish_reason?: string): nil) Complete response with finish reason. Leave complete_text nil to just use concatenated partials.
 ---@field on_error (fun(data: any, label?: string): nil) Error data and optional label
 
 local function get_segment(input, segment_mode, hl_group)
@@ -232,12 +232,19 @@ end
 local function request_completion_input_segment(handle_params, prompt)
   local seg = handle_params.context.segment
 
+  local completion = ""
+
   local cancel = start_prompt(handle_params.input, prompt, {
     on_partial = function(partial)
+      completion = completion .. partial
       seg.add(partial)
     end,
     on_finish = function(complete_text, reason)
+      if complete_text == nil or string.len(complete_text) == 0 then
+        complete_text = completion
+      end
+
       if prompt.transform == nil then
         seg.set_text(complete_text)
       else
And then you would change the `on_finish` call to just be `on_finish()` in the provider.
@gsuuon on_finish() being called with an empty string arg - that was my guess, but I did not have time to make it work (by concatenating strings like you showed above and feeding them into on_finish).
About the input string: yes, I saw you could add the question as an argument. But IMO it would be great if you could create a prompt `:Llm "process_sel"` which would:
That can be added in the prompt - there's a starter called 'instruct' that shows how to use vim.ui.input to get some input.
EDIT: just noticed that example is out of date; fixed to the current api (you can return a function from `builder` to do async things)
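An async builder along those lines might look roughly like this (a sketch - I'm assuming the function returned from `builder` receives a callback that takes the finished params, and reusing `llamacpp.llama_2_format` from the config above):

```lua
-- Sketch of an async builder: ask for an instruction, then build the params.
-- Assumes the function returned from `builder` is called with a callback
-- that receives the finished params table.
builder = function(input)
  return function(build)
    vim.ui.input({ prompt = 'Instruction: ' }, function(instruction)
      build({
        prompt = llamacpp.llama_2_format({
          messages = {
            (instruction or '') .. '\n' .. input,
          },
        }),
      })
    end)
  end
end
```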
I think the above 'instruct' would have to be remade into the llama format:
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>
{{ user_message }} [/INST]
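(`llamacpp.llama_2_format` should already produce this shape; as a hypothetical standalone sketch of the template above:)

```lua
-- Hypothetical standalone rendering of the llama-2 instruct template above.
local function llama_2_prompt(system_prompt, user_message)
  return '<s>[INST] <<SYS>>\n'
    .. system_prompt
    .. '\n<</SYS>>\n\n'
    .. user_message
    .. ' [/INST]'
end
```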
In any case I made a PR. Tomorrow I can make some fixes if needed.
@gsuuon is there a way to output the AI message to a popup window?
https://github.com/gsuuon/llm.nvim/assets/13521338/f923bb23-6374-4cfd-88e7-809613e9b440
The issue I have now - llama will output code with comments, and I can't seem to force it to output the pure code only. It would help if output was written to a new popup window, where the user could copy the code, close the popup, and paste it into place... I was also wondering: maybe :LlmDelete should be renamed :LlmUndo, since this is what it is for.
@JoseConseco moved to https://github.com/gsuuon/llm.nvim/discussions/15
Hey,
First of all, thanks for working on this! I was trying to get the local `llamacpp` provider working, without much success. I tried to debug what was going wrong, but my lack of lua knowledge makes this slow and difficult. What I did to try and get it working was the following:
- set `LLAMACPP_DIR` to the root `llama.cpp` folder
- set up `llm.nvim`
Current environment is nvim v0.9.0, on a MacOS M1. Are there any steps I'm missing?