Merge of user config with default config causes issues

Robzz commented 6 months ago

Hi there, thanks for the cool plugin.

I noticed an issue caused by how you merge the default config with the user provided config in config.lua. I usually use ollama and decided to try out vllm, but had a a bit of trouble getting it to work because of this.

vllm exposes an openAI compatible API, so my request_body looked something like the following:

request_body = {
    temperature = 0,
    top_p = 0.9,
    presence_penalty = 1,
    stop = { '<|file_separator|>', '<|fim_prefix|>', '<|fim_suffix|>', '<|fim_middle|>' }
}

vllm kept throwing me HTTP bad request errors, and sniffing the requests sent to vllm showed me that the default values for the request body (the parameters object) were being sent along with my own request body, which was unexpected. vllm visibly does not like unexpected arguments. The problem comes from the use of vim.tbl_deep_extend(), which recursively merges the default config map with the user provided one. As a consequence, the user provided request body gets merged with the default one, which for the request body in particular is surprising behavior.

You can observe the same behavior by using the configuration provided in the README for say, ollama and looking at the transmitted request body: the parameters object is present alongside the options object although it's not present in the user config, ollama is just more graceful than vllm about it.

I worked around it by explicitly deleting the parameters object from the llm.nvim config table just after the setup() call with this: require('llm.config').config.request_body.parameters = nil. Not a complex workaround, but I did need to go and check the plugin source to figure it out.

The fix that I would suggest is that, since the default config is provided for the huggingface backend, you should either not merge the default request_body with the user config if the user is not using the huggingface backend, or include a matching default request body for other backends (for instance what's in the README) and merge with that. Let me know if you'd like a PR for it.

McPatate commented 5 months ago

Let me know if you'd like a PR for it.

Hi yes would be interested in a PR! Also note that depending on the backend you choose, you could see extra params being added, cf https://github.com/huggingface/llm-ls/blob/main/crates/llm-ls/src/backend.rs#L210.

Perhaps we could also add vllm as a backend in llm-ls?

Robzz commented 5 months ago

Thanks for the llm-ls link, that's informative.

Perhaps we could also add vllm as a backend in llm-ls?

That depends what you mean I suppose. From my (little) experience with vllm, it seems there are mainly 2 ways to consume it:

the OpenAI-compatible API, which should work without issue as is without particular support from llm-ls. It does support some additional parameters compared to the OpenAI API, but that shouldn't really be a problem from what I'm seeing.
Then, there is the in-process inference engine and its Python API, which I suppose you could indeed support in llm-ls. I can see this feature being useful to some, not having to maintain a separate inference server and having it automagically start when firing up llm-ls could be nice. Maybe you can get some additional benefits out of llm-ls owning the inference server, but I suspect you'll know more than me about that.

I guess it boils down to whether you would consider llm-ls managing an inference server to be in llm-ls's scope. llm-ls#89 is related.

As for the PR, I'll try to get that on the way this week-end!

McPatate commented 5 months ago

the OpenAI-compatible API, which should work without issue as is without particular support from llm-ls. It does support some additional parameters compared to the OpenAI API, but that shouldn't really be a problem from what I'm seeing.

Ok perfect, so nothing to do here!

Then, there is the in-process inference engine and its Python API, which I suppose you could indeed support in llm-ls. I can see this feature being useful to some, not having to maintain a separate inference server and having it automagically start when firing up llm-ls could be nice. Maybe you can get some additional benefits out of llm-ls owning the inference server, but I suspect you'll know more than me about that.

This is a good question. On the one hand it'd make it easier for people not to have to handle the inference server, and on the other I wonder if it is not redundant with all the already existing tooling. That being said, given llm-ls is compatible with llama.cpp & ollama backends, one could imagine that llm-ls would auto-start a llama.cpp sub-process if it is selected as a backend. I have to think about that, but it could be a really cool feature indeed!

huggingface / llm.nvim

Merge of user config with default config causes issues #87