zed-industries / zed

Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
https://zed.dev

GPT4All / 'raw' llama.cpp support #16408

Open KhazAkar opened 1 month ago

KhazAkar commented 1 month ago


Describe the feature

Currently, Zed supports Ollama as a provider, but that isn't ideal for some configurations because Ollama does not support Vulkan yet (there's a PR for it, but it hasn't been merged). gpt4all.io supports running LLMs on the GPU via Vulkan, which speeds things up, and it also provides a local server endpoint. If it were possible to point Zed at it through the existing configuration, that would be great.

If applicable, add mockups / screenshots to help present your vision of the feature

Similar to ollama config:

  "assistant": {
    "version": "1",
    "provider": {
      "default_model": {
        "name": "name-of-model-file.gguf",
        "max_tokens": 2048,
        "keep_alive": -1
      },
      "name": "gpt4all" # or llama.cpp
    }
  },

This way, it would be possible to use a 'raw' llama.cpp build as well as the gpt4all Python bindings, which also expose API endpoints, so you don't need a UI around them ;)

notpeter commented 1 month ago

Assuming the bindings support http endpoints with the appropriate semantics (e.g. OpenAI) we do expose custom endpoint setting that you could try.

If you get that working, I'd be happy to include some configuration notes in the docs.
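For reference, a minimal sketch of what pointing the OpenAI provider at a local server might look like. The exact key names and the port (4891 for the gpt4all server, 8080 for a llama.cpp server build) are assumptions and may differ between Zed versions, so treat this as a starting point rather than a verified config:

```json
{
  "language_models": {
    "openai": {
      "api_url": "http://localhost:4891/v1",
      "available_models": [
        { "name": "name-of-model-file.gguf", "max_tokens": 2048 }
      ]
    }
  }
}
```

Note that `available_models` is an array of objects, not strings.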

KhazAkar commented 1 month ago

> Assuming the bindings support http endpoints with the appropriate semantics (e.g. OpenAI) we do expose custom endpoint setting that you could try.
>
> If you get that working, I'd be happy to include some configuration notes in the docs.

Those bindings seem to be OpenAI-compatible; they have an example CLI server implementation in the repo. I might try finding time to do it myself, but since time is a bit short, I'm asking here 😁
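To illustrate what "OpenAI-compatible" means here, a small sketch of the request a client (such as Zed) would send to a local server. The base URL, port, and model name are assumptions for a hypothetical local gpt4all or llama.cpp server; only the `/chat/completions` payload shape follows the OpenAI API:

```python
# Sketch of an OpenAI-style chat request against a local server.
# BASE_URL and the model name are placeholders -- adjust to your setup.
import json
import urllib.request

BASE_URL = "http://localhost:4891/v1"  # hypothetical local server address


def build_chat_request(model: str, prompt: str, max_tokens: int = 2048):
    """Build an OpenAI-style /chat/completions request as (url, payload)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return f"{BASE_URL}/chat/completions", payload


def send(url: str, payload: dict) -> dict:
    """POST the payload; only works when a server is actually running."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


url, payload = build_chat_request("name-of-model-file.gguf", "Hello!")
# send(url, payload)  # uncomment once a local server is listening
```

If the bindings' example server accepts this shape, Zed's custom endpoint setting should be able to talk to it unchanged.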

notpeter commented 2 weeks ago

@rajivmehtaflex This is an unrelated enhancement request. Please do not hijack it for your configuration issues. `available_models` is an array of objects, not an array of strings. Additionally, I'm not sure whether OpenRouter supports the Ollama REST API; I believe it only supports OpenAI API semantics. Please open a new issue.