gpustack / gpustack

Manage GPU clusters for running LLMs
https://gpustack.ai
Apache License 2.0
279 stars 19 forks

Feature Request: Add Ollama-compatible APIs #299

Open arnesund opened 1 day ago

arnesund commented 1 day ago

I'm setting up Open WebUI as the web frontend for internal use of LLMs, complete with HA + SSO setup and all. GPUStack is used as an OpenAI-compatible backend and it works great for web-based access.

I'm also exploring ways to make it easy for our devs to get API access to the same wide variety of LLMs. Open WebUI exposes an /ollama API endpoint that passes requests through to Ollama-compatible backends after authenticating them. It doesn't support this for OpenAI-compatible backends. Any chance you'll consider adding Ollama-compatible APIs to GPUStack?

That would make this use case easier to realize, while keeping a single entrypoint (Open WebUI) that handles authentication before passing requests through to GPUStack.
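For context, the two API styles differ less in the chat payload than in endpoint paths and response format. A minimal sketch of the request bodies involved; the model name and prompt are placeholders, not a real deployment:

```python
def ollama_chat_request(model: str, prompt: str) -> dict:
    """Request body for Ollama's native chat endpoint (POST /api/chat)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # Ollama streams by default; disable for a single response
    }


def openai_chat_request(model: str, prompt: str) -> dict:
    """Request body for an OpenAI-style endpoint (POST /v1/chat/completions)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
```

For basic chat the bodies are nearly identical; the divergence is mostly in the paths (/api/chat vs. /v1/chat/completions), the response schema, and auxiliary endpoints such as /api/tags vs. /v1/models, which is what an Ollama-compatibility layer would need to cover.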

gitlawr commented 14 hours ago

OpenAI-compatible APIs are widely adopted in modern inference engines (llama.cpp, vLLM, SGLang, etc.) and local serving tools (LM Studio, LocalAI, etc.). I think it makes more sense if the existing authentication in Open WebUI can work with any OpenAI-compatible backend.
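Since GPUStack already speaks the OpenAI wire format, developers can target it directly with any OpenAI-style client. A minimal sketch that assembles (but does not send) such a request using only the standard library; the base URL and API key are hypothetical placeholders for a GPUStack deployment:

```python
import json
import urllib.request

# Placeholder values; substitute your deployment's URL and API key.
BASE_URL = "http://gpustack.local/v1"
API_KEY = "sk-example"


def build_chat_completion_request(model: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat completion request with bearer auth."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The same request shape works against any OpenAI-compatible server, which is the point being made: solving authenticated passthrough once for this format covers GPUStack along with the other backends listed above.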