khoj-ai / khoj

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (e.g gpt, claude, gemini, llama, qwen, mistral).
https://khoj.dev
GNU Affero General Public License v3.0

Create an inference server #437

Closed: sabaimran closed this issue 5 months ago

sabaimran commented 1 year ago

Set up an inference server that gives access to several different models. Include access to:

Specification:

ishaan-jaff commented 1 year ago

Hi @sabaimran, I believe I can help with this issue. I’m the maintainer of LiteLLM (https://github.com/BerriAI/litellm) - we allow you to use any LLM as a drop-in replacement for gpt-3.5-turbo.

You can use LiteLLM in the following ways:

With your own API key:

This calls the provider API directly.

from litellm import completion
import os
## set ENV variables 
os.environ["OPENAI_API_KEY"] = "your-key" # 
os.environ["COHERE_API_KEY"] = "your-key" # 

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)

Using the LiteLLM Proxy with a LiteLLM Key

This is great if you don’t have access to Claude but want to use the open-source LiteLLM proxy to access Claude.

from litellm import completion
import os

## set ENV variables 
os.environ["OPENAI_API_KEY"] = "sk-litellm-5b46387675a944d2" # [OPTIONAL] replace with your openai key
os.environ["COHERE_API_KEY"] = "sk-litellm-5b46387675a944d2" # [OPTIONAL] replace with your cohere key

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
krrishdholakia commented 1 year ago

Hey @sabaimran @ishaan-jaff any updates on this?

sabaimran commented 1 year ago

Hey @ishaan-jaff, @krrishdholakia, what does this offer over our current setup? In the example where we'd use our own API key, what would be the benefit of using LiteLLM vs. just calling OpenAI directly?

sabaimran commented 1 year ago

We're planning to host the proxy server on our own infrastructure. Some dev tools for hosting options:

We'll prioritize support for the best available open source models.

krrishdholakia commented 1 year ago

We have open-source proxy code that might help - https://github.com/BerriAI/liteLLM-proxy/blob/main/main.py

It seems like that's what you're trying to build - i.e. a server that sits in front of your LLMs (self-deployed + openai/anthropic/etc.) and makes the API calls for you.

Hugging Face TGI, Anthropic, and OpenAI all have different input params and output formats.

LiteLLM simplifies that by keeping them all consistent with the OpenAI format.
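
For example (a minimal sketch, not from this thread; it assumes an Anthropic key is available and uses litellm's completion API as in the snippets above), the same accessor works on the response regardless of which provider served the call:

from litellm import completion
import os

os.environ["ANTHROPIC_API_KEY"] = "your-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# The response is normalized to the OpenAI chat-completion shape, so the same
# accessor works whether the backend is OpenAI, Anthropic, or a TGI server.
response = completion(model="claude-2", messages=messages)
print(response.choices[0].message.content)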

@sabaimran What is litellm missing to be useful to you? Any feedback here would be helpful.

sabaimran commented 5 months ago

Hey guys! Closing the loop here. We're not going to set up our own inference server, but litellm would be the proxy server of choice for when we use additional models that can leverage an OpenAI API-compatible interface. We would most likely set this up in a private codebase for ease of use. We've had a good dev experience with the litellm proxy server via Docker. Thanks, @krrishdholakia & @ishaan-jaff!
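
For reference, a minimal sketch of that setup (not from the thread; the host, port, and placeholder key are assumptions for illustration): once a litellm proxy is running, e.g. via its Docker image, any OpenAI-compatible client can be pointed at it by overriding the base URL.

from openai import OpenAI

# Assumes a LiteLLM proxy is already running and listening on localhost:4000;
# the key is whatever the proxy is configured to accept.
client = OpenAI(api_key="sk-anything", base_url="http://localhost:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # resolved by the proxy's model routing
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.choices[0].message.content)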

krrishdholakia commented 5 months ago

@sabaimran how could the Docker spin-up process have been easier? Working on improving the quick-start flow this week.