khoj-ai / khoj

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (e.g gpt, claude, gemini, llama, qwen, mistral).
https://khoj.dev
GNU Affero General Public License v3.0

Create an inference server #437

Closed: sabaimran closed this issue 5 months ago

sabaimran commented 1 year ago

Set up an inference server that gives access to several different models. Include access to:

Specification:

ishaan-jaff commented 1 year ago

Hi @sabaimran, I believe I can help with this issue. I’m the maintainer of LiteLLM (https://github.com/BerriAI/litellm) - we allow you to use any LLM as a drop-in replacement for gpt-3.5-turbo.

You can use LiteLLM in the following ways:

With your own API key:

This calls the provider API directly.

from litellm import completion
import os
## set ENV variables 
os.environ["OPENAI_API_KEY"] = "your-key" # 
os.environ["COHERE_API_KEY"] = "your-key" # 

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)

Using the LiteLLM Proxy with a LiteLLM Key

This is great if you don’t have access to Claude but want to use the open-source LiteLLM proxy to access Claude.

from litellm import completion
import os

## set ENV variables 
os.environ["OPENAI_API_KEY"] = "sk-litellm-5b46387675a944d2" # [OPTIONAL] replace with your openai key
os.environ["COHERE_API_KEY"] = "sk-litellm-5b46387675a944d2" # [OPTIONAL] replace with your cohere key

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
krrishdholakia commented 1 year ago

Hey @sabaimran @ishaan-jaff any updates on this?

sabaimran commented 1 year ago

Hey @ishaan-jaff, @krrishdholakia, what does this offer over our current setup? In the example where we'd use our own API key, what would be the benefit of using LiteLLM vs. just calling OpenAI directly?

sabaimran commented 1 year ago

We're planning to host the proxy server on our own infrastructure. Some dev tools for hosting options:

We'll prioritize support for the best available open source models.

krrishdholakia commented 1 year ago

We have open-source proxy code that might help - https://github.com/BerriAI/liteLLM-proxy/blob/main/main.py

It seems like that's what you're trying to build - i.e. a server that sits in front of your LLMs (self-deployed + openai/anthropic/etc.) and makes the API calls for you.

Hugging Face TGI, Anthropic, and OpenAI all have different input params and output formats.

LiteLLM simplifies that by keeping them all consistent with the OpenAI format.
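
For example (a minimal sketch, not from this thread; it assumes an Anthropic key is available and uses litellm's completion API as in the snippets above), the same accessor works on the response regardless of which provider served the call:

from litellm import completion
import os

os.environ["ANTHROPIC_API_KEY"] = "your-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# The response is normalized to the OpenAI chat-completion shape, so the same
# accessor works whether the backend is OpenAI, Anthropic, or a TGI server.
response = completion(model="claude-2", messages=messages)
print(response.choices[0].message.content)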

@sabaimran What is litellm missing to be useful to you? Any feedback here would be helpful.

sabaimran commented 5 months ago

Hey guys! Closing the loop here. We're not going to set up our own inference server, but litellm would be the proxy server of choice for when we use additional models that can leverage an OpenAI API-compatible interface. We would most likely set this up in a private codebase for ease of use. We've had a good dev experience with the litellm proxy server via Docker. Thanks, @krrishdholakia & @ishaan-jaff!
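
For reference, a minimal sketch of that setup (not from the thread; the host, port, and placeholder key are assumptions for illustration): once a litellm proxy is running, e.g. via its Docker image, any OpenAI-compatible client can be pointed at it by overriding the base URL.

from openai import OpenAI

# Assumes a LiteLLM proxy is already running and listening on localhost:4000;
# the key is whatever the proxy is configured to accept.
client = OpenAI(api_key="sk-anything", base_url="http://localhost:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # resolved by the proxy's model routing
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.choices[0].message.content)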

krrishdholakia commented 5 months ago

@sabaimran how could the Docker spin-up process have been easier? Working on improving the quick-start flow this week.