justinthelaw / opera

Opera: Optimized Performance and Evaluation Rhetoric AI, for USAF and USSF performance statements, in TypeScript and Python
MIT License

Feature(Client): Client-side LLM generation #154

Open justinthelaw opened 1 year ago

justinthelaw commented 1 year ago

Is your feature request related to a problem? Please describe. Hosting an API via a separate server requires extra resources and configuration.

Describe the solution you'd like Run the model directly in the browser (client-side generation), so the application can be built and served as a purely static frontend with no separate API server.

Describe alternatives you've considered Free hosting through other services comes with significant restrictions and imposes inflexible configuration.

Additional context Hosting a purely frontend application allows us to build it and serve it on GitHub Pages. The GitHub Pages site can then act as a free trial for users who want to try the tool without contributing to the project. A rough sketch of what this could look like follows.
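For illustration, here is a minimal sketch of fully client-side generation, assuming transformers.js as the in-browser runtime (just one candidate library, not a settled choice) and a generic placeholder model rather than an actual Opera checkpoint:

// Rough sketch only: transformers.js is an assumed library choice and the
// model below is a generic placeholder, not an Opera checkpoint.
import { pipeline } from '@xenova/transformers';

// Downloads and caches a small seq2seq model in the browser on first use.
const generator = await pipeline('text2text-generation', 'Xenova/flan-t5-small');

// Generation runs entirely client-side; no backend API call is made.
const output = await generator(
  'Rewrite as a performance statement: led a 5-person team through ...',
  { max_new_tokens: 64 },
);

console.log(output); // [{ generated_text: '...' }]

Serving something like this from GitHub Pages keeps the app fully static; the trade-offs are the initial model download and in-browser inference speed.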

A new scheme allowing users to access a dedicated API for bullet generation can be provided in a future release of the application. The scheme may require fulfilling conditions such as: 1) creating an account linked to your GitHub or Gmail, 2) contributing code to the open-source repo under your account, 3) contributing clean data to the open-source repo under your account, or 4) buying us a coffee under your account.

ishaan-jaff commented 11 months ago

Hi @justinthelaw, I believe we can help with this issue. I'm the maintainer of LiteLLM (https://github.com/BerriAI/litellm).

TL;DR: we allow you to use any LLM as a drop-in replacement for gpt-3.5-turbo. You can use our proxy server for making your LLM calls if you don't want to spin up additional resources.

Usage

This calls the provider API directly

from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# falcon call (requires the matching provider credentials set as env variables)
response = completion(model="falcon-40b", messages=messages)
justinthelaw commented 11 months ago

Hi @ishaan-jaff! Thanks for the suggestion; however, this particular issue is about experimenting with simple, lightweight model hosting on the frontend.

When it comes to hosted or cloud-based inferencing, we've created a simple FastAPI server for serving the eventual set of fine-tuned Opera LLM models.
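From the frontend's perspective, consuming that server is a plain POST. A purely illustrative sketch, where the route and request/response shapes are hypothetical rather than Opera's actual API contract:

// Purely illustrative: the endpoint path and payload shapes below are
// hypothetical placeholders, not Opera's actual FastAPI contract.
interface GenerateRequest {
  input: string;
}

interface GenerateResponse {
  bullet: string;
}

async function generateBullet(apiBase: string, input: string): Promise<string> {
  const body: GenerateRequest = { input };
  const res = await fetch(`${apiBase}/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  const data = (await res.json()) as GenerateResponse;
  return data.bullet;
}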

For more context, we use custom model checkpoints (like the ones in our HuggingFace repo) that aren't entirely standard, and we then fine-tune those further into custom models for specific tasks. We store the resulting weights and configs locally, as the file sizes and inference latency are low enough for now.