paulpierre / RasaGPT

💬 RasaGPT is the first headless LLM chatbot platform built on top of Rasa and Langchain. Built w/ Rasa, FastAPI, Langchain, LlamaIndex, SQLModel, pgvector, ngrok, telegram
https://rasagpt.dev
MIT License

Switch OpenAI API to other open-source LLMs #5

Open spinning27 opened 1 year ago

spinning27 commented 1 year ago

I wonder if it is possible to switch from the OpenAI API to other open-source LLMs.

Thanks

paulpierre commented 1 year ago

yea, that's totally possible.

specifically which ones did you have in mind? there's a new one every 3 days ;)

spinning27 commented 1 year ago

@paulpierre , thanks for your prompt reply.

How about the quantized LLaMA 7B that has been around for a while now? :)

paulpierre commented 1 year ago

of course and thanks @spinning27

what is your host operating system? i can look into creating a LLaMA branch over the weekend because I'm actually curious about the implementation.

some questions before exploring:

  1. base LLaMA isn't fine-tuned for QA/chat AFAIR. would you be open to other, better-suited options?
  2. would it make more sense to run remote inference on Hugging Face vs. locally? i think most devs stand to benefit from that convenience (rough sketch below)

let me know your thoughts 👍
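
for reference, here's a rough sketch of what option 2 could look like through LangChain's HuggingFaceHub wrapper. the repo ID below is just an example and you'd need your own HUGGINGFACEHUB_API_TOKEN; treat it as a starting point, not the final implementation:

```python
import os
from langchain.llms import HuggingFaceHub

# Remote inference: the model runs on Hugging Face's servers,
# so nothing heavy has to fit on the local machine.
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."  # your own HF token

llm = HuggingFaceHub(
    repo_id="google/flan-t5-xl",  # example repo; any hosted text-gen model works
    model_kwargs={"temperature": 0.5, "max_length": 256},
)

print(llm("What is a chatbot?"))
```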

spinning27 commented 1 year ago

Actually, any of the ones @paulpierre mentioned would do as long as it is free (the cost of OpenAI API access can go up quite quickly with lots of queries).

ATM, I'm mainly playing with it out of curiosity. I have access to a GPU server, or I can run locally on my M1 machine.

StableVicuna-13B seems memory-hungry. I haven't managed to run it on the machine yet, except for a quantized (compressed) version via llama.cpp.
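
For what it's worth, the quantized model also loads fine through the llama-cpp-python bindings, so it can slot into Python code directly. A minimal sketch (the model path is just wherever your quantized file lives):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# A 4-bit quantized 7B model fits comfortably in RAM on an M1.
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=2048)

out = llm("Q: What is Rasa? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```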

isu-shrestha commented 1 year ago

@paulpierre First of all, great concept! This is going to get a lot of traction for sure.

As for an open-source LLM, wouldn't it be best to use GPT4All? It can be used commercially as well, since there is a variant based on GPT-J. Also, it is supported by Langchain, which means that retrieval-augmented QA will be relatively easy.
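
As a rough sketch of how those pieces could fit together (the model file, sample document, and FAISS store below are placeholders; RasaGPT itself uses pgvector, but the idea is the same):

```python
from langchain.llms import GPT4All
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Local GPT4All model; the GPT-J-based variant is the commercially usable one.
llm = GPT4All(model="./models/ggml-gpt4all-j-v1.3-groovy.bin")

# Local embeddings, so no OpenAI calls anywhere in the pipeline.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = FAISS.from_texts(
    ["RasaGPT is a headless LLM chatbot platform built on Rasa and Langchain."],
    embeddings,
)

qa = RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever())
print(qa.run("What is RasaGPT built on?"))
```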

spinning27 commented 1 year ago

Langchain is currently developing at breakneck speed. I'm not so sure about it; sometimes things don't work, like this one.

isu-shrestha commented 1 year ago

@spinning27 Fair. However, an argument can be made to pick a version that works and stick to it? In any case, I think having the ability to use open-source LLMs would definitely be interesting to people and organizations that want to decentralize and protect their data from third-party APIs.

vchauhan1 commented 1 year ago

The best way to beat OpenAI's pricing is to use your own deployed LLMs via FastChat or text-generation-webui; both expose a nice OpenAI-compatible API.
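
for example, with a FastChat OpenAI-compatible server running locally, the existing OpenAI client code only needs to be repointed. the URL and model name below are placeholders, and this assumes the pre-1.0 openai client:

```python
import openai

# Point the stock OpenAI client at a locally deployed FastChat server.
openai.api_base = "http://localhost:8000/v1"  # placeholder URL
openai.api_key = "EMPTY"  # FastChat ignores the key

resp = openai.ChatCompletion.create(
    model="vicuna-7b-v1.3",  # whichever model the server is serving
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```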

However, an M1 laptop can easily run an LLM locally, and that is enough for any prototyping.

Sample code (with the imports it needs):

```python
from langchain.llms import LlamaCpp
from langchain.embeddings import LlamaCppEmbeddings

# Local 4-bit quantized LLaMA 7B for both generation and embeddings.
llm = LlamaCpp(model_path="models/llama-7b.ggmlv3.q4_0.bin", n_ctx=2048, verbose=True)
embeddings_model = LlamaCppEmbeddings(model_path="models/llama-7b.ggmlv3.q4_0.bin")
```

proitservices commented 1 year ago

Been looking to build this myself, and there we go: a fine-crafted project is already here.

I've made a replacement for the ada embeddings, @paulpierre; perhaps hosting it locally would cut running costs: https://github.com/proitservices/elmo_embedding_api

Also, GPT4ALL is a great choice, and with a DB + Langchain the Vicuna-13B model would do just amazing (Mistral 7B is also a good choice).

Would love to see this project grow into a fully featured and capable 'Jarvis' with memory and math capabilities, plus a rain of API extensions out of the box. Happy to help with those, Peter
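
For illustration, a locally hosted embedding endpoint can be very small. This is a toy sketch in the spirit of the repo above (the model and route are my own placeholders, not its actual API):

```python
# Toy local embedding service; run with: uvicorn app:app
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim, CPU-friendly

class EmbedRequest(BaseModel):
    texts: List[str]

@app.post("/embed")
def embed(req: EmbedRequest):
    # One vector per input text; no per-token API charges.
    return {"embeddings": model.encode(req.texts).tolist()}
```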