feature: embedding support

mudler / LocalAI

The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference

https://localai.io

MIT License


feature: embedding support #70

Closed: mudler closed this issue 1 year ago

mudler commented 1 year ago

Add support for embeddings to the API and the llama backend: https://github.com/ggerganov/llama.cpp/blob/e4422e299c10c7e84c8e987770ef40d31905a76b/llama.cpp#L2160

[x] go-llama.cpp
[ ] go-gpt4all-j.cpp
[ ] go-gpt2.cpp

limcheekin commented 1 year ago

Just curious: what is the purpose of the embeddings mentioned above?

For the following use case of Retrieval Augmented Data QA: https://blog.langchain.dev/tutorial-chatgpt-over-your-data/

Can't we use the following embedding models? I plan to use gpt4all-j with one of the following embedding models.

Please advise. Thank you.

mudler commented 1 year ago

Embeddings support has been merged to master. It is experimental and currently available only for llama.cpp-based models, so any feedback is more than welcome!

To enable it, set `embeddings: true` in the model's YAML config file.
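
Once a model has embeddings enabled, a client can request vectors over HTTP. The following is a minimal sketch rather than the project's documented client: it assumes LocalAI is listening on `http://localhost:8080` and mirrors the OpenAI-style `/v1/embeddings` route and response shape, and the helper names `build_embeddings_request` and `fetch_embedding` are hypothetical.

```python
import json
import urllib.request

# Assumption: LocalAI runs locally and mirrors the OpenAI embeddings route.
BASE_URL = "http://localhost:8080"


def build_embeddings_request(model, text, base_url=BASE_URL):
    """Build a POST request for an OpenAI-style embeddings endpoint."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(model, text, base_url=BASE_URL):
    """Send the request and return the first embedding vector."""
    request = build_embeddings_request(model, text, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: OpenAI-style responses put vectors under data[i].embedding.
    return body["data"][0]["embedding"]
```

Here `model` would be the name given to the model in its YAML config; calling `fetch_embedding` requires a running server, while `build_embeddings_request` can be inspected offline.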

mudler commented 1 year ago

I've published a sample using embeddings over here: https://github.com/go-skynet/LocalAI/tree/master/examples/query_data

mudler commented 1 year ago

Further optimizations in https://github.com/go-skynet/LocalAI/pull/222: embeddings can now be computed with bert alongside any model, and there is also a huge performance improvement!

v4rm3t commented 1 year ago

Hello! I am trying to run a gpt4all-j model to build a local chatbot. How can I generate embeddings with BERT and use them with the chat completions endpoint?

Currently, I am running it on a Mac Mini i7 with 32 GB RAM. I am planning to move to a cloud server with more resources (VRAM) in the future. Is it possible to build a fast chatbot API using embeddings of my own documents?
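
For context on the question above, one common pattern is to embed the document chunks once, rank them against the embedded question by cosine similarity, and prepend the best matches to the chat prompt. The sketch below illustrates only that ranking step with plain Python lists; it is not how LocalAI or the `query_data` example implements it, and the function names and prompt wording are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are closest to the query.

    indexed_chunks is a list of (text, vector) pairs computed ahead of time.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(question, context_chunks):
    """Assemble a chat prompt that carries the retrieved context."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The resulting prompt string would then be sent as the user message to the chat completions endpoint. Because the chunk vectors are computed once and reused, only the question needs embedding per request.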

michelec1000 commented 1 year ago

https://github.com/go-skynet/LocalAI/tree/master/examples/query_data

Thank you for the example! Could it be included in the API, though? Currently I think you run those commands inside the container, right? Is there already an endpoint that runs the query against the documents when called?