vocodedev / vocode-core

🤖 Build voice-based LLM agents. Modular + open source.
https://vocode.dev
MIT License

Add New VectorDB Supabase (pgvector) Support for RAG in Vocode Open Source #465

Status: open. Opened by arpagon 7 months ago.

arpagon commented 7 months ago

Issue: Add New VectorDB Supabase (pgvector) Support for RAG in Vocode Open Source

Description

This issue proposes integrating Supabase (pgvector) as a new vector database (VectorDB) option for the Retrieval-Augmented Generation (RAG) feature in the Vocode open-source project. Because pgvector provides vector similarity search directly inside Postgres, adding it would give RAG users another efficient retrieval backend, which is especially useful for large datasets and complex queries in conversational AI applications.

Objectives

  1. Integration of Supabase (pgvector) with Vocode RAG: Establish a seamless connection between Vocode's RAG feature and Supabase (pgvector) to enable efficient vector storage and retrieval.
  2. Optimization for Conversational AI: Ensure that the integration is optimized for conversational AI use cases, focusing on query performance, scalability, and accuracy.
  3. Documentation and Examples: Provide comprehensive documentation and practical examples to guide users in utilizing Supabase (pgvector) with Vocode RAG.

Motivation

Supabase's comparison of pgvector and Pinecone: https://supabase.com/blog/pgvector-vs-pinecone

Implementation Considerations

Call for Contributions

We encourage contributions from the community to help with the implementation, testing, and documentation of this feature. Whether you're an expert in databases, conversational AI, or a keen open-source contributor, your input is highly valued.

Conclusion

The addition of Supabase (pgvector) as a new VectorDB option in Vocode Open Source is expected to significantly enhance the RAG feature, providing users with more flexibility and performance benefits. We look forward to collaborating with the community on this exciting development.

rahulbansal16 commented 7 months ago

I am working on this issue, and according to my research, there are two ways to build this:

  1. Use the vecs client (https://github.com/supabase/vecs) to connect to Supabase.
  2. Use SQLAlchemy, as LangChain does in langchain-ai/langchain/libs/community/langchain_community/vectorstores/pgvector.py.

I plan to go ahead with approach 1, as it should be easier and faster to implement. What do you think?
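Whichever client is used, the query that finally runs against Postgres is a pgvector nearest-neighbour search. The sketch below builds that query as plain parameterized SQL, which is closer in spirit to approach 2 than to vecs. It assumes a hypothetical table with `content`, `metadata`, and `embedding` columns (not the schema that vecs or LangChain create), and the helper names `build_similarity_query` and `to_pgvector_literal` are illustrative only. It uses pgvector's `<=>` cosine-distance operator, where a smaller distance means a closer match.

```python
import re
from typing import Any, Dict, List, Tuple

# Table names cannot be bound as query parameters, so allow only plain identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def to_pgvector_literal(embedding: List[float]) -> str:
    """Format a vector in pgvector's text input format, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_similarity_query(
    table: str, embedding: List[float], k: int = 4
) -> Tuple[str, Dict[str, Any]]:
    """Return (sql, params) for a top-k cosine-distance search.

    Assumes a hypothetical table with `content`, `metadata`, and an
    `embedding vector(n)` column. `<=>` is pgvector's cosine-distance
    operator: 0 means the same direction, so results are sorted ascending.
    """
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if k < 1:
        raise ValueError("k must be at least 1")
    sql = (
        "SELECT content, metadata, embedding <=> %(query)s::vector AS distance "
        f"FROM {table} "
        "ORDER BY embedding <=> %(query)s::vector "
        "LIMIT %(k)s"
    )
    return sql, {"query": to_pgvector_literal(embedding), "k": k}
```

The returned pair can be passed to a DB-API cursor that uses `%(name)s` placeholders, such as psycopg2's `cursor.execute(sql, params)`. Ordering by the `<=>` expression rather than by the alias mirrors the form of pgvector's documented examples.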

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

rjheeta commented 2 months ago

Hi @arpagon, I have some questions about the vector DB implementation that may be relevant here. Note that these questions are not specific to pgvector; they concern Vocode's vector DB implementation as a whole:

  1. What is the purpose of including an add_text method? My understanding is that the vector DB should already be built by the time it is connected to Vocode. That is, Vocode's responsibility is simply retrieval, so shouldn't adding text be left out of the interface? https://github.com/vocodedev/vocode-python/blob/cfd2eb44308cfe28136b409e22706bd5465b6c46/vocode/streaming/vector_db/pinecone.py#L27
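One way to read this question is as an argument for splitting the interface into a read side and a write side. The sketch below is hypothetical: `VectorRetriever`, `VectorIndexer`, and `InMemoryStore` are not part of Vocode. An agent would depend only on the read-only `similarity_search_with_score`, while `add_texts` lives on a separate interface used by an offline ingestion job. The in-memory store scores with cosine similarity purely for illustration.

```python
import math
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Tuple

# Hypothetical embedding function type: text -> vector.
Embedder = Callable[[str], List[float]]


class VectorRetriever(ABC):
    """Read side (hypothetical): the only method an agent needs at call time."""

    @abstractmethod
    async def similarity_search_with_score(
        self, query: str, k: int = 4
    ) -> List[Tuple[str, float]]:
        """Return up to k (text, score) pairs; higher score means more similar."""


class VectorIndexer(ABC):
    """Write side (hypothetical): used by an offline ingestion job, not the agent."""

    @abstractmethod
    async def add_texts(self, texts: List[str]) -> List[str]:
        """Embed and store texts, returning their ids."""


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore(VectorRetriever, VectorIndexer):
    """Toy backend implementing both sides; stands in for pgvector, Pinecone, etc."""

    def __init__(self, embed: Embedder) -> None:
        self._embed = embed
        self._rows: Dict[str, Tuple[str, List[float]]] = {}

    async def add_texts(self, texts: List[str]) -> List[str]:
        ids = []
        for text in texts:
            row_id = str(len(self._rows))
            self._rows[row_id] = (text, self._embed(text))
            ids.append(row_id)
        return ids

    async def similarity_search_with_score(
        self, query: str, k: int = 4
    ) -> List[Tuple[str, float]]:
        query_vec = self._embed(query)
        scored = [
            (text, cosine_similarity(query_vec, vec))
            for text, vec in self._rows.values()
        ]
        # Exhaustive scan sorted by similarity; a real backend would use an index.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

With this split, a backend populated elsewhere (for example, a pgvector table filled by a separate pipeline) only needs to implement the read side. Score conventions also differ between backends: this sketch returns a similarity (higher is better), whereas pgvector's `<=>` operator returns a distance (lower is better).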

  2. Moreover, Vocode's current implementation of similarity_search_with_score is essentially what LangChain already implements. Should Vocode use LangChain directly rather than maintaining its own implementation? https://github.com/vocodedev/vocode-python/blob/cfd2eb44308cfe28136b409e22706bd5465b6c46/vocode/streaming/vector_db/pinecone.py#L71
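If Vocode did delegate to LangChain, the Vocode-side code could shrink to a thin adapter. The sketch below assumes a store exposing a LangChain-style synchronous `similarity_search_with_score(query, k=...)` method that returns `(document, score)` pairs; `LangChainVectorDB` and `SupportsScoredSearch` are hypothetical names, not existing Vocode or LangChain classes. It runs the blocking call with `asyncio.to_thread` (Python 3.9+) so the conversation's event loop is not stalled during the database round-trip.

```python
import asyncio
from typing import Any, List, Protocol, Tuple


class SupportsScoredSearch(Protocol):
    """Structural type: any store with a LangChain-style sync search method."""

    def similarity_search_with_score(
        self, query: str, k: int = 4
    ) -> List[Tuple[Any, float]]: ...


class LangChainVectorDB:
    """Hypothetical adapter: Vocode delegates retrieval to an existing store."""

    def __init__(self, store: SupportsScoredSearch) -> None:
        self.store = store

    async def similarity_search_with_score(
        self, query: str, k: int = 4
    ) -> List[Tuple[Any, float]]:
        # The store's method blocks on network I/O to the database, so run it
        # in a worker thread to keep the conversation's event loop responsive.
        return await asyncio.to_thread(
            self.store.similarity_search_with_score, query, k=k
        )
```

LangChain vector stores also define async variants such as `asimilarity_search_with_score`; where a backend implements them natively, calling that method directly would avoid the worker thread.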