langchain-ai / opengpts

MIT License

How to convert from Redis to Pinecone? #116

Open fullstackdev610 opened 11 months ago

fullstackdev610 commented 11 months ago

I want to replace the Redis database with another one, preferably Pinecone. The README mentions that OpenGPTs supports 50+ databases, but I have no idea how to switch from Redis to another one. Could you please let me know? Thanks.

RyanTrojans commented 11 months ago

Me too, thanks.

CakeCrusher commented 11 months ago

@fullstackdev610 you will likely need to rework a lot of the backend

kevinNejad commented 11 months ago

Here is a simple solution. Keep in mind that you still need to add filtering for each agent, depending on how you store vectors in the Pinecone index.

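1. Make the following changes in `backend/packages/gizmo-agent/gizmo_agent/ingest.py`:

```python
import os

import pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.pinecone import Pinecone

# pinecone-client v2 API; v3+ replaced pinecone.init() with the pinecone.Pinecone class
pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment=os.environ["PINECONE_ENV"])
index = pinecone.Index("opengpt")

# Old Redis store, no longer needed:
# index_schema = {
#     "tag": [{"name": "namespace"}],
# }
# vstore = Redis(
#     redis_url=os.environ["REDIS_URL"],
#     index_name="opengpts",
#     embedding=OpenAIEmbeddings(),
#     index_schema=index_schema,
# )

# "random_text" is the metadata key under which each chunk's text is stored
vstore = Pinecone(index, OpenAIEmbeddings().embed_query, "random_text")
```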

2. Make sure there are no `None`/`NULL` values in the metadata. You can quickly fix this by making changes in `backend/packages/agent-executor/agent_executor/ingest.py`:

```python
from langchain.schema import Document


def _update_document_metadata(document: Document, namespace: str) -> None:
    """Mutation in place that adds a namespace to the document metadata."""
    document.metadata["namespace"] = namespace
    # For Pinecone: "source" is None by default and Pinecone doesn't accept None/NULL
    # metadata values, so set it to "", remove it completely, or add the real source.
    document.metadata["source"] = ""
```

3. Modify the `create_retriever_tool` call in `backend/packages/gizmo-agent/gizmo_agent/tools.py`:

```python
def get_retrieval_tool(assistant_id: str):
    return create_retriever_tool(
        vstore.as_retriever(),
        # vstore.as_retriever(
        #     search_kwargs={"filter": RedisFilter.tag("namespace") == assistant_id}
        # ),
        "Retriever",
        RETRIEVER_DESCRIPTION,
    )
```
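This is where you need to make changes so that only the documents belonging to a specific assistant are retrieved. As written, `vstore.as_retriever()` searches the whole index, so every assistant can see every other assistant's documents.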
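Since `_update_document_metadata` stores the assistant ID under the `namespace` metadata key, one way to scope retrieval is a Pinecone metadata filter passed through `search_kwargs`. This is a minimal sketch, assuming the LangChain `Pinecone` wrapper forwards `filter` to the index query (as its `similarity_search` does) and using Pinecone's `$eq` operator. The helpers `namespace_filter`, `retriever_search_kwargs` and `matches` are hypothetical, and `matches` is only a local stand-in for the server-side filtering so the behaviour can be checked without an index:

```python
from typing import Any, Dict, List


def namespace_filter(assistant_id: str) -> Dict[str, Any]:
    """Hypothetical helper: Pinecone metadata filter keeping one assistant's chunks.

    Assumes ingest stored the assistant ID under the "namespace" metadata key.
    """
    return {"namespace": {"$eq": assistant_id}}


def retriever_search_kwargs(assistant_id: str, k: int = 4) -> Dict[str, Any]:
    """Hypothetical helper: search_kwargs for vstore.as_retriever(); k = chunks returned."""
    return {"k": k, "filter": namespace_filter(assistant_id)}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Local stand-in for Pinecone filtering that understands only {"key": {"$eq": v}}.

    For illustration and testing only; real filtering happens inside Pinecone.
    """
    for key, condition in flt.items():
        if metadata.get(key) != condition["$eq"]:
            return False
    return True


if __name__ == "__main__":
    chunks: List[Dict[str, Any]] = [
        {"namespace": "assistant-a", "source": ""},
        {"namespace": "assistant-b", "source": ""},
    ]
    flt = namespace_filter("assistant-a")
    print([c["namespace"] for c in chunks if matches(c, flt)])
```

In `tools.py` the unfiltered retriever could then become `vstore.as_retriever(search_kwargs=retriever_search_kwargs(assistant_id))`. An alternative is to partition vectors with Pinecone's own namespaces (the wrapper's `namespace` argument) instead of a metadata field; which fits better depends on how you store vectors in the index.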
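Step 2 only patches the `source` key; if other loaders add `None` values, a more general option is to clean every metadata value before upserting. This is a minimal sketch, assuming Pinecone metadata values must be strings, numbers, booleans or lists of strings and that nulls are rejected. `sanitize_metadata` is a hypothetical helper, not part of OpenGPTs or LangChain:

```python
from typing import Any, Dict

# Scalar types assumed to be accepted as Pinecone metadata values
_SCALARS = (str, int, float, bool)


def sanitize_metadata(metadata: Dict[str, Any], drop_none: bool = True) -> Dict[str, Any]:
    """Return a cleaned copy of metadata (hypothetical helper).

    - None values are dropped, or replaced with "" when drop_none=False
    - lists/tuples become lists of strings, with None items removed
    - any other non-scalar value (e.g. a dict) is converted with str()
    """
    clean: Dict[str, Any] = {}
    for key, value in metadata.items():
        if value is None:
            if not drop_none:
                clean[key] = ""
            continue
        if isinstance(value, _SCALARS):
            clean[key] = value
        elif isinstance(value, (list, tuple)):
            clean[key] = [str(v) for v in value if v is not None]
        else:
            clean[key] = str(value)
    return clean


if __name__ == "__main__":
    print(sanitize_metadata({"namespace": "assistant-a", "source": None, "page": 3}))
    # {'namespace': 'assistant-a', 'page': 3}
```

Inside `_update_document_metadata` this could be applied after the namespace is set, e.g. `document.metadata = sanitize_metadata(document.metadata)`, instead of hard-coding `source = ""`.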