phidatahq / phidata

Build AI Agents with memory, knowledge, tools and reasoning. Chat with them using a beautiful Agent UI.
https://docs.phidata.com
Mozilla Public License 2.0

Issue/question with knowledge base loading #1435

Open · Pablo-Merino opened this issue 6 days ago

Pablo-Merino commented 6 days ago

Hello! I'm trying to use TextKnowledgeBase with LanceDB to load FAQ content that I have saved as TXT files. This is the code I'm using:

```python
from dotenv import load_dotenv
# Load environment variables (e.g. the OpenAI API key) from .env before creating clients
load_dotenv(override=True)
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.embedder.openai import OpenAIEmbedder
from phi.knowledge.text import TextKnowledgeBase
from phi.vectordb.lancedb import LanceDb, SearchType
from pathlib import Path

# Vector store persisted on local disk under ./lancedb
vector_db = LanceDb(
    table_name="wak_faq",
    uri="./lancedb",
    search_type=SearchType.hybrid,
    embedder=OpenAIEmbedder(model="text-embedding-3-large"),
)

# Knowledge base built from the .txt files in ./faq_text
knowledge_base = TextKnowledgeBase(
    path=Path("./faq_text"),
    vector_db=vector_db,
)

# Comment out after first run as the knowledge base is loaded
knowledge_base.load(
    recreate=False,
    skip_existing=True
)

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    # Add the knowledge base to the agent
    knowledge=knowledge_base,
    show_tool_calls=True,
    markdown=True,
    add_context=True,
)
agent.print_response("what payment methods do you accept?", stream=True)
```

This code is meant to run in a Lambda function, by the way.

On the first run, I leave the knowledge_base.load() call uncommented. This creates all the LanceDB files and indexes, and the agent answers the question correctly.

However, after that first run I comment out the load() call, as the docs suggest, and the agent no longer works.

It seems the knowledge base has to be loaded at least once per execution, which I don't think is a good pattern for a Lambda function.
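If some setup does have to run in every process, the usual way to keep it off the per-request path in a Python Lambda function is to do it at module scope, outside the handler: Lambda runs that code once when it initializes an execution environment (a cold start) and reuses it for warm invocations in the same container. A minimal sketch, assuming a hypothetical `build_knowledge_base()` that stands in for constructing the LanceDb/TextKnowledgeBase objects and calling `load()`; the `handler(event, context)` signature follows the standard Python Lambda convention, and the returned dict shape is made up for illustration.

```python
# Counts how often the expensive setup runs (for demonstration only).
INIT_CALLS = 0


def build_knowledge_base() -> dict:
    """Hypothetical stand-in for creating the vector DB / knowledge base
    and calling knowledge_base.load(recreate=False, skip_existing=True)."""
    global INIT_CALLS
    INIT_CALLS += 1
    return {"table_name": "wak_faq"}


# Module scope: executed once when this module is imported during
# environment initialization, not on every invocation.
KNOWLEDGE_BASE = build_knowledge_base()


def handler(event, context):
    # Warm invocations reuse KNOWLEDGE_BASE instead of rebuilding it.
    question = event.get("question", "")
    return {"question": question, "table": KNOWLEDGE_BASE["table_name"]}


# Simulate three invocations served by the same warm container.
for q in ["what payment methods do you accept?", "q2", "q3"]:
    handler({"question": q}, None)
print(INIT_CALLS)  # 1
```

A cold start still pays the full setup cost, so this reduces repeated loading rather than eliminating it.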

My issue/question is:

Thanks a lot! Have a great day!

ysolanky commented 3 days ago

Hello @Pablo-Merino !

Thanks for sharing your code! Your Agent config looks great. Ideally, knowledge_base.load() should be commented out once the vector DB has been populated. That said, since you have recreate=False and skip_existing=True set, the documents should not be loaded again anyway. However, I just tested LanceDb, and there seems to be a bug that causes it to load the documents every time. Sorry about that. I am going to push out a fix.
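The point above is that, with recreate=False and skip_existing=True, a repeated load() call should not write the same documents again. A minimal sketch of that idempotent-load idea, using an in-memory store keyed by a SHA-256 content hash; `InMemoryStore`, its `load()` method, and the placeholder documents are hypothetical, and this is not how phidata or LanceDB actually detect existing documents.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical vector-store stand-in: maps content hash -> document text."""
    rows: dict = field(default_factory=dict)

    def load(self, docs, recreate=False, skip_existing=True) -> int:
        """Write docs into the store and return how many were written."""
        if recreate:
            self.rows.clear()  # drop everything and start over
        written = 0
        for text in docs:
            key = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if skip_existing and key in self.rows:
                continue  # already stored: no re-embedding, no rewrite
            self.rows[key] = text
            written += 1
        return written


# Placeholder documents standing in for the FAQ .txt files.
faq = ["contents of faq file 1", "contents of faq file 2"]

store = InMemoryStore()
first = store.load(faq)   # both documents are new
second = store.load(faq)  # everything is skipped on the second run
print(first, second)  # 2 0
```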