phidatahq / phidata

Build AI Agents with memory, knowledge, tools and reasoning. Chat with them using a beautiful Agent UI.
https://docs.phidata.com
Mozilla Public License 2.0
15.53k stars, 2.13k forks

LanceDB as a vector DB for PDFs #1476

Open chiragvels opened 1 day ago

chiragvels commented 1 day ago

Hi,

I am trying to use a vector DB to store my locally stored PDFs. I took this code as a reference and tried to add my PDFs to the vector DB.

But when I run the script, it still processes the PDF from the example instead of the PDF at the path I provided.

```
INFO Dropping collection
INFO Creating collection
INFO Loading knowledge base
INFO Reading: my_pdf_name
INFO Added 9 documents to knowledge base
INFO Starting playground on http://localhost:7777
┏━━━━━━━━━━━━━━━━━━━━━━━ Agent Playground ━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                 ┃
┃                                                                 ┃
┃  URL: https://phidata.app/playground?endpoint=localhost%3A7777  ┃
┃                                                                 ┃
┃                                                                 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
INFO: Will watch for changes in these directories: ['/ai/local_rag_agent']
INFO: Uvicorn running on http://localhost:7777 (Press CTRL+C to quit)
INFO: Started reloader process [1083436] using StatReload
INFO Dropping collection
INFO Creating collection
INFO Loading knowledge base
INFO Reading: my_pdf_name
INFO Added 9 documents to knowledge base
INFO Dropping collection
INFO Creating collection
INFO Loading knowledge base
INFO Reading: https://phi-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf
INFO Added 14 documents to knowledge base
INFO: Started server process [1083641]
INFO: Waiting for application startup.
INFO: Application startup complete.
```

I am still not getting the expected behaviour.

My code is:


```python
# Import necessary libraries
from phi.agent import Agent
from phi.model.ollama import Ollama
from phi.knowledge.pdf import PDFKnowledgeBase, PDFReader
from phi.vectordb.lancedb import LanceDb, SearchType
from phi.embedder.ollama import OllamaEmbedder
from phi.playground import Playground, serve_playground_app

# Define the collection (table) name for the vector database
collection_name = "thai-recipe-index"

# Set up LanceDB as the vector database, using Ollama for embeddings
vector_db = LanceDb(
    table_name=collection_name,
    uri="tmp/lancedb",
    search_type=SearchType.vector,
    embedder=OllamaEmbedder(),
)

# Define the knowledge base from the local PDF file
pdf_knowledge_base = PDFKnowledgeBase(
    path="/ai/local_rag_agent/pdf/my_pdf_name.pdf",
    vector_db=vector_db,
    reader=PDFReader(chunk=True),
)

# Load the knowledge base; comment this out after the first run to avoid reloading
pdf_knowledge_base.load()

# Create the Agent using Ollama's llama3.2 model and the knowledge base
agent = Agent(
    name="Local RAG Agent",
    model=Ollama(id="llama3.2"),
    knowledge=pdf_knowledge_base,
)
```

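Rather than commenting `pdf_knowledge_base.load()` out after the first run, one option is to make ingestion opt-in from the command line. This is a minimal sketch: the `--load` and `--recreate` flag names are hypothetical, and passing `args.recreate` to `load(recreate=...)` assumes the `recreate` keyword accepted by phidata's knowledge-base `load()`.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse hypothetical CLI flags that decide whether to (re)embed the PDFs."""
    parser = argparse.ArgumentParser(description="Local RAG agent (sketch)")
    parser.add_argument(
        "--load",
        action="store_true",
        help="embed the PDF(s) into the LanceDB table before serving",
    )
    parser.add_argument(
        "--recreate",
        action="store_true",
        help="drop the table first and rebuild it from scratch",
    )
    return parser.parse_args(argv)


# Intended use in the real script (assumption; not executed here, needs phidata):
#
# if __name__ == "__main__":
#     args = parse_args()
#     if args.load:
#         pdf_knowledge_base.load(recreate=args.recreate)
#     ...start the playground here...
```

Run the script once with `--load` to embed the PDF, then start it without the flag to query the existing table. Keeping the call under `if __name__ == "__main__":` also stops it from running again whenever the file is re-imported, for example by a reloading server process.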
ysolanky commented 1 day ago

Hello @chiragvels! This is really odd, but I have not been able to replicate it. Is a PDFUrlKnowledgeBase with the sample PDF also defined somewhere in your code?

chiragvels commented 1 day ago

Hi,

Thanks for the prompt answer.

I did try PDFUrlKnowledgeBase with the sample PDF before trying PDFKnowledgeBase with my local PDF.

But I don't think I still have it defined anywhere.

I can also see that it drops and re-creates the collection on every request. I can't find any documentation on how to keep the existing collection and continue adding each new PDF to it.
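On keeping the existing table and adding PDFs one at a time: a minimal, stdlib-only sketch of one approach is to record a SHA-256 digest of every PDF that has already been embedded, and only hand new files to the knowledge base. The manifest location and the helper names are hypothetical; the phidata side (one `PDFKnowledgeBase` per new file, loaded with `recreate=False`) is an assumption shown only in a comment.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Set

# Hypothetical location for the list of already-embedded PDFs.
MANIFEST = Path("tmp/ingested_pdfs.json")


def content_digest(data: bytes) -> str:
    """Fingerprint a PDF by its bytes, so a renamed copy is not embedded twice."""
    return hashlib.sha256(data).hexdigest()


def load_manifest(manifest: Path = MANIFEST) -> Set[str]:
    """Return digests of PDFs embedded on earlier runs (empty on the first run)."""
    if not manifest.exists():
        return set()
    return set(json.loads(manifest.read_text(encoding="utf-8")))


def save_manifest(digests: Set[str], manifest: Path = MANIFEST) -> None:
    """Persist the digests so the next run can skip those PDFs."""
    manifest.parent.mkdir(parents=True, exist_ok=True)
    manifest.write_text(json.dumps(sorted(digests)), encoding="utf-8")


def pending_pdfs(pdf_dir: Path, already_loaded: Set[str]) -> Dict[Path, str]:
    """Map each PDF in pdf_dir that has not been embedded yet to its digest."""
    pending: Dict[Path, str] = {}
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        digest = content_digest(pdf.read_bytes())
        if digest not in already_loaded:
            pending[pdf] = digest
    return pending


# Intended use (assumption; needs phidata and the vector_db from the script above):
#
# seen = load_manifest()
# for pdf, digest in pending_pdfs(Path("/ai/local_rag_agent/pdf"), seen).items():
#     PDFKnowledgeBase(path=pdf, vector_db=vector_db,
#                      reader=PDFReader(chunk=True)).load(recreate=False)
#     seen.add(digest)
#     save_manifest(seen)
```

Keying on content rather than file name means an unchanged PDF is skipped even if it is renamed, while an edited PDF is treated as new; its older chunks would likely stay in the table unless they are removed separately.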

Is there any way to use MongoDB data as the JSON input file for this?
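On MongoDB: one route that needs no extra integration is to export the collection to a JSON file and point `JSONKnowledgeBase` at it. This is a minimal sketch, assuming the collection was exported with `mongoexport --jsonArray` and that `JSONKnowledgeBase` accepts a file holding a JSON array of objects. It collapses Extended JSON wrappers such as `{"$oid": ...}` so each record reads as plain values; the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def simplify(value: Any) -> Any:
    """Collapse single-key Extended JSON wrappers ({"$oid": ...}, {"$date": ...}) to their inner value."""
    if isinstance(value, dict):
        if len(value) == 1:
            ((key, inner),) = value.items()
            if key.startswith("$"):
                return simplify(inner)
        return {k: simplify(v) for k, v in value.items()}
    if isinstance(value, list):
        return [simplify(v) for v in value]
    return value


def convert_export(src: Path, dst: Path) -> int:
    """Rewrite a `mongoexport --jsonArray` file as plain JSON; return the record count."""
    records = json.loads(src.read_text(encoding="utf-8"))
    cleaned = [simplify(record) for record in records]
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(json.dumps(cleaned, ensure_ascii=False, indent=2), encoding="utf-8")
    return len(cleaned)
```

For example, export with `mongoexport --db=<db> --collection=<collection> --jsonArray --out=export.json`, then call `convert_export(Path("export.json"), Path("/ai/local_rag_agent/jaon/input_data.json"))` before loading the JSON knowledge base.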

I also have a question about usage.

Once the knowledge base is prepared, how can I reuse that same knowledge base to get different answers from it? For example, I have this knowledge_base ready, but do I need to prepare the knowledge base again every time I want an answer?

```python
# Uses Agent, Ollama and vector_db as set up in the script above
from phi.knowledge.json import JSONKnowledgeBase

knowledge_base = JSONKnowledgeBase(
    path="/ai/local_rag_agent/jaon/input_data.json",
    # Table name: ai.json_documents
    vector_db=vector_db,
)

agent = Agent(
    knowledge=knowledge_base,
    add_references_to_prompt=True,
    model=Ollama(id="llama3.2"),
)

agent.print_response("How has the ranking of 'Cheap Dumpster Rental Near Me' changed over time?", stream=True)
```

I want to reuse this same agent every time I need a response.
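On reusing the agent: embedding the documents and asking questions are separate steps, so (as the comment in the first script suggests) once the LanceDB table exists on disk the script can build the agent without calling `load()` and then ask as many questions as needed in one process. This is a minimal, library-agnostic sketch of that loop; `ask` is any callable that sends a question to the agent, and wiring it to phidata through `agent.run(question).content` is an assumption.

```python
from typing import Callable, List


def chat_loop(
    ask: Callable[[str], str],
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> List[str]:
    """Ask the same agent question after question until 'exit'/'quit' or end of input."""
    answers: List[str] = []
    while True:
        try:
            question = read("> ").strip()
        except EOFError:  # Ctrl-D or end of piped input
            break
        if question.lower() in {"exit", "quit"}:
            break
        if not question:
            continue  # ignore blank lines
        answer = ask(question)
        answers.append(answer)
        write(answer)
    return answers


# Intended use (assumption; needs phidata and an agent built once, without load()):
# chat_loop(lambda q: agent.run(q).content)
```

Injecting `read` and `write` keeps the loop independent of the agent library and makes it easy to drive from a script or a test instead of the terminal.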

Thanks,