langchain-ai / rag-from-scratch


retriever.get_relevant_documents is broken. tutorial (PART 1-4) #19

Open ca-mi-lo opened 6 months ago

ca-mi-lo commented 6 months ago

Following the tutorial (PART 1-4), I noticed that the model answered with "not enough information to answer". Looking at `docs`, I get this output:

```
[Document(page_content='Conversatin samples:\n[\n {\n "role": "system",', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'})]
```

It seems like `docs = retriever.get_relevant_documents("What is Task Decomposition?")` is not returning relevant page_content. If I invoke the chain with the whole context (i.e. `splits` instead of `docs`), I do get a meaningful answer:

```
chain.invoke({"context": splits, "question": "What is Task Decomposition?"})
```

Note: I'm using Google's API, but I would expect that to be irrelevant.
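For anyone debugging this: the retriever is just ranking stored chunks by embedding similarity to the query, so if the wrong chunk comes back, the embeddings are the first thing to check. Below is a minimal, library-free sketch of that ranking step using cosine similarity — the vectors are hypothetical stand-ins for what an embedding model would produce, not real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three stored chunks (hypothetical values).
chunks = {
    "task decomposition":   [0.9, 0.1, 0.0],
    "conversation samples": [0.1, 0.9, 0.1],
    "agent memory":         [0.2, 0.2, 0.9],
}

# Toy embedding of the query "What is Task Decomposition?"
query = [0.8, 0.2, 0.1]

# The retriever returns chunks ranked by similarity to the query.
ranked = sorted(chunks, key=lambda c: cosine(query, chunks[c]), reverse=True)
print(ranked[0])
```

If the embedding model is weak (or the LLM itself is misused as an embedder), the toy vectors above end up poorly separated and the wrong chunk ranks first — which matches the irrelevant `page_content` seen here.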

rcorneanu commented 2 months ago

I've had the same issue with Ollama + llama3.1. After switching the embedding model to "nomic-embed-text" instead of embedding with the LLM itself, it worked great.

```python
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
)
```