Unclear docs for using an existing Weaviate database

nick-youngblut commented 6 months ago

Issue with current documentation:

The Weaviate docs just show examples of using Weaviate.from_documents() or Weaviate.from_texts(), but do not describe how to use an existing database with embedded documents already stored in the database.

Moreover, the index_name and text_key parameters, both required when calling Weaviate(), are not described anywhere in the Weaviate docs, and the name index_name is misleading, given that its value should be the name of the Weaviate class.

A naive attempt of:

import os
import dotenv
import weaviate
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.vectorstores import Weaviate
from langchain_core.runnables import RunnablePassthrough

dotenv.load_dotenv()

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

client = weaviate.Client(
    url="http://34.170.94.90",
    auth_client_secret=weaviate.AuthApiKey(os.environ["WEAVIATE_API_KEY"])
)

db = Weaviate(client, 'my_class_name', 'text')

retriever = db.as_retriever()

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt 
    | llm 
    | output_parser
)

chain.invoke({"input" : "My particular query"})

Results in the error:

ValueError: Error during query: [{'locations': [{'column': 46, 'line': 1}], 'message': 'Syntax Error GraphQL request (1:46) Expected Name, found String "input"\n\n1: {Get{my_class_name(limit: 4 nearText: {concepts: [{"input": "My particular query"}]} ){text}}}\n                                                ^\n', 'path': None}]

...which isn't very helpful for understanding the issue.

Note: I did actually use the correct class name instead of my_class_name.

Idea or request for content:

It would be very helpful to include an example of using an existing Weaviate database, including how to inspect the DB schema to determine the values to use for index_name and text_key.

dosubot[bot] commented 6 months ago

Hi @nick-youngblut,

I understand your confusion and I agree that the documentation could be clearer on how to use an existing Weaviate database with LangChain. Let's clarify some points:

The index_name parameter of the Weaviate class in LangChain is the name of the Weaviate class in which the objects and their vectors are stored; "index" is LangChain's term for what Weaviate calls a "class". As far as I can tell from the source listed below, the Weaviate() constructor only stores this name and does not check for or create the class. The class is created when it is missing only by Weaviate.from_texts() and Weaviate.from_documents().

The text_key parameter is the key used for uploading and retrieving text to and from the Weaviate database. This is the name of the property in the Weaviate database that will store the text of the documents.

Here is an example of how to use these parameters:

import weaviate
from langchain_community.vectorstores import Weaviate

# Create a Weaviate client
client = weaviate.Client(url="http://localhost:8080")

# Initialize a Weaviate instance with LangChain
weaviate_instance = Weaviate(client, index_name="my_index", text_key="my_text")

In this example, my_index is the name of the index in the Weaviate database that you want to interact with, and my_text is the key that will be used for uploading and retrieving text to and from the Weaviate database.
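For an existing database, the values for index_name and text_key can be read off the schema. The following is a minimal sketch, assuming the v3 weaviate-client, where client.schema.get() returns a dict with a "classes" list. The helper name text_properties_by_class and the example schema are hypothetical and only illustrate the shape of the data.

```python
from typing import Dict, List


def text_properties_by_class(schema: dict) -> Dict[str, List[str]]:
    """Map each Weaviate class name to the names of its text-like properties."""
    result: Dict[str, List[str]] = {}
    for cls in schema.get("classes", []):
        props = [
            prop["name"]
            for prop in cls.get("properties", [])
            # "string" is the older spelling of the text data type
            if any(t in ("text", "string") for t in prop.get("dataType", []))
        ]
        result[cls["class"]] = props
    return result


# With a live v3 client the schema would come from: schema = client.schema.get()
# Hypothetical schema, shaped like the dict that call returns.
example_schema = {
    "classes": [
        {
            "class": "Article",
            "properties": [
                {"name": "body", "dataType": ["text"]},
                {"name": "wordCount", "dataType": ["int"]},
            ],
        }
    ]
}

candidates = text_properties_by_class(example_schema)
print(candidates)
```

Each key of the result is a candidate for index_name, and each listed property is a candidate for text_key.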

Regarding the error you're encountering: the generated GraphQL query in the message shows that the whole dict {"input": "My particular query"} was inserted into the nearText concepts list, where Weaviate expects plain strings. With the chain as written, whatever is passed to chain.invoke() is forwarded unchanged to the retriever, so the retriever received a dict instead of a query string. This points to the input format rather than to the code that generates the query, so I would recommend calling chain.invoke() with the query string itself.
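To make the failure concrete, here is a small sketch that imitates how the query value ends up inside the nearText clause. The function near_text_clause is a hypothetical stand-in, not the client's actual code; it assumes the concepts list is serialized as JSON, which is consistent with the query text in the error message. The helper ask is likewise only illustrative.

```python
import json


def near_text_clause(query) -> str:
    """Imitate (as an assumption) how the query is embedded in the GraphQL text."""
    return "nearText: {concepts: " + json.dumps([query]) + "}"


# A plain string yields a list of strings, which the query syntax accepts.
good = near_text_clause("My particular query")

# A dict yields a quoted key inside the list, matching the reported error text.
bad = near_text_clause({"input": "My particular query"})

print(good)
print(bad)


def ask(chain, question: str):
    """Invoke the chain with the bare question string, not {"input": question}."""
    return chain.invoke(question)
```

If the chain has to accept a dict, another option is to pull the string out before it reaches the retriever, for example by using operator.itemgetter("input") | retriever for the "context" entry and operator.itemgetter("input") for the "question" entry.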

I hope this helps clarify things. If you have any further questions, please don't hesitate to ask.

Sources

libs/langchain/langchain/vectorstores/weaviate.py

MXall0c commented 4 months ago

@dosu-bot this helps, but does not illustrate how to incorporate embeddings into the query.
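For reference, one way the embeddings could be wired in is sketched below. It assumes the langchain_community Weaviate wrapper accepts the embedding and by_text keyword arguments, and that with by_text=False the query is embedded client-side and sent as a vector search instead of nearText. The function name open_existing_class is hypothetical.

```python
def open_existing_class(client, class_name: str, text_key: str, embedding):
    """Wrap an existing Weaviate class so that queries are embedded client-side."""
    # Imported lazily so the sketch can be read without the package installed.
    from langchain_community.vectorstores import Weaviate

    return Weaviate(
        client,
        index_name=class_name,  # the Weaviate class name
        text_key=text_key,      # the property that holds the document text
        embedding=embedding,    # an embeddings object, e.g. OpenAIEmbeddings()
        by_text=False,          # search by vector instead of nearText
    )


# Usage, assuming a connected v3 client and an embeddings object:
# db = open_existing_class(client, "my_class_name", "text", embeddings)
# retriever = db.as_retriever()
```

The embedding model passed here would need to be the same one that produced the vectors already stored in the class, otherwise the similarity scores are not meaningful.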

pi-crust commented 4 months ago

@nick-youngblut were you able to figure out a solution to this issue?

nick-youngblut commented 4 months ago

@pi-crust no, I was never able to use db.as_retriever() without an error.

MkButterfingers commented 2 months ago

@nick-youngblut I was hoping I would find the answer here. Apparently LlamaIndex allows you to use an existing Weaviate vector store though.

nick-youngblut commented 2 months ago

Apparently LlamaIndex allows you to use an existing Weaviate vector store though

Thanks @MkButterfingers for the advice. I am currently using llama-index with Weaviate.
