Using Python 3.11.

from dotenv import load_dotenv
import os
from langchain.chat_models import ChatOpenAI
from qdrant_client import QdrantClient as qcqc
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Qdrant
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

load_dotenv()
openai_key = os.getenv('OPENAI_API_KEY')
db_path = os.getenv('vectordb_local_path')
key = openai_key
llm = ChatOpenAI(temperature=0, model='gpt-3.5-turbo', streaming=True)

text_metadata = [
    AttributeInfo(name='book name', description="name of the book.", type="string"),
    AttributeInfo(name='author', description='Author of the book', type='string'),
    AttributeInfo(name='creation data', description='the date the book was written', type='list[int]'),
    AttributeInfo(name='page', description="page number.", type="int"),
    AttributeInfo(
        name='images',
        description="dictionary whose keys are name and description of images on the page, "
        "and whose contents are image references on pdfs",
        type="dict{string:string}",
    ),
    AttributeInfo(name='tables', description='list of tables from the page', type='list[dataframe]'),
]

def retreive_conversation_construct(store, store_content_description, metadata_format=text_metadata, verbose=False):
    '''
    This is the first part of this function, and the first problem I ran into.
    '''
    retriever = SelfQueryRetriever.from_llm(
        llm=llm,
        vectorstore=store,
        document_contents=store_content_description,
        metadata_field_info=metadata_format,
        enable_limit=True,
        fix_invalid=True,
        verbose=verbose,
    )
    return retriever


client = qcqc(path=db_path)
model_name = "hkunlp/instructor-xl"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': True}
load_dotenv()
path = os.getenv('instructor_local_dir')
os.environ['CURL_CA_BUNDLE'] = ''
embed_instruction = 'Represent the document for retrieval: '
embeddings = HuggingFaceInstructEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    cache_folder=path,
    embed_instruction=embed_instruction,
)
vector_store = Qdrant(client=client, collection_name='my cluster', embeddings=embeddings)
store_content_description = 'this is a paper about generating training data for large language models.'
retreive_conversation_construct(vector_store, store_content_description)

Expected behavior

The retriever should be created without errors.

I found that the .from_llm() method eventually calls _get_builtin_translator, which returns QdrantTranslator(metadata_key=vectorstore.metadata_payload_key) as structured_query_translator.

But later, when structured_query_translator.allowed_operators is read, it returns None, because QdrantTranslator does not define allowed_operators.

This results in the following error:

File d:\ai_dev\research_assistant\
    retreive_conversation_construct(vector_store,store_content_description)
File d:\ai_dev\research_assistant\ in retreive_conversation_construct
    retriever = SelfQueryRetriever.from_llm(llm = llm,
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\retrievers\self_query\ in from_llm
    query_constructor = load_query_constructor_runnable(
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\chains\query_constructor\ in load_query_constructor_runnable
    prompt = get_query_constructor_prompt(
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\chains\query_constructor\ in get_query_constructor_prompt
    allowed_operators=" | ".join(allowed_operators),
TypeError: can only join an iterable
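The failure can be reproduced without LangChain or Qdrant. The sketch below uses hypothetical stub classes (`Operator`, `Visitor`, `StubQdrantTranslator`) that only imitate the relevant behavior, assuming the real base class declares `allowed_operators` with a default of None: a subclass that never overrides the attribute inherits None, and `str.join` rejects it.

```python
from enum import Enum
from typing import Iterable, Optional, Sequence


class Operator(str, Enum):
    """Hypothetical stand-in for LangChain's logical-operator enum."""

    AND = "and"
    OR = "or"
    NOT = "not"


class Visitor:
    """Hypothetical base class: allowed_operators defaults to None."""

    allowed_operators: Optional[Sequence[Operator]] = None


class StubQdrantTranslator(Visitor):
    """Hypothetical translator that, like the one in this report, never sets allowed_operators."""

    def __init__(self, metadata_key: str) -> None:
        self.metadata_key = metadata_key


def operators_prompt_fragment(allowed_operators: Iterable[str]) -> str:
    """Mirrors the failing expression from the traceback."""
    return " | ".join(allowed_operators)


translator = StubQdrantTranslator(metadata_key="metadata")
print(translator.allowed_operators)  # None, inherited from Visitor

try:
    operators_prompt_fragment(translator.allowed_operators)
except TypeError as exc:
    print(f"TypeError: {exc}")  # TypeError: can only join an iterable
```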

dosubot[bot] commented 11 months ago



The error you're encountering occurs because the allowed_operators attribute of the QdrantTranslator object is None. The from_llm method of the SelfQueryRetriever class passes this None value through load_query_constructor_runnable to get_query_constructor_prompt, which tries to join it and raises TypeError: can only join an iterable.

In the code shared, the QdrantTranslator class defines an allowed_comparators attribute but no allowed_operators attribute, so reading allowed_operators falls back to an inherited value of None instead of a list of operators. This missing attribute is the source of the error.

If allowed_operators is a required attribute for the QdrantTranslator object, you should add it to the class definition and initialize it as an iterable. Here's a possible modification:

class QdrantTranslator(Visitor):
    """Translate `Qdrant` internal query language elements to valid filters."""

    allowed_comparators = (...)  # existing comparators, unchanged
    """Subset of allowed logical comparators."""

    allowed_operators = []  # Add this line

    def __init__(self, metadata_key: str):
        self.metadata_key = metadata_key
        # ...
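Here is a stub-based sketch of why adding the class attribute is enough (the `Visitor` and `Operator` classes are hypothetical stand-ins, not LangChain's own): a value defined on the subclass shadows the inherited None. It also shows a trade-off of the `[]` value: the TypeError goes away, but the joined string is empty, so the prompt would name no logical operators. Which operators Qdrant filtering actually accepts is not established in this thread, so the AND/OR/NOT tuple is only an assumption.

```python
from enum import Enum
from typing import Optional, Sequence


class Operator(str, Enum):
    """Hypothetical stand-in for LangChain's logical-operator enum."""

    AND = "and"
    OR = "or"
    NOT = "not"


class Visitor:
    """Hypothetical base class whose allowed_operators defaults to None."""

    allowed_operators: Optional[Sequence[Operator]] = None


class EmptyOperatorsTranslator(Visitor):
    # The suggested `[]` shadows the inherited None, so join() no longer fails...
    allowed_operators: Sequence[Operator] = []


class ExplicitOperatorsTranslator(Visitor):
    # ...while an explicit tuple also gives the prompt something to list.
    allowed_operators: Sequence[Operator] = (Operator.AND, Operator.OR, Operator.NOT)


print(repr(" | ".join(EmptyOperatorsTranslator.allowed_operators)))  # ''
print(" | ".join(ExplicitOperatorsTranslator.allowed_operators))  # and | or | not
```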

If allowed_operators is supposed to be None in some cases, you should modify the from_llm method of the SelfQueryRetriever class to handle this case. Here's a possible modification:

if "allowed_operators" not in chain_kwargs:
    allowed_operators = structured_query_translator.allowed_operators
    if allowed_operators is None:
        allowed_operators = []  # or some other default value
    chain_kwargs["allowed_operators"] = allowed_operators

This code checks if allowed_operators is None and, if it is, sets it to an empty list (or some other default value) before assigning it to chain_kwargs["allowed_operators"]. This ensures that chain_kwargs["allowed_operators"] is always an iterable, preventing the TypeError.
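The guard can be exercised in isolation. The sketch below pulls it into a hypothetical helper, `fill_allowed_operators`, that runs without LangChain. One design choice differs from the snippet above: with no default, it leaves the key unset instead of writing an empty list, on the assumption that the prompt builder then falls back to its own default operator list; passing `default=[]` reproduces the snippet's behavior exactly.

```python
from typing import Any, Dict, Iterable, Optional


class TranslatorWithoutOperators:
    """Hypothetical translator that, like QdrantTranslator here, leaves allowed_operators as None."""

    allowed_operators: Optional[Iterable[str]] = None


def fill_allowed_operators(
    chain_kwargs: Dict[str, Any],
    translator: Any,
    default: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the None guard suggested for from_llm.

    Mutates and returns chain_kwargs. A value the caller already supplied always wins.
    If the translator reports None and no default is given, the key is left unset
    so that a downstream default (assumed to exist) can apply.
    """
    if "allowed_operators" in chain_kwargs:
        return chain_kwargs
    allowed = getattr(translator, "allowed_operators", None)
    if allowed is None:
        allowed = default
    if allowed is not None:
        chain_kwargs["allowed_operators"] = list(allowed)
    return chain_kwargs


translator = TranslatorWithoutOperators()
print(fill_allowed_operators({}, translator, default=["and", "or"]))  # {'allowed_operators': ['and', 'or']}
print(fill_allowed_operators({}, translator))  # {}
print(fill_allowed_operators({"allowed_operators": ["not"]}, translator))  # {'allowed_operators': ['not']}
```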

Please try these modifications and let me know if the issue persists.

Regarding the purpose of the allowed_operators attribute: judging from the traceback, get_query_constructor_prompt joins it with " | " while building the query-constructor prompt, so it appears to list the logical operators the LLM may use in generated filters. I wasn't able to find further documentation of it within the repository.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

xieqihui commented 10 months ago

I have submitted a PR #12328 to fix this bug.

