Closed Gilnore closed 7 months ago
🤖
Hello,
The error you're encountering is due to the allowed_operators attribute of the QdrantTranslator object being None. This is causing an issue when the from_llm method of the SelfQueryRetriever class tries to join this None value, leading to a TypeError: can only join an iterable.
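The failure can be reproduced without LangChain. The following is a minimal sketch, not the library's code: join_operators is a hypothetical helper that mimics the " | ".join(...) call, and the operator names are illustrative strings.

```python
def join_operators(allowed_operators):
    """Mimic the failing call: join operator names with ' | '."""
    return " | ".join(allowed_operators)


# An empty or populated iterable joins without error.
empty_result = join_operators([])
populated_result = join_operators(["and", "or", "not"])

# None is not iterable, so str.join raises TypeError.
try:
    join_operators(None)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(repr(empty_result))
print(populated_result)
print(error_message)
```

An empty list is therefore enough to avoid the exception, although the resulting prompt would then list no operators at all.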
In the context shared, the QdrantTranslator class does not have an allowed_operators attribute. Instead, it has an allowed_comparators attribute. This could be the source of the confusion and the error.
If allowed_operators is a required attribute for the QdrantTranslator object, you should add it to the class definition and initialize it as an iterable. Here's a possible modification:
class QdrantTranslator(Visitor):
    """Translate `Qdrant` internal query language elements to valid filters."""

    allowed_comparators = (
        Comparator.EQ,
        Comparator.LT,
        Comparator.LTE,
        Comparator.GT,
        Comparator.GTE,
    )
    """Subset of allowed logical comparators."""

    allowed_operators = []  # Add this line

    def __init__(self, metadata_key: str):
        self.metadata_key = metadata_key

    # ...
If allowed_operators is supposed to be None in some cases, you should modify the from_llm method of the SelfQueryRetriever class to handle this case. Here's a possible modification:
if "allowed_operators" not in chain_kwargs:
    allowed_operators = structured_query_translator.allowed_operators
    if allowed_operators is None:
        allowed_operators = []  # or some other default value
    chain_kwargs["allowed_operators"] = allowed_operators
This code checks if allowed_operators is None and, if it is, sets it to an empty list (or some other default value) before assigning it to chain_kwargs["allowed_operators"]. This ensures that chain_kwargs["allowed_operators"] is always an iterable, preventing the TypeError.
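The fallback described above can be exercised in isolation. This is a sketch under stated assumptions: StubTranslator and fill_allowed_operators are hypothetical names invented for illustration, not part of LangChain, and the empty-tuple default is only one possible choice.

```python
from typing import Any, Dict, Optional, Sequence


class StubTranslator:
    """Hypothetical stand-in for a translator that leaves allowed_operators unset."""

    allowed_operators: Optional[Sequence[str]] = None


def fill_allowed_operators(
    chain_kwargs: Dict[str, Any],
    translator: Any,
    default: Sequence[str] = (),
) -> Dict[str, Any]:
    """Return a copy of chain_kwargs whose 'allowed_operators' is always iterable."""
    result = dict(chain_kwargs)
    if "allowed_operators" not in result:
        operators = getattr(translator, "allowed_operators", None)
        # Substitute the default only when the translator provides nothing.
        result["allowed_operators"] = default if operators is None else operators
    return result


kwargs = fill_allowed_operators({}, StubTranslator())
prompt_fragment = " | ".join(kwargs["allowed_operators"])
print(repr(prompt_fragment))

# A caller-supplied value is left untouched.
explicit = fill_allowed_operators({"allowed_operators": ["and"]}, StubTranslator())
print(explicit["allowed_operators"])
```

Returning a copy rather than mutating the caller's dictionary is a design choice of this sketch; the snippet above it modifies chain_kwargs in place.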
Please try these modifications and let me know if the issue persists.
Regarding the purpose of the allowed_operators attribute in the QdrantTranslator object within the LangChain codebase, I wasn't able to find an answer within the repository. It's possible that this attribute is used in some other part of the codebase that wasn't included in the context provided.
I hope this helps! If you have any other questions or need further clarification, feel free to ask.
Best, Dosu
I have submitted a PR #12328 to fix this bug.
Hi, @Gilnore,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you reported a TypeError related to the allowed_operators attribute of the QdrantTranslator object being None. Dosubot provided a detailed explanation of the error and suggested modifications to the code. Subsequently, Xieqihui submitted a pull request (#12328) to fix the bug.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you!
System Info
accelerate==0.23.0 aiohttp==3.8.6 aiosignal==1.3.1 altair==5.1.2 annotated-types==0.6.0 anyio==3.7.1 appdirs==1.4.4 asgiref==3.7.2 asttokens==2.4.0 async-timeout==4.0.3 attrs==23.1.0 auto-gptq==0.4.2 backcall==0.2.0 bentoml==1.1.7 bitsandbytes==0.41.1 blinker==1.6.3 build==1.0.3 cachetools==5.3.1 cattrs==23.1.2 certifi==2023.7.22 cffi==1.16.0 chardet==5.2.0 charset-normalizer==3.3.0 circus==0.18.0 click==8.1.7 click-option-group==0.5.6 cloudpickle==3.0.0 cmake==3.27.7 colorama==0.4.6 coloredlogs==15.0.1 comm==0.1.4 contextlib2==21.6.0 contourpy==1.1.1 cryptography==41.0.4 cssselect==1.2.0 cuda-python==12.2.0 cycler==0.12.1 Cython==3.0.4 dataclasses-json==0.6.1 datasets==2.14.5 debugpy==1.8.0 decorator==5.1.1 deepmerge==1.1.0 Deprecated==1.2.14 dill==0.3.7 distro==1.8.0 et-xmlfile==1.1.0 executing==2.0.0 fastcore==1.5.29 filelock==3.12.4 filetype==1.2.0 fonttools==4.43.1 frozenlist==1.4.0 fs==2.4.16 fsspec==2023.6.0 ghapi==1.0.4 gitdb==4.0.10 GitPython==3.1.40 greenlet==3.0.0 grpcio==1.59.0 grpcio-health-checking==1.59.0 grpcio-tools==1.59.0 h11==0.14.0 h2==4.1.0 hpack==4.0.0 httpcore==0.18.0 httpx==0.25.0 huggingface-hub==0.17.3 humanfriendly==10.0 hyperframe==6.0.1 idna==3.4 importlib-metadata==6.8.0 inflection==0.5.1 InstructorEmbedding==1.0.1 ipykernel==6.25.2 ipython==8.16.1 ipywidgets==8.1.1 jedi==0.19.1 Jinja2==3.1.2 joblib==1.3.2 JPype1==1.4.1 jsonpatch==1.33 jsonpointer==2.4 jsonschema==4.19.1 jsonschema-specifications==2023.7.1 jupyter_client==8.4.0 jupyter_core==5.4.0 jupyterlab-widgets==3.0.9 kiwisolver==1.4.5 langchain==0.0.318 langsmith==0.0.47 lark==1.1.7 lxml==4.9.3 markdown-it-py==3.0.0 MarkupSafe==2.1.3 marshmallow==3.20.1 matplotlib==3.8.0 matplotlib-inline==0.1.6 mdurl==0.1.2 mpmath==1.3.0 multidict==6.0.4 multiprocess==0.70.15 mypy-extensions==1.0.0 nest-asyncio==1.5.8 networkx==3.2 ninja==1.11.1.1 nltk==3.8.1 numexpr==2.8.7 numpy==1.26.1 openai==0.28.1 openapi-schema-pydantic==1.2.4 openllm==0.3.9 openllm-client==0.3.9 openllm-core==0.3.9 
openpyxl==3.1.2 opentelemetry-api==1.20.0 opentelemetry-instrumentation==0.41b0 opentelemetry-instrumentation-aiohttp-client==0.41b0 opentelemetry-instrumentation-asgi==0.41b0 opentelemetry-instrumentation-grpc==0.41b0 opentelemetry-sdk==1.20.0 opentelemetry-semantic-conventions==0.41b0 opentelemetry-util-http==0.41b0 optimum==1.13.2 orjson==3.9.9 packaging==23.2 pandas==2.1.1 parso==0.8.3 pathspec==0.11.2 pdfminer.six==20221105 pdfquery==0.4.3 peft==0.5.0 pickleshare==0.7.5 Pillow==10.1.0 pip-autoremove==0.10.0 pip-requirements-parser==32.0.1 pip-review==1.3.0 pip-tools==7.3.0 platformdirs==3.11.0 portalocker==2.8.2 prometheus-client==0.17.1 prompt-toolkit==3.0.39 protobuf==4.24.4 psutil==5.9.6 pure-eval==0.2.2 pyarrow==13.0.0 pycparser==2.21 pycryptodome==3.19.0 pydantic==2.4.2 pydantic_core==2.10.1 pydeck==0.8.0 Pygments==2.16.1 Pympler==1.0.1 PyMuPDF==1.23.5 pymupdf-fonts==1.0.5 PyMuPDFb==1.23.5 pynvml==11.5.0 pyparsing==3.1.1 pyproject_hooks==1.0.0 pyquery==2.0.0 pyreadline3==3.4.1 python-dateutil==2.8.2 python-dotenv==1.0.0 python-json-logger==2.0.7 python-multipart==0.0.6 pytz==2023.3.post1 pytz-deprecation-shim==0.1.0.post0 pywin32==306 PyYAML==6.0.1 pyzmq==25.1.1 qdrant-client==1.6.3 referencing==0.30.2 regex==2023.10.3 requests==2.31.0 rich==13.6.0 roman==4.1 rouge==1.0.1 rpds-py==0.10.6 safetensors==0.4.0 schema==0.7.5 scikit-learn==1.3.1 scipy==1.11.3 sentence-transformers==2.2.2 sentencepiece==0.1.99 sigfig==1.3.3 simple-di==0.1.5 six==1.16.0 smmap==5.0.1 sniffio==1.3.0 sortedcontainers==2.4.0 spyder-kernels==2.4.4 SQLAlchemy==2.0.22 stack-data==0.6.3 starlette==0.31.1 streamlit==1.27.2 streamlit-chat==0.1.1 sympy==1.12 tabula-py==2.8.2 tabulate==0.9.0 tenacity==8.2.3 threadpoolctl==3.2.0 tiktoken==0.5.1 tokenizers==0.14.1 toml==0.10.2 toolz==0.12.0 torch==2.1.0 torchaudio==2.1.0 torchvision==0.16.0 tornado==6.3.3 tqdm==4.66.1 traitlets==5.11.2 transformers @ git+https://github.com/huggingface/transformers@43bfd093e1817c0333a1e10fcbdd54f1032baad0 
typing-inspect==0.9.0 typing_extensions==4.8.0 tzdata==2023.3 tzlocal==5.1 urllib3==1.26.18 uvicorn==0.23.2 validators==0.22.0 watchdog==3.0.0 watchfiles==0.21.0 wcwidth==0.2.8 widgetsnbextension==4.0.9 wrapt==1.15.0 xformers==0.0.22.post4 xlrd==2.0.1 xxhash==3.4.1 yarl==1.9.2 zipp==3.17.0
using python 3.11
Who can help?
No response
Information
Related Components
Reproduction
from dotenv import load_dotenv
import os
from langchain.chat_models import ChatOpenAI
from qdrant_client import QdrantClient as qcqc
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Qdrant
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

load_dotenv()
openai_key = os.getenv('OPENAI_API_KEY')
db_path = os.getenv('vectordb_local_path')
key = openai_key
llm = ChatOpenAI(
    temperature = 0,
    model = 'gpt-3.5-turbo',
    streaming = True)

text_metadata = [AttributeInfo(name = 'book name',
                               description = "name of the book.",
                               type = "string"),
                 AttributeInfo(name = 'author',
                               description = 'Author of the book',
                               type = 'string'),
                 AttributeInfo(name = 'creation data',
                               description = 'the date the book was written',
                               type = 'list[int]'),
                 AttributeInfo(name = 'page',
                               description = "page number.",
                               type = "int"),
                 AttributeInfo(name = 'images',
                               description = "dictionary whoes keys are name and description of images on the page,\
 and whoes contents are image references on pdfs",
                               type = "dict{string:string}"),
                 AttributeInfo(name = 'tables',
                               description = 'list of tables from the page',
                               type = 'list[dataframe]')
                 ]

def retreive_conversation_construct(store,store_content_description,
                                    metadata_format=text_metadata,verbose=False):
    '''
    this is the first part of this function, and is the first problem i ran into
    '''
    retriever = SelfQueryRetriever.from_llm(llm = llm,
                                            vectorstore=store,
                                            document_contents = store_content_description,
                                            metadata_field_info = metadata_format,
                                            enable_limit=True,
                                            fix_invalid = True,
                                            verbose=verbose)
    return retriever

client = qcqc(path= db_path)
model_name = "hkunlp/instructor-xl"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': True}
load_dotenv()
path = os.getenv('instructor_local_dir')
os.environ['CURL_CA_BUNDLE'] = ''
embed_instruction ='Represent the document for retrieval: '
embeddings = HuggingFaceInstructEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    cache_folder = path,
    embed_instruction = embed_instruction)
vector_store = Qdrant(client= client,
                      collection_name= 'my cluster',
                      embeddings= embeddings)
store_content_description = 'this is a paper about generating training data for large language models.'
retreive_conversation_construct(vector_store,store_content_description)
Expected behavior
The retriever should be created.
In self_query.py, I found that the .from_llm() method eventually calls _get_builtin_translator, which returns QdrantTranslator(metadata_key=vectorstore.metadata_payload_key) as structured_query_translator.
Later, however, structured_query_translator.allowed_operators is read, and the QdrantTranslator in qdrant.py does not define allowed_operators, so the lookup returns None.
This results in the following error:
File d:\ai_dev\research_assistant\testing.py:74
    retreive_conversation_construct(vector_store,store_content_description)

File d:\ai_dev\research_assistant\testing.py:50 in retreive_conversation_construct
    retriever = SelfQueryRetriever.from_llm(llm = llm,

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\retrievers\self_query\base.py:214 in from_llm
    query_constructor = load_query_constructor_runnable(

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\chains\query_constructor\base.py:317 in load_query_constructor_runnable
    prompt = get_query_constructor_prompt(

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\chains\query_constructor\base.py:203 in get_query_constructor_prompt
    allowed_operators=" | ".join(allowed_operators),
TypeError: can only join an iterable
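
One plausible reading of this traceback, assuming the translator's base class declares allowed_operators with a default of None (an assumption about the installed version, not verified here), is that the attribute lookup falls back to the base class and yields None instead of raising AttributeError. The sketch below uses hypothetical stand-in classes and illustrative operator names rather than the real LangChain types.

```python
from typing import Optional, Sequence


class BaseVisitor:
    """Hypothetical base class that declares the attribute with a None default."""

    allowed_operators: Optional[Sequence[str]] = None


class TranslatorWithoutOperators(BaseVisitor):
    """Subclass that never overrides allowed_operators."""


class TranslatorWithOperators(BaseVisitor):
    """Subclass that overrides the attribute with an iterable."""

    allowed_operators = ("and", "or", "not")


def build_prompt_fragment(translator: BaseVisitor) -> str:
    # Same shape as the join shown in the last traceback frame.
    return " | ".join(translator.allowed_operators)


# The lookup succeeds but resolves to the base-class default.
inherited = TranslatorWithoutOperators().allowed_operators
fixed_fragment = build_prompt_fragment(TranslatorWithOperators())

try:
    build_prompt_fragment(TranslatorWithoutOperators())
    failed = False
except TypeError:
    failed = True

print(inherited, fixed_fragment, failed)
```

Under that assumption, overriding the attribute on the translator subclass with any iterable of operators is sufficient to make the join succeed.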