langchain-ai / langchain

qdrant.py doesn't contain any allowed_operators #12061

Closed by Gilnore 7 months ago

Gilnore commented 11 months ago

System Info

accelerate==0.23.0 aiohttp==3.8.6 aiosignal==1.3.1 altair==5.1.2 annotated-types==0.6.0 anyio==3.7.1 appdirs==1.4.4 asgiref==3.7.2 asttokens==2.4.0 async-timeout==4.0.3 attrs==23.1.0 auto-gptq==0.4.2 backcall==0.2.0 bentoml==1.1.7 bitsandbytes==0.41.1 blinker==1.6.3 build==1.0.3 cachetools==5.3.1 cattrs==23.1.2 certifi==2023.7.22 cffi==1.16.0 chardet==5.2.0 charset-normalizer==3.3.0 circus==0.18.0 click==8.1.7 click-option-group==0.5.6 cloudpickle==3.0.0 cmake==3.27.7 colorama==0.4.6 coloredlogs==15.0.1 comm==0.1.4 contextlib2==21.6.0 contourpy==1.1.1 cryptography==41.0.4 cssselect==1.2.0 cuda-python==12.2.0 cycler==0.12.1 Cython==3.0.4 dataclasses-json==0.6.1 datasets==2.14.5 debugpy==1.8.0 decorator==5.1.1 deepmerge==1.1.0 Deprecated==1.2.14 dill==0.3.7 distro==1.8.0 et-xmlfile==1.1.0 executing==2.0.0 fastcore==1.5.29 filelock==3.12.4 filetype==1.2.0 fonttools==4.43.1 frozenlist==1.4.0 fs==2.4.16 fsspec==2023.6.0 ghapi==1.0.4 gitdb==4.0.10 GitPython==3.1.40 greenlet==3.0.0 grpcio==1.59.0 grpcio-health-checking==1.59.0 grpcio-tools==1.59.0 h11==0.14.0 h2==4.1.0 hpack==4.0.0 httpcore==0.18.0 httpx==0.25.0 huggingface-hub==0.17.3 humanfriendly==10.0 hyperframe==6.0.1 idna==3.4 importlib-metadata==6.8.0 inflection==0.5.1 InstructorEmbedding==1.0.1 ipykernel==6.25.2 ipython==8.16.1 ipywidgets==8.1.1 jedi==0.19.1 Jinja2==3.1.2 joblib==1.3.2 JPype1==1.4.1 jsonpatch==1.33 jsonpointer==2.4 jsonschema==4.19.1 jsonschema-specifications==2023.7.1 jupyter_client==8.4.0 jupyter_core==5.4.0 jupyterlab-widgets==3.0.9 kiwisolver==1.4.5 langchain==0.0.318 langsmith==0.0.47 lark==1.1.7 lxml==4.9.3 markdown-it-py==3.0.0 MarkupSafe==2.1.3 marshmallow==3.20.1 matplotlib==3.8.0 matplotlib-inline==0.1.6 mdurl==0.1.2 mpmath==1.3.0 multidict==6.0.4 multiprocess==0.70.15 mypy-extensions==1.0.0 nest-asyncio==1.5.8 networkx==3.2 ninja==1.11.1.1 nltk==3.8.1 numexpr==2.8.7 numpy==1.26.1 openai==0.28.1 openapi-schema-pydantic==1.2.4 openllm==0.3.9 openllm-client==0.3.9 openllm-core==0.3.9 
openpyxl==3.1.2 opentelemetry-api==1.20.0 opentelemetry-instrumentation==0.41b0 opentelemetry-instrumentation-aiohttp-client==0.41b0 opentelemetry-instrumentation-asgi==0.41b0 opentelemetry-instrumentation-grpc==0.41b0 opentelemetry-sdk==1.20.0 opentelemetry-semantic-conventions==0.41b0 opentelemetry-util-http==0.41b0 optimum==1.13.2 orjson==3.9.9 packaging==23.2 pandas==2.1.1 parso==0.8.3 pathspec==0.11.2 pdfminer.six==20221105 pdfquery==0.4.3 peft==0.5.0 pickleshare==0.7.5 Pillow==10.1.0 pip-autoremove==0.10.0 pip-requirements-parser==32.0.1 pip-review==1.3.0 pip-tools==7.3.0 platformdirs==3.11.0 portalocker==2.8.2 prometheus-client==0.17.1 prompt-toolkit==3.0.39 protobuf==4.24.4 psutil==5.9.6 pure-eval==0.2.2 pyarrow==13.0.0 pycparser==2.21 pycryptodome==3.19.0 pydantic==2.4.2 pydantic_core==2.10.1 pydeck==0.8.0 Pygments==2.16.1 Pympler==1.0.1 PyMuPDF==1.23.5 pymupdf-fonts==1.0.5 PyMuPDFb==1.23.5 pynvml==11.5.0 pyparsing==3.1.1 pyproject_hooks==1.0.0 pyquery==2.0.0 pyreadline3==3.4.1 python-dateutil==2.8.2 python-dotenv==1.0.0 python-json-logger==2.0.7 python-multipart==0.0.6 pytz==2023.3.post1 pytz-deprecation-shim==0.1.0.post0 pywin32==306 PyYAML==6.0.1 pyzmq==25.1.1 qdrant-client==1.6.3 referencing==0.30.2 regex==2023.10.3 requests==2.31.0 rich==13.6.0 roman==4.1 rouge==1.0.1 rpds-py==0.10.6 safetensors==0.4.0 schema==0.7.5 scikit-learn==1.3.1 scipy==1.11.3 sentence-transformers==2.2.2 sentencepiece==0.1.99 sigfig==1.3.3 simple-di==0.1.5 six==1.16.0 smmap==5.0.1 sniffio==1.3.0 sortedcontainers==2.4.0 spyder-kernels==2.4.4 SQLAlchemy==2.0.22 stack-data==0.6.3 starlette==0.31.1 streamlit==1.27.2 streamlit-chat==0.1.1 sympy==1.12 tabula-py==2.8.2 tabulate==0.9.0 tenacity==8.2.3 threadpoolctl==3.2.0 tiktoken==0.5.1 tokenizers==0.14.1 toml==0.10.2 toolz==0.12.0 torch==2.1.0 torchaudio==2.1.0 torchvision==0.16.0 tornado==6.3.3 tqdm==4.66.1 traitlets==5.11.2 transformers @ git+https://github.com/huggingface/transformers@43bfd093e1817c0333a1e10fcbdd54f1032baad0 
typing-inspect==0.9.0 typing_extensions==4.8.0 tzdata==2023.3 tzlocal==5.1 urllib3==1.26.18 uvicorn==0.23.2 validators==0.22.0 watchdog==3.0.0 watchfiles==0.21.0 wcwidth==0.2.8 widgetsnbextension==4.0.9 wrapt==1.15.0 xformers==0.0.22.post4 xlrd==2.0.1 xxhash==3.4.1 yarl==1.9.2 zipp==3.17.0

Using Python 3.11.

Who can help?

No response

Reproduction

from dotenv import load_dotenv
import os
from langchain.chat_models import ChatOpenAI
from qdrant_client import QdrantClient as qcqc
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Qdrant
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

load_dotenv()
openai_key = os.getenv('OPENAI_API_KEY')
db_path = os.getenv('vectordb_local_path')
key = openai_key
llm = ChatOpenAI(
    temperature=0,
    model='gpt-3.5-turbo',
    streaming=True)

text_metadata = [
    AttributeInfo(name='book name',
                  description="name of the book.",
                  type="string"),
    AttributeInfo(name='author',
                  description='Author of the book',
                  type='string'),
    AttributeInfo(name='creation data',
                  description='the date the book was written',
                  type='list[int]'),
    AttributeInfo(name='page',
                  description="page number.",
                  type="int"),
    AttributeInfo(name='images',
                  description=("dictionary whoes keys are name and description of images on the page,"
                               " and whoes contents are image references on pdfs"),
                  type="dict{string:string}"),
    AttributeInfo(name='tables',
                  description='list of tables from the page',
                  type='list[dataframe]'),
]

def retreive_conversation_construct(store, store_content_description,
                                    metadata_format=text_metadata, verbose=False):
    '''
    this is the first part of this function, and is the first problem i ran into
    '''
    retriever = SelfQueryRetriever.from_llm(llm=llm,
                                            vectorstore=store,
                                            document_contents=store_content_description,
                                            metadata_field_info=metadata_format,
                                            enable_limit=True,
                                            fix_invalid=True,
                                            verbose=verbose)
    return retriever

client = qcqc(path=db_path)
model_name = "hkunlp/instructor-xl"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': True}
load_dotenv()
path = os.getenv('instructor_local_dir')
os.environ['CURL_CA_BUNDLE'] = ''
embed_instruction = 'Represent the document for retrieval: '
embeddings = HuggingFaceInstructEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    cache_folder=path,
    embed_instruction=embed_instruction)
vector_store = Qdrant(client=client,
                      collection_name='my cluster',
                      embeddings=embeddings)
store_content_description = 'this is a paper about generating training data for large language models.'
retreive_conversation_construct(vector_store, store_content_description)

Expected behavior

The retriever should be created.

I found that in retrievers/self_query/base.py, the .from_llm() method eventually calls _get_builtin_translator, which returns QdrantTranslator(metadata_key=vectorstore.metadata_payload_key) as structured_query_translator.

Later, from_llm reads structured_query_translator.allowed_operators, but the QdrantTranslator in qdrant.py doesn't define allowed_operators, so the lookup returns None.

This results in the following error:

File d:\ai_dev\research_assistant\testing.py:74
    retreive_conversation_construct(vector_store,store_content_description)

File d:\ai_dev\research_assistant\testing.py:50 in retreive_conversation_construct
    retriever = SelfQueryRetriever.from_llm(llm = llm,

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\retrievers\self_query\base.py:214 in from_llm
    query_constructor = load_query_constructor_runnable(

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\chains\query_constructor\base.py:317 in load_query_constructor_runnable
    prompt = get_query_constructor_prompt(

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain\chains\query_constructor\base.py:203 in get_query_constructor_prompt
    allowed_operators=" | ".join(allowed_operators),

TypeError: can only join an iterable
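
The failure in the traceback can be reproduced without LangChain or Qdrant installed. The sketch below uses hypothetical stand-in classes (`Visitor` and `TranslatorWithoutOperators` are illustrative names, not the library's code) and assumes the base class declares `allowed_operators` with a default of `None`, which matches the behaviour described in this report.

```python
from typing import Optional, Sequence


class Visitor:
    """Stand-in for a translator base class (hypothetical, not LangChain's code)."""

    allowed_comparators: Optional[Sequence[str]] = None
    allowed_operators: Optional[Sequence[str]] = None


class TranslatorWithoutOperators(Visitor):
    """Defines comparators only, like the translator described in this issue."""

    allowed_comparators = ("eq", "lt", "lte", "gt", "gte")


translator = TranslatorWithoutOperators()

# The subclass never sets allowed_operators, so the lookup falls back
# to the base class default, which is None.
operators = translator.allowed_operators

try:
    " | ".join(operators)  # same call shape as in the traceback
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(operators, "->", error_message)
```

If the stand-in behaves like the real class, this would explain why no AttributeError is raised: the attribute exists, it just holds `None`.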

dosubot[bot] commented 11 months ago

🤖

Hello,

The error occurs because the allowed_operators attribute of the QdrantTranslator object is None. SelfQueryRetriever.from_llm passes this value on to the query constructor, and get_query_constructor_prompt then calls " | ".join(allowed_operators) on it, which raises TypeError: can only join an iterable.

In the context shared, the QdrantTranslator class does not define allowed_operators itself; it only defines allowed_comparators. The None value therefore most likely comes from a default inherited from the Visitor base class.

If allowed_operators is a required attribute for the QdrantTranslator object, you should add it to the class definition and initialize it as an iterable. Here's a possible modification:

class QdrantTranslator(Visitor):
    """Translate `Qdrant` internal query language elements to valid filters."""

    allowed_comparators = (
        Comparator.EQ,
        Comparator.LT,
        Comparator.LTE,
        Comparator.GT,
        Comparator.GTE,
    )
    """Subset of allowed logical comparators."""

    allowed_operators = []  # Add this line

    def __init__(self, metadata_key: str):
        self.metadata_key = metadata_key
        # ...
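
One caveat with an empty list: it avoids the TypeError, but the joined string is then empty, so the prompt would advertise no logical operators at all. The sketch below illustrates the difference with a hypothetical stand-in `Operator` enum and `PatchedTranslator` class. Which operators a Qdrant filter translator should actually allow is an assumption here, not something confirmed in this thread.

```python
from enum import Enum


class Operator(str, Enum):
    """Stand-in for a logical operator enum (hypothetical values)."""

    AND = "and"
    OR = "or"
    NOT = "not"


class PatchedTranslator:
    """Stand-in translator that declares its operators explicitly."""

    # Assumption: the backend can express all three logical operators.
    allowed_operators = (Operator.AND, Operator.OR, Operator.NOT)


# With an empty list the join succeeds but produces an empty string.
empty_prompt_part = " | ".join([])

# With explicit members the prompt fragment lists each operator.
patched_prompt_part = " | ".join(PatchedTranslator.allowed_operators)

print(repr(empty_prompt_part))
print(repr(patched_prompt_part))
```

A tuple is used so the class-level default is immutable and cannot be changed by accident through one instance.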

If allowed_operators is supposed to be None in some cases, you should modify the from_llm method of the SelfQueryRetriever class to handle this case. Here's a possible modification:

if "allowed_operators" not in chain_kwargs:
    allowed_operators = structured_query_translator.allowed_operators
    if allowed_operators is None:
        allowed_operators = []  # or some other default value
    chain_kwargs["allowed_operators"] = allowed_operators

This code checks if allowed_operators is None and, if it is, sets it to an empty list (or some other default value) before assigning it to chain_kwargs["allowed_operators"]. This ensures that chain_kwargs["allowed_operators"] is always an iterable, preventing the TypeError.
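
The same guard can be written as a small standalone function, which makes it easy to test in isolation. This is only a sketch: `resolve_allowed_operators` and `TranslatorStub` are hypothetical names introduced for illustration and are not part of LangChain.

```python
from typing import Any, Dict, Optional


def resolve_allowed_operators(
    translator: Any, chain_kwargs: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Return a copy of chain_kwargs whose 'allowed_operators' is always iterable."""
    resolved = dict(chain_kwargs or {})
    if "allowed_operators" not in resolved:
        operators = getattr(translator, "allowed_operators", None)
        # Fall back to an empty list when the translator provides None.
        resolved["allowed_operators"] = list(operators) if operators is not None else []
    return resolved


class TranslatorStub:
    """Stand-in translator whose allowed_operators is None."""

    allowed_operators = None


defaulted = resolve_allowed_operators(TranslatorStub())
overridden = resolve_allowed_operators(
    TranslatorStub(), {"allowed_operators": ["and", "or"]}
)

print(defaulted)
print(overridden)
```

A value supplied by the caller takes precedence, and the input dictionary is copied instead of being mutated.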

Please try these modifications and let me know if the issue persists.

Regarding the purpose of the allowed_operators attribute in the QdrantTranslator object within the LangChain codebase, I wasn't able to find an answer within the repository. It's possible that this attribute is used in some other part of the codebase that wasn't included in the context provided.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Best, Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

xieqihui commented 10 months ago

I have submitted a PR #12328 to fix this bug.

dosubot[bot] commented 7 months ago

Hi, @Gilnore,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you reported a TypeError related to the allowed_operators attribute of the QdrantTranslator object being None. Dosubot provided a detailed explanation of the error and suggested modifications to the code. Subsequently, Xieqihui submitted a pull request (#12328) to fix the bug.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you!