langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com

Error in StructuredQueryOutputParser using SelfQueryRetriever with Pinecone #17696

Closed darthShana closed 1 week ago

darthShana commented 5 months ago


### Example Code

I'm using a standard SelfQueryRetriever to extract relevant documents (car listings) that match a user query. It has been working pretty well, but recently it started giving me errors (stack trace attached). I use `retriever = SelfQueryRetriever.from_llm(llm, vectordb, document_content_description, metadata_field_info, verbose=True)`.
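
Roughly, the surrounding setup looks like this; the metadata fields, index name, and embedding model below are simplified placeholders rather than my exact configuration:

```python
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_community.vectorstores import Pinecone
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Illustrative metadata schema; the real listing index has more fields.
metadata_field_info = [
    AttributeInfo(
        name="vehicle_type",
        description="Body style of the car, e.g. Hatchback or Sedan",
        type="string",
    ),
    AttributeInfo(
        name="location",
        description="Dealership branch where the car is located",
        type="string",
    ),
]
document_content_description = "A used car listing"

llm = ChatOpenAI(model="gpt-4-turbo-preview", temperature=0)
# Placeholder index name; assumes PINECONE_API_KEY is set in the environment.
vectordb = Pinecone.from_existing_index("car-listings", OpenAIEmbeddings())

retriever = SelfQueryRetriever.from_llm(
    llm, vectordb, document_content_description, metadata_field_info, verbose=True
)

docs = retriever.get_relevant_documents(
    "a hatchback or sedan with bluetooth and a reversing camera, "
    "in Westgate or North Shore"
)
```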

### Error Message and Stack Trace (if applicable)

OutputParserException('Parsing text\njson\n{\n "query": "with bluetooth and a reversing camera recent",\n "filter": "or(eq(\\"vehicle_type\\", \\"Hatchback\\"), eq(\\"vehicle_type\\", \\"Sedan\\")), in(\\"location\\", [\\"Westgate\\", \\"North Shore\\", \\"Otahuhu\\", \\"Penrose\\", \\"Botany\\", \\"Manukau\\"])"\n}\n\n raised following error:\nUnexpected token Token(\'COMMA\', \',\') at line 1, column 65.\nExpected one of: \n\t* $END\n')

Traceback (most recent call last):

File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/lark/parsers/lalr_parser_state.py", line 77, in feed_token action, arg = states[state][token.type]



KeyError: 'COMMA'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/langchain/chains/query_constructor/base.py", line 56, in parse
    parsed["filter"] = self.ast_parse(parsed["filter"])
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/lark/lark.py", line 658, in parse
    return self.parser.parse(text, start=start, on_error=on_error)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/lark/parser_frontends.py", line 104, in parse
    return self.parser.parse(stream, chosen_start, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/lark/parsers/lalr_parser.py", line 42, in parse
    return self.parser.parse(lexer, start)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/lark/parsers/lalr_parser.py", line 88, in parse
    return self.parse_from_state(parser_state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/lark/parsers/lalr_parser.py", line 111, in parse_from_state
    raise e

  File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/lark/parsers/lalr_parser.py", line 102, in parse_from_state
    state.feed_token(token)

  File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/lark/parsers/lalr_parser_state.py", line 80, in feed_token
    raise UnexpectedToken(token, expected, state=self, interactive_parser=None)

lark.exceptions.UnexpectedToken: Unexpected token Token('COMMA', ',') at line 1, column 65.
Expected one of: 
    * $END

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 1246, in _call_with_config
    context.run(

  File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/langchain_core/runnables/config.py", line 326, in call_func_with_variable_args
    return func(input, **kwargs)  # type: ignore[call-arg]
           ^^^^^^^^^^^^^^^^^^^^^

  File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/langchain_core/output_parsers/base.py", line 168, in <lambda>
    lambda inner_input: self.parse_result(
                        ^^^^^^^^^^^^^^^^^^

  File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/langchain_core/output_parsers/base.py", line 219, in parse_result
    return self.parse(result[0].text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/dharshana/.local/share/virtualenvs/tina-virtual-assistant-eLldwkZS/lib/python3.11/site-packages/langchain/chains/query_constructor/base.py", line 63, in parse
    raise OutputParserException(

langchain_core.exceptions.OutputParserException: Parsing text
```json
{
    "query": "with bluetooth and a reversing camera recent",
    "filter": "or(eq(\"vehicle_type\", \"Hatchback\"), eq(\"vehicle_type\", \"Sedan\")), in(\"location\", [\"Westgate\", \"North Shore\", \"Otahuhu\", \"Penrose\", \"Botany\", \"Manukau\"])"
}
```
 raised following error:
Unexpected token Token('COMMA', ',') at line 1, column 65.
Expected one of: 
    * $END

### Description

I'm getting:
lark.exceptions.UnexpectedToken: Unexpected token Token('COMMA', ',') at line 1, column 65.
Expected one of: 
    * $END

It seems the parser is not happy with a COMMA.
I'm not entirely sure whether the cause of the error is a change in the Pinecone query API or an update to the langchain version.

### System Info

        "langchain": {
            "hashes": [
                "sha256:29d95f12afe9690953820970205dba3b098ee1f7531e80eb18c1236d3feda921",
                "sha256:b40fbe2b65360afe6c0d5bbf37e79469f990779460640edde5b906175c49807e"
            ],
            "index": "pypi",
            "version": "==0.1.7"
        },
        "langchain-community": {
            "hashes": [
                "sha256:bd112b5813702919c50f89b1afa2b63adf1da89999df4842b327ee11220f8c39",
                "sha256:c56c48bc77d24e1fc399a9ee9a637d96e3b2ff952e3a080b5a41820d9d00fb3c"
            ],
            "index": "pypi",
            "version": "==0.0.20"
        },
        "langchain-core": {
            "hashes": [
                "sha256:34359cc8b6f8c3d45098c54a6a9b35c9f538ef58329cd943a2249d6d7b4e5806",
                "sha256:d42fac013c39a8b0bcd7e337a4cb6c17c16046c60d768f89df582ad73ec3c5cb"
            ],
            "markers": "python_full_version >= '3.8.1' and python_version < '4.0'",
            "version": "==0.1.23"
        },
        "langchain-openai": {
            "hashes": [
                "sha256:2ef040e4447a26a9d3bd45dfac9cefa00797ea58555a3d91ab4f88699eb3a005",
                "sha256:f5c4ebe46f2c8635c8f0c26cc8df27700aacafea025410e418d5a080039974dd"
            ],
            "index": "pypi",
            "version": "==0.0.6"
        },
dosubot[bot] commented 5 months ago

🤖

Hey @darthShana, great to see you back! Running into a bit of a snag, huh? Let's see what we can do about that.

From the error message you provided, it seems like the issue is with the filter query syntax. The error message indicates that the parser is not expecting a comma (',') at the position where it is found.

The filter query you're using is:

"or(eq(\"vehicle_type\", \"Hatchback\"), eq(\"vehicle_type\", \"Sedan\")), in(\"location\", [\"Westgate\", \"North Shore\", \"Otahuhu\", \"Penrose\", \"Botany\", \"Manukau\"])"

The error might be due to the misplaced comma (',') after the or function. The or function takes multiple arguments and combines them using the logical OR operator. The arguments should be separated by commas (',') and enclosed in parentheses ('()'). The in function should be one of the arguments of the or function, not a separate function call.

Here's what the filter query should look like:

"or(eq(\"vehicle_type\", \"Hatchback\"), eq(\"vehicle_type\", \"Sedan\"), in(\"location\", [\"Westgate\", \"North Shore\", \"Otahuhu\", \"Penrose\", \"Botany\", \"Manukau\"]))"

This filter query checks if the vehicle_type is either "Hatchback" or "Sedan" and if the location is in the list ["Westgate", "North Shore", "Otahuhu", "Penrose", "Botany", "Manukau"].
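
If you want to sanity-check a candidate filter string locally, the output parser from the traceback can be run directly on a small payload. A rough sketch (the JSON payloads below are illustrative, with the location list shortened):

```python
from langchain.chains.query_constructor.base import StructuredQueryOutputParser

# The same parser the SelfQueryRetriever's query constructor uses to turn the
# LLM's JSON output into a StructuredQuery.
parser = StructuredQueryOutputParser.from_components()

# Two top-level expressions joined by a bare comma: rejected by the grammar
# with "Unexpected token Token('COMMA', ...) ... Expected one of: * $END".
bad = '{"query": "with bluetooth", "filter": "or(eq(\\"vehicle_type\\", \\"Hatchback\\"), eq(\\"vehicle_type\\", \\"Sedan\\")), in(\\"location\\", [\\"Westgate\\"])"}'

# The same conditions under a single top-level operator parse cleanly.
good = '{"query": "with bluetooth", "filter": "or(eq(\\"vehicle_type\\", \\"Hatchback\\"), eq(\\"vehicle_type\\", \\"Sedan\\"), in(\\"location\\", [\\"Westgate\\"]))"}'

try:
    parser.parse(bad)
except Exception as err:
    print("rejected:", err)

print(parser.parse(good))
```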

Please try updating your filter query and let me know if the issue persists. If it does, it would be helpful if you could provide more information about the query_constructor and structured_query_translator you're using with the SelfQueryRetriever. These components are responsible for generating and translating the filter query, so the issue might be with their implementation.

For more information about the expected syntax for the filter query, you can refer to the LangChain documentation and the source code of the SelfQueryRetriever class in the LangChain repository.


#### About Dosu

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.

darthShana commented 5 months ago

Hi, so, two things:

  1. I don't agree with your logical assessment of the query. If the filter is meant to check that the vehicle_type is either "Hatchback" or "Sedan" *and* that the location is in the list ["Westgate", "North Shore", "Otahuhu", "Penrose", "Botany", "Manukau"], then the query should be `and(or(eq("vehicle_type", "Hatchback"), eq("vehicle_type", "Sedan")), in("location", ["Westgate", "North Shore", "Otahuhu", "Penrose", "Botany", "Manukau"]))`; note the parentheses specifying the operands to which the or operator should apply (see the formatted payload after this list).

  2. As I'm using a SelfQueryRetriever, I didn't write this query. Shouldn't the langchain prompts be asking the LLM to construct the query in the correct syntax? BTW, I'm using model="gpt-4-turbo-preview".
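
For readability, here is the corrected output from point 1 as a full structured-query payload (the "query" text is copied from the original failing output; only the filter nesting changes):

```json
{
    "query": "with bluetooth and a reversing camera recent",
    "filter": "and(or(eq(\"vehicle_type\", \"Hatchback\"), eq(\"vehicle_type\", \"Sedan\")), in(\"location\", [\"Westgate\", \"North Shore\", \"Otahuhu\", \"Penrose\", \"Botany\", \"Manukau\"]))"
}
```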

colajoy commented 3 months ago

Hello brother, I have encountered the same problem. Have you resolved it?