langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

SelfQueryRetriever gives error for some queries #9368

Open Zledme opened 1 year ago

Zledme commented 1 year ago

System Info

LangChain version: 0.0.245
Model: vicuna-13b-v1.5

Who can help?

No response

Information

Related Components

Reproduction

```python
db = Chroma(
    persist_directory=persist_directory,
    embedding_function=embeddings,
    client_settings=CHROMA_SETTINGS,
)
metadata_field_info = [
    AttributeInfo(
        name="lesson",
        description="Lesson Number of Book",
        type="integer",
    )
]

document_content_description = "English Books"
retriever = SelfQueryRetriever.from_llm(
    llm, db, document_content_description, metadata_field_info, verbose=True
)
# llm_chain.predict(context=context, question=question)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
)

res = qa(query)
```

This is one of the documents (image attachment omitted).

 File "/home/roger/Documents/GitHub/RGRgithub/roger/testllm/main.py", line 122, in qanda
    res = qa(query)
          ^^^^^^^^^
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/chains/base.py", line 258, in __call__
    raise e
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/chains/base.py", line 252, in __call__
    self._call(inputs, run_manager=run_manager)
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/chains/retrieval_qa/base.py", line 130, in _call
    docs = self._get_docs(question, run_manager=_run_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/chains/retrieval_qa/base.py", line 210, in _get_docs
    return self.retriever.get_relevant_documents(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/schema/retriever.py", line 193, in get_relevant_documents
    raise e
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/schema/retriever.py", line 186, in get_relevant_documents
    result = self._get_relevant_documents(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/retrievers/self_query/base.py", line 100, in _get_relevant_documents
    self.llm_chain.predict_and_parse(
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/chains/llm.py", line 282, in predict_and_parse
    return self.prompt.output_parser.parse(result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roger/Documents/GitHub/RGRgithub/roger/testgenv/lib/python3.11/site-packages/langchain/chains/query_constructor/base.py", line 52, in parse
    raise OutputParserException(
langchain.schema.output_parser.OutputParserException: Parsing text
```json
{
    "query": "phoneme",
    "filter": "eq(lesson, 1)"
}

 raised following error:
Unexpected token Token('COMMA', ',') at line 1, column 10.
Expected one of: 
    * LPAR
Previous tokens: [Token('CNAME', 'lesson')]

Expected behavior

dosubot[bot] commented 1 year ago

🤖

Hello,

Thank you for providing detailed information about the issue you're experiencing. It seems like the problem is related to the parsing of the result returned by the predict_and_parse method in the llm.py file.

The error message indicates that the parser is expecting a left parenthesis '(' but instead it encounters a comma ','. This is happening because the parser is trying to parse the string 'eq(lesson, 1)' as a function call with 'lesson' as the function name and '1' as the argument, which is not valid.

The difference between the two results you provided is that in the second one, the string 'lesson' is escaped with double quotes, making it a valid string argument for the 'eq' function. This is why the second result does not raise an error.
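If switching models is not an option, one stopgap is to post-process the generated filter before it reaches the parser. This is purely a sketch, not a LangChain API: `quote_attributes` is an invented helper that wraps bare attribute names in the double quotes the grammar expects.

```python
import re


def quote_attributes(filt: str, attrs: list[str]) -> str:
    """Wrap bare metadata attribute names in double quotes, e.g.
    eq(lesson, 1) -> eq("lesson", 1). Already-quoted names are left alone."""
    for attr in attrs:
        # match the attribute only when it directly follows '(' and precedes ','
        filt = re.sub(rf'(?<=\()\s*{re.escape(attr)}\s*(?=,)', f'"{attr}"', filt)
    return filt
```

For example, `quote_attributes('eq(lesson, 1)', ['lesson'])` yields `eq("lesson", 1)`, which the lark grammar can parse.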

As for why this is happening, it's hard to say without more information. It could be a bug in the predict_and_parse method or in the OutputParser class, or it could be an issue with the specific language model you're using.

One thing you could try is to update your LangChain version to the latest one (v0.0.266) and see if the issue persists. There might have been fixes or improvements made in the newer versions that could resolve your issue.

If updating LangChain does not solve the problem, it would be helpful if you could provide more information about the specific queries that are causing the error, as well as the exact version of the language model you're using. This would help in identifying the root cause of the issue.

I hope this helps. Please let me know if you have any other questions or if there's anything else I can do to assist you.


WaleedAlfaris commented 1 year ago

Having the same issue with langchain v0.0.274 using model Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_0.bin. Funny thing is if I rerun the exact same query after it fails with this error, it somehow works.
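Since a rerun of the identical query can succeed (LLM sampling is not deterministic), a simple retry wrapper can paper over intermittent parse failures. This is a hypothetical sketch, not LangChain code; in real use one would catch `OutputParserException` specifically.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3) -> T:
    """Call fn up to `attempts` times, re-raising the last error if all fail.
    Sampling variance means a failed filter parse often succeeds on retry."""
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:  # narrow to OutputParserException in real use
            last_exc = exc
    raise last_exc
```

Usage would be something like `with_retries(lambda: qa(query))`.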

ericfeunekes commented 10 months ago

I'm getting the same thing on langchain==0.0.327

```
OutputParserException: Parsing text
{
    "query": "legislative amendment",
    "filter": "eq(\"subsection\", \"12\") and eq(\"paragraph\", \"2.02\")"
}
raised following error:
Unexpected token Token('CNAME', 'and') at line 1, column 24.
Expected one of: 
    * $END
Previous tokens: [Token('RPAR', ')')]
```

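The failures above share a pattern: the model emits infix `and`/`or` (or a bare comma list), while the grammar wants the prefix form `and(..., ...)`. As a hedged workaround (a hypothetical `normalize_filter` helper, not part of LangChain), one could rewrite top-level infix operators into prefix calls before parsing:

```python
def normalize_filter(filt: str) -> str:
    """Rewrite top-level infix 'and'/'or' (or a bare comma-separated list)
    into the prefix call form the structured-query grammar expects."""
    depth, parts, buf, op, i = 0, [], [], None, 0
    while i < len(filt):
        c = filt[i]
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        if depth == 0:
            matched = False
            for name in (" and ", " or "):  # split only outside parentheses
                if filt.startswith(name, i):
                    op = name.strip()
                    parts.append("".join(buf).strip())
                    buf = []
                    i += len(name)
                    matched = True
                    break
            if matched:
                continue
            if c == ",":  # bare top-level comma list: treat as 'and'
                op = op or "and"
                parts.append("".join(buf).strip())
                buf = []
                i += 1
                continue
        buf.append(c)
        i += 1
    parts.append("".join(buf).strip())
    if op and len(parts) > 1:
        return op + "(" + ", ".join(parts) + ")"
    return filt
```

For instance it maps `eq("subsection", "12") and eq("paragraph", "2.02")` to `and(eq("subsection", "12"), eq("paragraph", "2.02"))`, and leaves an already-valid `and(gt("Year", 2021), lt("Year", 2023))` untouched.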
Danouchka commented 9 months ago

Hi @Zledme , what was your Prompt Template ?

nikitacorp commented 8 months ago

@ericfeunekes @WaleedAlfaris Have you found solution to your problem?

TheNor01 commented 7 months ago

Using v0.0.352, same error. I am using SelfQueryRetriever with ChromaTranslator:

```
langchain_core.exceptions.OutputParserException: Parsing text

{
    "query": "chicken",
    "filter": "eq(\"category\", \"food\") and contain(content, \"chicken\")"
}

raised following error: Unexpected token Token('CNAME', 'and') at line 1, column 24. Expected one of:
```

stolto-pirla commented 7 months ago

Hi, on LangChain 0.1.0 I am having nasty problems too with the SelfQueryRetriever. Essentially the generated query is not correct. Examples below.

Case 1:

```json
{ "query": "depression", "filter": "and(gt(\"Year\", 2021), lt(\"Year\", 2023))", "limit": "NO_LIMIT" }
```

This fails because "limit" should be an integer, as explicitly stated in the FewShotPromptTemplate: "Make sure the limit is always an int value. It is an optional parameter so leave it blank if it does not make sense." The generated query is wrong; see the exception:

```
raised following error: 1 validation error for StructuredQuery limit value is not a valid integer (type=type_error.integer)
```

Case 2:

```json
{ "query": "depression", "filter": "eq(\"Year\", 2023), eq(\"Month\", 12)", "limit": 3 }
```

This fails because the "filter" syntax is not correct; it should use an 'and' statement. See the exception:

```
raised following error: Unexpected token Token('COMMA', ',') at line 1, column 17. Expected one of:
```

The issue is that the model is unable to produce a correct query. Even worse, results are not repeatable; sometimes the returned query is correct. Using ChatGPT 4.0 as the LLM.
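For the `"limit": "NO_LIMIT"` failure, a small guard could coerce the model's output into the optional integer that `StructuredQuery` validates. This is a hypothetical helper, not LangChain code:

```python
def coerce_limit(limit):
    """Map LLM 'limit' outputs such as 'NO_LIMIT', '', or '3' onto the
    optional int that the structured query schema expects."""
    if limit in (None, "", "NO_LIMIT"):
        return None
    return int(limit)
```

So `coerce_limit("NO_LIMIT")` becomes `None` (no limit) and `coerce_limit("3")` becomes `3`.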

Giselasnjota commented 6 months ago

I have the same issue. Has anyone found a solution? I'm using ChatGPT 3.5. I tried changing the prompt, but it didn't work. I'm handling the common failure patterns with specific string replacements, but that is not an ideal solution.

vafokroy commented 6 months ago

Same here, has anyone found a solution yet?

stolto-pirla commented 6 months ago

No change here, and the issue is that the model often fails to generate a correct query. One would need to redo the prompt, although it is not certain it will ever be 100% reliable.

p-gonzo commented 6 months ago

Jumping in: same issue here:

Metadata field info:

```python
metadata_field_info = [
    AttributeInfo(
        name="law_name",
        description="The name of the law or piece of legislation",
        type="string",
    ),
    AttributeInfo(
        name="alt_law_name",
        description="The name of the law or piece of legislation",
        type="string",
    ),
]
```

document_content_description = "The contents of a law or piece of legislation"

Retriever

```python
retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, document_content_description, metadata_field_info, verbose=True
)
```

Input

retriever.get_relevant_documents("What does the law SB2102 ential")

Output

```json
{
    "query": "SB2102",
    "filter": "eq(\"law_name\", \"SB2102\") or eq(\"alt_law_name\", \"SB2102\")"
}
```
```
---------------------------------------------------------------------------
UnexpectedCharacters                      Traceback (most recent call last)
File .../site-packages/lark/lexer.py:665, in ContextualLexer.lex(self, lexer_state, parser_state)
File .../site-packages/lark/lexer.py:598, in BasicLexer.next_token(self, lex_state, parser_state)

UnexpectedCharacters: No terminal matches 'o' in the current parser context, at line 1 col 26

eq("law_name", "SB2102") or eq("alt_law_name", "SB2102")
                         ^
Expected one of: 
    * RPAR
    * RSQB
    * COMMA

Previous tokens: Token('RPAR', ')')

During handling of the above exception, another exception occurred:

UnexpectedToken                           Traceback (most recent call last)
File .../site-packages/langchain/chains/query_constructor/base.py:56, in StructuredQueryOutputParser.parse(self, text)
File .../site-packages/lark/lark.py:658, in Lark.parse(self, text, start, on_error)
File .../site-packages/lark/parser_frontends.py:104, in ParsingFrontend.parse(self, text, start, on_error)
File .../site-packages/lark/parsers/lalr_parser.py:42, in LALR_Parser.parse(self, lexer, start, on_error)
File .../site-packages/lark/parsers/lalr_parser.py:88, in _Parser.parse(self, lexer, start, ...)
File .../site-packages/lark/parsers/lalr_parser.py:111, in _Parser.parse_from_state(self, state, last_token)
File .../site-packages/lark/parsers/lalr_parser.py:100, in _Parser.parse_from_state(self, state, last_token)
File .../site-packages/lark/lexer.py:674, in ContextualLexer.lex(self, lexer_state, parser_state)

UnexpectedToken: Unexpected token Token('CNAME', 'or') at line 1, column 26.
Expected one of: 
    * $END
Previous tokens: [Token('RPAR', ')')]

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
Cell In[8], line 1
----> 1 retriever.get_relevant_documents("What does the law SB2102 ential")

File .../site-packages/langchain_core/retrievers.py:224, in BaseRetriever.get_relevant_documents(self, query, callbacks, tags, metadata, run_name, **kwargs)
File .../site-packages/langchain/retrievers/self_query/base.py:168, in SelfQueryRetriever._get_relevant_documents(self, query, run_manager)
File .../site-packages/langchain_core/runnables/base.py:2053, in RunnableSequence.invoke(self, input, config)
File .../site-packages/langchain_core/output_parsers/base.py:167, in BaseOutputParser.invoke(self, input, config)
File .../site-packages/langchain_core/output_parsers/base.py:219, in BaseOutputParser.parse_result(self, result, partial)
File .../site-packages/langchain/chains/query_constructor/base.py:63, in StructuredQueryOutputParser.parse(self, text)

OutputParserException: Parsing text
{
    "query": "SB2102",
    "filter": "eq(\"law_name\", \"SB2102\") or eq(\"alt_law_name\", \"SB2102\")"
}
 raised following error:
Unexpected token Token('CNAME', 'or') at line 1, column 26.
Expected one of: 
    * $END
Previous tokens: [Token('RPAR', ')')]
```

Any ideas? Thanks!

wei-m-teh commented 6 months ago

Is there any update on this issue? I ran into the same issue with langchain version 0.1.7.

marouahamdi commented 6 months ago

I have got the same issue. any idea or help plz?

Giselasnjota commented 6 months ago

Hi,

Sure. The Pinecone documentation says to treat date metadata as unix epoch, so this works for me:

  1. My metadata is a float, so I define my AttributeInfo as:

```python
AttributeInfo(
    name="publication_date_as_float",
    description="Publication date is the date the content was sent to the client.",
    type="float",
)
```

  2. I convert the datetime and upload it to Pinecone:

```python
metadata["publication_date_as_float"] = time.mktime(
    date_temp.replace(hour=0, minute=0, second=0, microsecond=0, tzinfo=None).timetuple()
)
```

  3. I'm using self-query with Pinecone, so I needed to change the function visit_comparison in langchain/retrievers/self_query/pinecone.py. (My reference for this was myscale.py in the same folder of the langchain package, which does the same thing.)

```python
def visit_comparison(self, comparison: Comparison) -> Dict:
    if comparison.comparator in (Comparator.IN, Comparator.NIN) and not isinstance(
        comparison.value, list
    ):
        comparison.value = [comparison.value]
    # convert datetime to unix epoch; in my case I want the date without time
    if (
        isinstance(comparison.value, dict)
        and "type" in comparison.value
        and comparison.value["type"] == "date"
    ):
        date_format = "%Y-%m-%d"
        date = datetime.strptime(comparison.value["date"], date_format)
        comparison.value = int(time.mktime(date.timetuple()))
    return {
        comparison.attribute: {
            self._format_func(comparison.comparator): comparison.value
        }
    }
```

Let me know if it works for you.
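The date-to-epoch step above can be boiled down to a small standalone helper. This is only an illustrative sketch (`date_to_epoch` is a made-up name), and note that `time.mktime` interprets the date in local time:

```python
import time
from datetime import datetime


def date_to_epoch(date_str: str, fmt: str = "%Y-%m-%d") -> float:
    """Convert a date string to a unix-epoch float (local time), suitable
    for storing as numeric metadata in a vector store like Pinecone."""
    return time.mktime(datetime.strptime(date_str, fmt).timetuple())
```

Consecutive days then differ by exactly 86400 seconds, so range filters like `gt`/`lt` work on the stored floats.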


VladMstv commented 5 months ago

Same issue with "and" instead of comma produced by llm langchain 0.1.9 langchain-community 0.0.24 langchain-core 0.1.27

stolto-pirla commented 5 months ago

> (quoting Giselasnjota's Pinecone epoch workaround above)

This is very interesting. However, it doesn't address the issue, which occurs well before the query goes to Pinecone or any other database. The LLM is unable to generate a correct query, either because the logical condition syntax is wrong or because the "limit" attribute is not used correctly. I wonder if anyone has tried with a different LLM; ChatGPT 3.5 and 4 both show the issue.

fightingmonk commented 5 months ago

I am seeing similar behavior when using gpt-3.5-turbo-0125: compound boolean filters are not correctly generated by the LLM. For example, the LLM returns `"filter": "eq(\"type\", \"ticket\") and eq(\"ticket_type\", \"Epic\")"` where it should return `"filter": "and( eq(\"type\", \"engineering_ticket\"), eq(\"ticket_type\", \"Epic\") )"`.

This is in direct contradiction to the prompt, which explains how to structure boolean filters. 😢

For others finding this issue, my use cases work when I use gpt-4-0613.

Ahmer967 commented 4 months ago

I am currently using the gpt-4 model, and it is performing exceptionally well. To achieve better results, I ensure that the keywords used in the prompts exactly match those in the data.

```python
client = ChatOpenAI(model="gpt-4", api_key=openai_api_, temperature=0)
retriever_votes = SelfQueryRetriever.from_llm(
    client,
    db_votes,
    votes_content_description,
    metadata_field_votes,
    enable_limit=True,
)
```

Initially, the following prompt did not retrieve the expected results from the database:

Input

```python
retriever_votes.get_relevant_documents("Give me details of votes where year is 2018 and bill's result is failed")
```

Therefore, I adjusted the prompt to use the exact keywords as they appear in the data, including the same capitalization used in my Document. This modification improved document retrieval.

```python
retriever_votes.get_relevant_documents("Give me details of votes where year is 2018 and bill's result is Failed")
```

Output Sample


```python
[Document(page_content='page_content', metadata={'day': 13, 'result': 'Failed', 'year': 2018})]
```

kanzyai-emirarditi commented 4 months ago

GPT-4 works well, but all the other models I have tried (command-r-plus, sonnet, GPT-3.5) give the exception:

```
{ "query": "Apple stock performance", "filter": "eq(ticker, \"AAPL\") and eq(year, 2023)" }

 raised following error:
Unexpected token Token('COMMA', ',') at line 1, column 10.
Expected one of: 
    * LPAR
Previous tokens: [Token('CNAME', 'ticker')]
```

Is there an update on this issue?
ravikumarmittal commented 3 months ago

This issue is fixed with some simple setting with chat gpt 3.5 turbo model

  1. please change the langchain_pg_collection table column data type json to jsonb (cmetadata jsonb,)
  2. change the column data type for table langchain_pg_embedding to jsonb like (cmetadata jsonb,)
  3. 3 in the below code one important trick even if i am using PGVector data base but translation is working with chromaTranslation or pinecode but not with PGvector translator in your case please change the line and test.

    structured_query_translator=ChromaTranslator(),

    structured_query_translator=PGVectorTranslator(),

    structured_query_translator=PineconeTranslator()

Use the code below; this is working:

from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI, OpenAI
from langchain_community.vectorstores import PGVector
from langchain.retrievers.self_query.pgvector import PGVectorTranslator
from langchain.retrievers.self_query.chroma import ChromaTranslator
from langchain.retrievers.self_query.pinecone import PineconeTranslator
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
import os

os.environ["OPENAI_API_KEY"] = 'open ai key'
connection = "db connection string"
collection_name = "my_docs"
embeddings = OpenAIEmbeddings()

docs = [
    Document(
        page_content="A bunch of scientists bring back c and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={"year": 1979, "director": "Andrei Tarkovsky", "genre": "thriller", "rating": 9.9},
    ),
]

vectorstore = PGVector.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=collection_name,
    connection_string=connection,
    use_jsonb=True,
)

metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
        type="string",
    ),
    AttributeInfo(
        name="year",
        description="This is related year of the movie",
        type="integer",
    ),
    AttributeInfo(
        name="director",
        description="The name of the movie director",
        type="string",
    ),
    AttributeInfo(
        name="rating",
        description="A 1-10 rating for the movie",
        type="float",
    ),
]

document_content_description = "Brief summary of a movie"
llm = ChatOpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    document_content_description,
    metadata_field_info,
    structured_query_translator=ChromaTranslator(),
    # structured_query_translator=PGVectorTranslator(),
    # structured_query_translator=PineconeTranslator(),
)

# This example only specifies a filter
result = retriever.invoke("give me all movie related to dream having rating 8.6 or rating>9 ")
print(result)
print('#####################')

orah1998 commented 3 months ago

Was anyone able to solve this issue? I have the same problem myself: `Unexpected token Token('COMMA', ',') at line 1, column 8.` when using mistral-small. I'd be really glad if someone could help me solve this.

santosh-gkg commented 2 months ago

I was also facing the same issue. Though I have not pinpointed the exact cause, the error depends on the model you are using; switching to a bigger (or simply better) model resolves it. I was using 'Gemma-7b-it', which gave the error, but 'Llama-70b-8192' solved it.

bczaplicki-gd commented 1 month ago

In my case I had this problem because the filter I was getting looked like eq(chapter, '001'), but the attribute name chapter needed to be in quotes, which is also the common problem you guys mention here. I solved it by adding "Note that 'chapter' is also a string and it should be in quotes in the filter." to the description in AttributeInfo.

You can also solve it by writing your own, much simpler chain instead of SelfQueryRetriever that just works for your case, giving exact examples of the JSONs you need generated.
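A minimal sketch of that simpler-chain idea, assuming the model is prompted to return a plain JSON object of metadata filters (the prompt text, function names, and Chroma-style `where` syntax below are illustrative assumptions, not LangChain APIs):

```python
import json

# Sketch: ask the LLM for a plain JSON object mapping metadata fields to
# required values, then build a Chroma-style `where` clause ourselves.
# This avoids the lark-based filter-string parser entirely.
FILTER_PROMPT = """Given the user question, return ONLY a JSON object mapping
metadata field names to required values. Available fields: year (integer),
result (string). Question: {question}"""

def parse_filter(llm_output: str) -> dict:
    # Convert e.g. '{"year": 2018, "result": "Failed"}' into Chroma's where syntax
    filters = json.loads(llm_output)
    clauses = [{k: {"$eq": v}} for k, v in filters.items()]
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}

where = parse_filter('{"year": 2018, "result": "Failed"}')
print(where)  # {'$and': [{'year': {'$eq': 2018}}, {'result': {'$eq': 'Failed'}}]}
```

The resulting `where` dict can then be passed to the vector store's similarity search (e.g. Chroma's `similarity_search(query, filter=where)`), so no filter-string parsing is involved.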

parsa-abbasi commented 1 week ago

> In my case I had this problem because the filter I was getting looked like eq(chapter, '001') but the chapter needed to be in quotes. Which is also common problem you guys mention here. I solved it by adding " Note that 'chapter' is also a string and it should be in quotes in the filter." to description in AttributeInfo.
>
> You can also solve it by writing your own much simpler chain instead of SelfQueryRetriever to just work for your case and give exact examples of jsons that you need generated.

This was really helpful! In my case, I was using the Cohere API and ran into a similar issue where the filter I was getting looked like Released_Year, 2016, but the Released_Year needed to be in quotes. Adding the note about the string field needing to be in quotes in the AttributeInfo description fixed the problem for me.

metadata_field_info = [
    AttributeInfo(
        name = "Released_Year",
        description = "The year the movie was released. You must put Released_Year in quotes like \"Released_Year\"",
        type = "integer",
    ),
    AttributeInfo(
        name = "IMDB_Rating",
        description = "A 1-10 rating for the movie. You must put IMDB_Rating in quotes like \"IMDB_Rating\"",
        type = "float"
    ),
]
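Since this fix relies on the model following the instruction, it can help to sanity-check the generated filter string before handing it to the retriever. A hypothetical check (not a LangChain API) that flags any declared attribute appearing without quotes:

```python
# Hypothetical sanity check: every declared attribute that appears in a
# model-generated filter string must appear in quoted form.
ATTRIBUTES = ["Released_Year", "IMDB_Rating"]

def unquoted_attributes(filter_str):
    return [a for a in ATTRIBUTES
            if a in filter_str and f'"{a}"' not in filter_str]

print(unquoted_attributes('eq(Released_Year, 2016)'))    # ['Released_Year']
print(unquoted_attributes('eq("Released_Year", 2016)'))  # []
```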