Zledme opened this issue 1 year ago
🤖
Hello,
Thank you for providing detailed information about the issue you're experiencing. It seems like the problem is related to the parsing of the result returned by the predict_and_parse method in the llm.py file.
The error message indicates that the parser expects a left parenthesis '(' but instead encounters a comma ','. This happens because, in the filter string 'eq(lesson, 1)', the unquoted 'lesson' is read as the name of a nested function call, so the parser expects '(' after it; an attribute must be a quoted string, so the filter is not valid.
The difference between the two results you provided is that in the second one, the string 'lesson' is escaped with double quotes, making it a valid string argument for the 'eq' function. This is why the second result does not raise an error.
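Here is a minimal sketch you can run to see the difference, assuming a recent LangChain where StructuredQueryOutputParser.from_components is available in langchain.chains.query_constructor.base:
```python
# Parse both filter strings directly with the structured-query output parser.
from langchain.chains.query_constructor.base import StructuredQueryOutputParser

parser = StructuredQueryOutputParser.from_components()

quoted = '{"query": "phoneme", "filter": "eq(\\"lesson\\", 1)"}'
unquoted = '{"query": "phoneme", "filter": "eq(lesson, 1)"}'

print(parser.parse(quoted))  # parses: "lesson" is a quoted string attribute

try:
    parser.parse(unquoted)
except Exception as e:
    # the unquoted 'lesson' is lexed as the name of a nested function call,
    # so the grammar expects '(' next and fails on the ','
    print(e)
```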
As for why this is happening, it's hard to say without more information. It could be a bug in the predict_and_parse method or in the OutputParser class, or it could be an issue with the specific language model you're using.
One thing you could try is to update your LangChain version to the latest one (v0.0.266) and see if the issue persists. There might have been fixes or improvements made in the newer versions that could resolve your issue.
If updating LangChain does not solve the problem, it would be helpful if you could provide more information about the specific queries that are causing the error, as well as the exact version of the language model you're using. This would help in identifying the root cause of the issue.
I hope this helps. Please let me know if you have any other questions or if there's anything else I can do to assist you.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Having the same issue with langchain v0.0.274 using the model Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_0.bin. The funny thing is that if I rerun the exact same query after it fails with this error, it somehow works.
I'm getting the same thing on langchain==0.0.327
```
OutputParserException: Parsing text
{
    "query": "legislative amendment",
    "filter": "eq(\"subsection\", \"12\") and eq(\"paragraph\", \"2.02\")"
}
raised following error:
Unexpected token Token('CNAME', 'and') at line 1, column 24.
Expected one of:
	* $END
Previous tokens: [Token('RPAR', ')')]
```
Hi @Zledme, what was your Prompt Template?
@ericfeunekes @WaleedAlfaris Have you found a solution to your problem?
Using v0.0.352: same error. I am using SelfQueryRetriever with ChromaTranslator.
```
langchain_core.exceptions.OutputParserException: Parsing text
{
    "query": "chicken",
    "filter": "eq(\"category\", \"food\") and contain(content, \"chicken\")"
}
raised following error: Unexpected token Token('CNAME', 'and') at line 1, column 24. Expected one of:
```
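For reference, the lark grammar behind this parser only accepts and/or as function-style operators, never as infix keywords, and attribute names must be quoted strings. A small sketch of the form that does parse, under the same assumption that StructuredQueryOutputParser.from_components is available in your version:
```python
from langchain.chains.query_constructor.base import StructuredQueryOutputParser

parser = StructuredQueryOutputParser.from_components()

# infix "and" plus the unquoted attribute fail to parse:
infix = '{"query": "chicken", "filter": "eq(\\"category\\", \\"food\\") and contain(content, \\"chicken\\")"}'
try:
    parser.parse(infix)
except Exception as e:
    print(e)  # Unexpected token Token('CNAME', 'and') ...

# function-call operator with quoted attributes parses:
func_form = '{"query": "chicken", "filter": "and(eq(\\"category\\", \\"food\\"), contain(\\"content\\", \\"chicken\\"))"}'
print(parser.parse(func_form).filter)
```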
Hi, on LangChain 0.1.0 I'm having nasty problems too with the SelfQueryRetriever. Essentially the generated query is not correct.
Examples below:
Case 1:
```json
{
    "query": "depression",
    "filter": "and(gt(\"Year\", 2021), lt(\"Year\", 2023))",
    "limit": "NO_LIMIT"
}
```
This fails because "limit" should be an integer, as is explicitly stated in the FewShotPromptTemplate: "Make sure the limit is always an int value. It is an optional parameter so leave it blank if it does not make sense." The generated query is nonetheless wrong; see the exception:
```
raised following error:
1 validation error for StructuredQuery
limit
  value is not a valid integer (type=type_error.integer)
```
Case 2:
```json
{
    "query": "depression",
    "filter": "eq(\"Year\", 2023), eq(\"Month\", 12)",
    "limit": 3
}
```
This fails because the "filter" syntax is not correct; it should use an 'and' statement. See the exception:
```
raised following error: Unexpected token Token('COMMA', ',') at line 1, column 17. Expected one of:
```
The issue is that the model is unable to produce a correct query. Even worse, results are not repeatable; sometimes the returned query is correct. I am using ChatGPT 4.0 as the LLM.
I have the same issue; has someone found a solution? I'm using ChatGPT 3.5. I tried changing the prompt, but it doesn't work. I'm handling the common failure patterns with specific string replacements, but that is not an ideal solution.
Same here, has anyone found a solution yet?
No change here, and the issue is that the model often fails to generate a correct query. One would need to redo the prompt, although it is not certain it will ever be 100% reliable.
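For what it's worth, on recent releases you can build the query constructor by hand and turn on filter repair. A sketch assuming langchain 0.1.x, where get_query_constructor_prompt and StructuredQueryOutputParser are public and from_components accepts a fix_invalid flag, and reusing llm, vectorstore, metadata_field_info, and document_content_description from your own setup:
```python
from langchain.chains.query_constructor.base import (
    StructuredQueryOutputParser,
    get_query_constructor_prompt,
)
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.retrievers.self_query.chroma import ChromaTranslator

prompt = get_query_constructor_prompt(document_content_description, metadata_field_info)
# fix_invalid asks the parser to repair some malformed filters before parsing
output_parser = StructuredQueryOutputParser.from_components(fix_invalid=True)
query_constructor = prompt | llm | output_parser

retriever = SelfQueryRetriever(
    query_constructor=query_constructor,
    vectorstore=vectorstore,
    structured_query_translator=ChromaTranslator(),  # match your store
)
```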
Jumping in: same issue here.
```python
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever

metadata_field_info = [
    AttributeInfo(
        name="law_name",
        description="The name of the law or piece of legislation",
        type="string",
    ),
    AttributeInfo(
        name="alt_law_name",
        description="The name of the law or piece of legislation",
        type="string",
    ),
]
document_content_description = "The contents of a law or piece of legislation"
retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, document_content_description, metadata_field_info, verbose=True
)
retriever.get_relevant_documents("What does the law SB2102 ential")
```
```json
{
    "query": "SB2102",
    "filter": "eq(\"law_name\", \"SB2102\") or eq(\"alt_law_name\", \"SB2102\")"
}
```
```
---------------------------------------------------------------------------
UnexpectedCharacters Traceback (most recent call last)
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/lexer.py:665, in ContextualLexer.lex(self, lexer_state, parser_state)
664 lexer = self.lexers[parser_state.position]
--> 665 yield lexer.next_token(lexer_state, parser_state)
666 except EOFError:
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/lexer.py:598, in BasicLexer.next_token(self, lex_state, parser_state)
597 allowed = {"<END-OF-FILE>"}
--> 598 raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column,
599 allowed=allowed, token_history=lex_state.last_token and [lex_state.last_token],
600 state=parser_state, terminals_by_name=self.terminals_by_name)
602 value, type_ = res
UnexpectedCharacters: No terminal matches 'o' in the current parser context, at line 1 col 26
eq("law_name", "SB2102") or eq("alt_law_name", "SB2102")
^
Expected one of:
* RPAR
* RSQB
* COMMA
Previous tokens: Token('RPAR', ')')
During handling of the above exception, another exception occurred:
UnexpectedToken Traceback (most recent call last)
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain/chains/query_constructor/base.py:56, in StructuredQueryOutputParser.parse(self, text)
55 else:
---> 56 parsed["filter"] = self.ast_parse(parsed["filter"])
57 if not parsed.get("limit"):
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/lark.py:658, in Lark.parse(self, text, start, on_error)
641 """Parse the given text, according to the options provided.
642
643 Parameters:
(...)
656
657 """
--> 658 return self.parser.parse(text, start=start, on_error=on_error)
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/parser_frontends.py:104, in ParsingFrontend.parse(self, text, start, on_error)
103 stream = self._make_lexer_thread(text)
--> 104 return self.parser.parse(stream, chosen_start, **kw)
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/parsers/lalr_parser.py:42, in LALR_Parser.parse(self, lexer, start, on_error)
41 try:
---> 42 return self.parser.parse(lexer, start)
43 except UnexpectedInput as e:
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/parsers/lalr_parser.py:88, in _Parser.parse(self, lexer, start, value_stack, state_stack, start_interactive)
87 return InteractiveParser(self, parser_state, parser_state.lexer)
---> 88 return self.parse_from_state(parser_state)
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/parsers/lalr_parser.py:111, in _Parser.parse_from_state(self, state, last_token)
110 pass
--> 111 raise e
112 except Exception as e:
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/parsers/lalr_parser.py:100, in _Parser.parse_from_state(self, state, last_token)
99 token = last_token
--> 100 for token in state.lexer.lex(state):
101 assert token is not None
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/lark/lexer.py:674, in ContextualLexer.lex(self, lexer_state, parser_state)
673 token = self.root_lexer.next_token(lexer_state, parser_state)
--> 674 raise UnexpectedToken(token, e.allowed, state=parser_state, token_history=[last_token], terminals_by_name=self.root_lexer.terminals_by_name)
675 except UnexpectedCharacters:
UnexpectedToken: Unexpected token Token('CNAME', 'or') at line 1, column 26.
Expected one of:
* $END
Previous tokens: [Token('RPAR', ')')]
During handling of the above exception, another exception occurred:
OutputParserException Traceback (most recent call last)
Cell In[8], line 1
----> 1 retriever.get_relevant_documents("What does the law SB2102 ential")
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/retrievers.py:224, in BaseRetriever.get_relevant_documents(self, query, callbacks, tags, metadata, run_name, **kwargs)
222 except Exception as e:
223 run_manager.on_retriever_error(e)
--> 224 raise e
225 else:
226 run_manager.on_retriever_end(
227 result,
228 )
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/retrievers.py:217, in BaseRetriever.get_relevant_documents(self, query, callbacks, tags, metadata, run_name, **kwargs)
215 _kwargs = kwargs if self._expects_other_args else {}
216 if self._new_arg_supported:
--> 217 result = self._get_relevant_documents(
218 query, run_manager=run_manager, **_kwargs
219 )
220 else:
221 result = self._get_relevant_documents(query, **_kwargs)
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain/retrievers/self_query/base.py:168, in SelfQueryRetriever._get_relevant_documents(self, query, run_manager)
157 def _get_relevant_documents(
158 self, query: str, *, run_manager: CallbackManagerForRetrieverRun
159 ) -> List[Document]:
160 """Get documents relevant for a query.
161
162 Args:
(...)
166 List of relevant documents
167 """
--> 168 structured_query = self.query_constructor.invoke(
169 {"query": query}, config={"callbacks": run_manager.get_child()}
170 )
171 if self.verbose:
172 logger.info(f"Generated Query: {structured_query}")
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/runnables/base.py:2053, in RunnableSequence.invoke(self, input, config)
2051 try:
2052 for i, step in enumerate(self.steps):
-> 2053 input = step.invoke(
2054 input,
2055 # mark each step as a child run
2056 patch_config(
2057 config, callbacks=run_manager.get_child(f"seq:step:{i+1}")
2058 ),
2059 )
2060 # finish the root run
2061 except BaseException as e:
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/output_parsers/base.py:167, in BaseOutputParser.invoke(self, input, config)
163 def invoke(
164 self, input: Union[str, BaseMessage], config: Optional[RunnableConfig] = None
165 ) -> T:
166 if isinstance(input, BaseMessage):
--> 167 return self._call_with_config(
168 lambda inner_input: self.parse_result(
169 [ChatGeneration(message=inner_input)]
170 ),
171 input,
172 config,
173 run_type="parser",
174 )
175 else:
176 return self._call_with_config(
177 lambda inner_input: self.parse_result([Generation(text=inner_input)]),
178 input,
179 config,
180 run_type="parser",
181 )
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/runnables/base.py:1246, in Runnable._call_with_config(self, func, input, config, run_type, **kwargs)
1242 context = copy_context()
1243 context.run(var_child_runnable_config.set, child_config)
1244 output = cast(
1245 Output,
-> 1246 context.run(
1247 call_func_with_variable_args,
1248 func, # type: ignore[arg-type]
1249 input, # type: ignore[arg-type]
1250 config,
1251 run_manager,
1252 **kwargs,
1253 ),
1254 )
1255 except BaseException as e:
1256 run_manager.on_chain_error(e)
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/runnables/config.py:326, in call_func_with_variable_args(func, input, config, run_manager, **kwargs)
324 if run_manager is not None and accepts_run_manager(func):
325 kwargs["run_manager"] = run_manager
--> 326 return func(input, **kwargs)
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/output_parsers/base.py:168, in BaseOutputParser.invoke.<locals>.<lambda>(inner_input)
163 def invoke(
164 self, input: Union[str, BaseMessage], config: Optional[RunnableConfig] = None
165 ) -> T:
166 if isinstance(input, BaseMessage):
167 return self._call_with_config(
--> 168 lambda inner_input: self.parse_result(
169 [ChatGeneration(message=inner_input)]
170 ),
171 input,
172 config,
173 run_type="parser",
174 )
175 else:
176 return self._call_with_config(
177 lambda inner_input: self.parse_result([Generation(text=inner_input)]),
178 input,
179 config,
180 run_type="parser",
181 )
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain_core/output_parsers/base.py:219, in BaseOutputParser.parse_result(self, result, partial)
206 def parse_result(self, result: List[Generation], *, partial: bool = False) -> T:
207 """Parse a list of candidate model Generations into a specific format.
208
209 The return value is parsed from only the first Generation in the result, which
(...)
217 Structured output.
218 """
--> 219 return self.parse(result[0].text)
File ~/.local/share/virtualenvs/LLM_RAG-sfbpCevB/lib/python3.9/site-packages/langchain/chains/query_constructor/base.py:63, in StructuredQueryOutputParser.parse(self, text)
59 return StructuredQuery(
60 **{k: v for k, v in parsed.items() if k in allowed_keys}
61 )
62 except Exception as e:
---> 63 raise OutputParserException(
64 f"Parsing text\n{text}\n raised following error:\n{e}"
65 )
OutputParserException: Parsing text
{
"query": "SB2102",
"filter": "eq(\"law_name\", \"SB2102\") or eq(\"alt_law_name\", \"SB2102\")"
}
raised following error:
Unexpected token Token('CNAME', 'or') at line 1, column 26.
Expected one of:
* $END
Previous tokens: [Token('RPAR', ')')]
```
Any ideas? Thanks!
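Given the earlier observation that re-running the identical query sometimes succeeds, one blunt stopgap is a retry wrapper around the retriever. The helper below is only illustrative, not a LangChain API, though OutputParserException is the real exception class raised here:
```python
from langchain_core.exceptions import OutputParserException

def get_docs_with_retry(retriever, query, attempts=3):
    # retry on parser failures; a sampling LLM may emit a valid filter next time
    last_error = None
    for _ in range(attempts):
        try:
            return retriever.get_relevant_documents(query)
        except OutputParserException as e:
            last_error = e
    raise last_error
```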
Is there any update on this issue? I ran into the same issue with langchain version 0.1.7.
I have got the same issue. Any idea or help, please?
Hi,
Sure. The Pinecone documentation says to treat date metadata as a Unix epoch, so this works for me:
1. My metadata is a float, so I define my AttributeInfo as:
```python
AttributeInfo(
    name="publication_date_as_float",
    description="Publication date is the date the content was sent to the client.",
    type="float",
)
```
2. I convert the datetime and upload it to Pinecone:
```python
metadata["publication_date_as_float"] = time.mktime(
    date_temp.replace(hour=0, minute=0, second=0, microsecond=0, tzinfo=None).timetuple()
)
```
3. I'm using self-query with Pinecone, so I need to change the visit_comparison function in langchain/retrievers/self_query/pinecone.py. Note: my reference for this was myscale.py in the same folder of the langchain package; it does the same thing.
```python
def visit_comparison(self, comparison: Comparison) -> Dict:
    if comparison.comparator in (Comparator.IN, Comparator.NIN) and not isinstance(
        comparison.value, list
    ):
        comparison.value = [comparison.value]
    # convert datetime to unix epoch; in my case I want the date without the time
    if isinstance(comparison.value, dict) and comparison.value.get("type") == "date":
        data_formato = "%Y-%m-%d"
        date = datetime.strptime(comparison.value["date"], data_formato)
        comparison.value = int(time.mktime(date.timetuple()))
    return {
        comparison.attribute: {
            self._format_func(comparison.comparator): comparison.value
        }
    }
```
Let me know if it works for you.
Same issue, with "and" instead of a comma produced by the LLM. langchain 0.1.9, langchain-community 0.0.24, langchain-core 0.1.27.
The Pinecone date workaround above is very interesting. However, it doesn't address the issue, which occurs well before the query goes to Pinecone or any other database: the LLM is unable to generate a correct query, either because the logical condition syntax is wrong or because the "limit" attribute is not used correctly. I wonder if anyone has tried with a different LLM; ChatGPT 3.5 and 4 both show the issue.
I am seeing similar behavior when using gpt-3.5-turbo-0125:
- Compound boolean filters are not correctly generated by the LLM. For example, the LLM is returning "filter": "eq(\"type\", \"ticket\") and eq(\"ticket_type\", \"Epic\")" where it should be returning "filter": "and(eq(\"type\", \"engineering_ticket\"), eq(\"ticket_type\", \"Epic\"))".
This is in direct contradiction to the prompt, which explains how to structure boolean filters. 😢
For others finding this issue, my use cases work when I use gpt-4-0613.
I am currently using the gpt-4 model, and it is performing exceptionally well. To achieve better results, I ensure that the keywords used in the prompts exactly match those in the data.
```python
client = ChatOpenAI(model="gpt-4", api_key=openai_api_, temperature=0)
retriever_votes = SelfQueryRetriever.from_llm(
    client,
    db_votes,
    votes_content_description,
    metadata_field_votes,
    enable_limit=True,
)
```
Initially, the following prompt did not retrieve the expected results from the database.
Input:
```python
retriever_votes.get_relevant_documents("Give me details of votes where year is 2018 and bill's result is failed")
```
Therefore, I adjusted the prompt to use the exact keywords as they appear in the data, including the same capitalization used in my Document. This modification improved document retrieval.
```python
retriever_votes.get_relevant_documents("Give me details of votes where year is 2018 and bill's result is Failed")
```
Output sample:
```python
[Document(page_content='page_content', metadata={'day': 13, 'result': 'Failed', 'year': 2018})]
```
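If you control ingestion, an alternative to hand-matching capitalization is to normalize string metadata up front and say so in the attribute descriptions. The helper below is purely illustrative, not a LangChain API:
```python
def normalize_metadata(metadata: dict) -> dict:
    # lowercase string values at ingestion so "Failed" and "failed"
    # end up satisfying the same eq() filter
    return {k: v.lower() if isinstance(v, str) else v for k, v in metadata.items()}

# apply before indexing, e.g.:
# Document(page_content=text, metadata=normalize_metadata(raw_metadata))
```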
GPT-4 works well, but all the other models I have tried (command-r-plus, sonnet, 3.5) are giving the exception:
```
{ "query": "Apple stock performance", "filter": "eq(ticker, \"AAPL\") and eq(year, 2023)" }
raised following error:
Unexpected token Token('COMMA', ',') at line 1, column 10.
Expected one of:
	* LPAR
Previous tokens: [Token('CNAME', 'ticker')]
```
Is there an update on this issue?
This issue is fixed for me with some simple settings with the ChatGPT 3.5 Turbo model. One important trick in the code below: even though I am using a PGVector database, translation works with ChromaTranslator or PineconeTranslator but not with PGVectorTranslator; in your case, please change that line and test.
```python
# use the code below, this is working
import os

from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import PGVector
from langchain.retrievers.self_query.pgvector import PGVectorTranslator
from langchain.retrievers.self_query.chroma import ChromaTranslator
from langchain.retrievers.self_query.pinecone import PineconeTranslator
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever

os.environ["OPENAI_API_KEY"] = "open ai key"
connection = "db connection string"
collection_name = "my_docs"
embeddings = OpenAIEmbeddings()

docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={"year": 1979, "director": "Andrei Tarkovsky", "genre": "thriller", "rating": 9.9},
    ),
]
vectorstore = PGVector.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=collection_name,
    connection_string=connection,
    use_jsonb=True,
)

metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
        type="string",
    ),
    AttributeInfo(name="year", description="This is related year of the movie", type="integer"),
    AttributeInfo(name="director", description="The name of the movie director", type="string"),
    AttributeInfo(name="rating", description="A 1-10 rating for the movie", type="float"),
]
document_content_description = "Brief summary of a movie"
llm = ChatOpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    document_content_description,
    metadata_field_info,
    structured_query_translator=ChromaTranslator(),
    # structured_query_translator=PineconeTranslator(),
)

result = retriever.invoke("give me all movie related to dream having rating 8.6 or rating>9")
print(result)
print('#####################')
```
Was anyone able to solve this issue? I have the same myself when using mistral-small: Unexpected token Token('COMMA', ',') at line 1, column 8. I'd be really glad if someone could help me solve this.
I was also facing the same issue. Though I have not found the exact cause, the error depends on the model you are using: a bigger, or rather better, model solves it. I was using Gemma-7b-it, which was giving the error, but switching to Llama-70b-8192 solved it.
In my case I had this problem because the filter I was getting looked like eq(chapter, '001'), but the chapter needed to be in quotes, which is also a common problem mentioned here. I solved it by adding "Note that 'chapter' is also a string and it should be in quotes in the filter." to the description in AttributeInfo.
You can also solve it by writing your own, much simpler chain instead of SelfQueryRetriever, to work just for your case, and give exact examples of the JSONs you need generated; see the sketch below.
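A rough sketch of that idea; the prompt text, the "chapter" field, and the helper are illustrative assumptions rather than a LangChain recipe, and vectorstore stands for whatever store you already have:
```python
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# ask the model for exactly the JSON you need, with an explicit example
prompt = ChatPromptTemplate.from_template(
    "Extract a search query and an optional chapter filter from the question.\n"
    'Return only JSON, e.g. {{"query": "consent rules", "chapter": "001"}}.\n'
    "The chapter value must always be a quoted string; omit the key if unused.\n"
    "Question: {question}"
)
chain = prompt | ChatOpenAI(temperature=0) | JsonOutputParser()

parsed = chain.invoke({"question": "What does chapter 001 say about consent?"})
docs = vectorstore.similarity_search(  # filter support depends on your store
    parsed["query"],
    filter={"chapter": parsed["chapter"]} if "chapter" in parsed else None,
)
```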
This was really helpful! In my case, I was using the Cohere API and ran into a similar issue where the filter I was getting looked like Released_Year, 2016, but the Released_Year needed to be in quotes. Adding the note about the string field needing to be in quotes to the AttributeInfo description fixed the problem for me.
```python
metadata_field_info = [
    AttributeInfo(
        name="Released_Year",
        description="The year the movie was released. You must put Released_Year in quotes like \"Released_Year\"",
        type="integer",
    ),
    AttributeInfo(
        name="IMDB_Rating",
        description="A 1-10 rating for the movie. You must put IMDB_Rating in quotes like \"IMDB_Rating\"",
        type="float",
    ),
]
```
System Info
LangChain version: 0.0.245, model: vicuna-13b-v1.5
Reproduction
this is one of the documents
Expected behavior
An error is raised while using SelfQueryRetriever with RetrievalQA, and it shows up only with some queries.
I found that in langchain/chains/llm.py, line 282, in predict_and_parse, the error occurs when the result is (Q: what is a phoneme):
'```json\n{\n "query": "phoneme",\n "filter": "eq(lesson, 1)"\n}\n```'
It doesn't occur when the result is (Q: What is a phoneme):
'```json\n{\n "query": "phoneme",\n "filter": "eq(\\"lesson\\", 1)"\n}\n```'
I can't change the version right now.