langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

Error parsing SyntaxError using the SelfQueryRetriever #1480

Open ladrians opened 1 year ago

ladrians commented 1 year ago

I am testing the SelfQueryRetriever with gpt-3.5-turbo, using code like the following.

      const attributeInfo: AttributeInfo[] = getAttributeInfo(options?.retriever?.metadata || '');        
      retriever = SelfQueryRetriever.fromLLM({
        llm: model,
        documentContents: SELF_QUERY_DOC_CONTENTS,
        vectorStore: vectorStore,
        attributeInfo,
        structuredQueryTranslator: new BasicTranslator(),
      });
      retriever.verbose = verbose;
      retriever.searchParams = {
        k: k,
        filter: filter,
      };

I currently have year and quarter filters defined. If I ask the question "What is the Q2 2021 revenue?", I get the following error:

Error: Error parsing SyntaxError: [1:20]: Unexpected token: 'identifier': eq("quarter", 2) and eq("year", "2021")
 at ExpressionParser.parse (C:\test\node_modules\langchain\dist\output_parsers\expression.js:41:19)
 at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
 at QueryTransformer.parse (C:\test\node_modules\langchain\dist\chains\query_constructor\parser.js:93:25)
 at StructuredQueryOutputParser.parserFunction (C:\test\node_modules\langchain\dist\chains\query_constructor\index.js:44:38) 
 at LLMChain._getFinalOutput (C:\test\node_modules\langchain\dist\chains\llm_chain.js:66:31)
 at LLMChain._call (C:\test\node_modules\langchain\dist\chains\llm_chain.js:94:31)
 at LLMChain.call (C:\test\node_modules\langchain\dist\chains\base.js:65:28)
 at SelfQueryRetriever.getRelevantDocuments (C:\test\node_modules\langchain\dist\retrievers\self_query\index.js:45:55)       
 at ConversationalRetrievalQAChain._call (C:\test\node_modules\langchain\dist\chains\conversational_retrieval_chain.js:105:22)
 at ConversationalRetrievalQAChain.call (C:\test\node_modules\langchain\dist\chains\base.js:65:28)

Using the verbose option, I noticed the large prompt used for generating and parsing the initial query, but I couldn't figure out a way to override it with the methods currently available. Is there a way to experiment with the prompt by trial and error to see whether the result improves?

dosubot[bot] commented 10 months ago

Hi, @ladrians! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you reported an issue regarding an error that occurs when using the SelfQuery Retriever option with specific filters. The error seems to be related to parsing syntax when trying to parse a question. You also mentioned that you were wondering if trying different prompts could potentially resolve the issue.

Since there hasn't been any recent activity or comments on this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project!

ladrians commented 10 months ago

OK, please do not close the ticket yet. I will be checking the case in September, is that OK?

dosubot[bot] commented 10 months ago

@jacoblee93 Could you please help @ladrians with this issue regarding the error that occurs when using the SelfQuery Retriever option with specific filters? They mentioned that they will be checking the case in September. Thank you!

ladrians commented 8 months ago

I retested this case using version 0.0.160 and it still returns an error. This is the output from the self-query step for the filters:

json\n{\n \"query\": \"revenue\",\n \"filter\": \"eq(\\\"quarter\\\", 2) and eq(\\\"year\\\", 2021)\"\n}\n

with a similar exception when processing 'eq("quarter", 2) and eq("year", 2021)':

Error: Error parsing SyntaxError: Expected " ", "(", ".", "[", "\t", "\x0B", "\x0C", " ", "", or end of input but "a" found.: eq("quarter", 2) and eq("year", 2021)
    at ExpressionParser.parse (C:\test\cdk\node_modules\langchain\dist\output_parsers\expression.cjs:89:19)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async QueryTransformer.parse (C:\test\cdk\node_modules\langchain\dist\chains\query_constructor\parser.cjs:117:25)      
    at async StructuredQueryOutputParser.outputProcessor (C:\test\cdk\node_modules\langchain\dist\chains\query_constructor\index.cjs:64:34)
    at async LLMChain._getFinalOutput (C:\test\cdk\node_modules\langchain\dist\chains\llm_chain.cjs:96:31)
    at async LLMChain._call (C:\test\cdk\node_modules\langchain\dist\chains\llm_chain.cjs:126:31)
    at async LLMChain.call (C:\test\cdk\node_modules\langchain\dist\chains\base.cjs:104:28)
    at async SelfQueryRetriever._getRelevantDocuments (C:\test\cdk\node_modules\langchain\dist\retrievers\self_query\index.cjs:69:55)
    at async SelfQueryRetriever.getRelevantDocuments (C:\test\cdk\node_modules\langchain\dist\schema\retriever.cjs:69:29)     
    at async ConversationalRetrievalQAChain._call (C:\test\cdk\node_modules\langchain\dist\chains\conversational_retrieval_chain.cjs:140:22)

thanks

ladrians commented 6 months ago

I got back to a project that tries to use this feature. Now on langchain 0.0.214, I can confirm that the problem persists.

json\n{\n    \"query\": \"revenue\",\n    \"filter\": \"eq(\\\"quarter\\\", 2) and eq(\\\"year\\\", 2021)\"\n}\n
...
SyntaxError: Expected " ", "(", ".", "[", "\t", "\x0B", "\x0C", " ", "", or end of input but "a" found.: eq("quarter", 2) and eq("year", 2021)

Is there a plan to add an option for changing the default prompt, so I can modify it and try to get this solved? I haven't checked LCEL yet; is there a walkthrough on how to convert this retriever to a pipelined one so I can try manual prompts? In my case I am using a ConversationalQA chain with a SelfQueryRetriever.

Any other ideas are welcome, thanks
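
One possible workaround, sketched under assumptions: the parser error suggests that the expression grammar does not accept an infix `and` between two comparisons, and that it expects the functional form `and(eq("quarter", 2), eq("year", 2021))` instead. If that is the case, the filter string could be normalized before it reaches the parser. The helpers below (`splitTopLevel`, `isKeywordAt`, `combine`, `infixToPrefix`) are hypothetical names, not part of langchain, and how to hook this into the retriever is not covered here.

```typescript
// Hypothetical helper: true if `keyword` occurs at index `i` as a whole word
// surrounded by whitespace (so the `and` in `and(` is not matched).
function isKeywordAt(input: string, i: number, keyword: string): boolean {
  if (!input.startsWith(keyword, i)) return false;
  const before = i === 0 ? " " : input.charAt(i - 1);
  const end = i + keyword.length;
  const after = end >= input.length ? " " : input.charAt(end);
  return /\s/.test(before) && /\s/.test(after);
}

// Split `input` on `keyword` where it appears outside parentheses and
// outside double-quoted strings.
function splitTopLevel(input: string, keyword: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let inString = false;
  let start = 0;
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "(") depth++;
    else if (ch === ")") depth--;
    else if (depth === 0 && isKeywordAt(input, i, keyword)) {
      parts.push(input.slice(start, i).trim());
      i += keyword.length - 1;
      start = i + 1;
    }
  }
  parts.push(input.slice(start).trim());
  return parts;
}

// Wrap several parts in `op(...)`; a single part is returned unchanged.
function combine(op: string, parts: string[]): string {
  return parts.length > 1 ? `${op}(${parts.join(", ")})` : parts.join("");
}

// Rewrite top-level infix `and` / `or` into the functional form.
// `or` is treated as binding more loosely than `and` (an assumption).
function infixToPrefix(filter: string): string {
  const orParts = splitTopLevel(filter, "or").map((part) =>
    combine("and", splitTopLevel(part, "and"))
  );
  return combine("or", orParts);
}

console.log(infixToPrefix('eq("quarter", 2) and eq("year", 2021)'));
// prints: and(eq("quarter", 2), eq("year", 2021))
```

This only handles operators at the top level of the expression; infix operators nested inside parentheses are left untouched.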

jacoblee93 commented 6 months ago

Apologies for not getting back to you sooner - yes a new chain is definitely on the roadmap. Will aim to get that done ASAP if nobody beats me to it

dosubot[bot] commented 2 months ago

Hi, @ladrians

I'm helping the langchainjs team manage their backlog and am marking this issue as stale. The issue involves an error when using the SelfQuery Retriever option with specific filters, resulting in SyntaxError and unexpected tokens. You mentioned trying different prompts to potentially resolve the issue and asked about the possibility of changing the default prompt. Jacoblee93 acknowledged the issue and mentioned that a new chain is on the roadmap to address the problem.

Could you please confirm if this issue is still relevant to the latest version of the langchainjs repository? If it is, please let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation.

ladrians commented 2 months ago

Yes, I can confirm the issue is still relevant. Being able to change the self-query prompt and the default few-shot examples is important for tailoring it to specific needs.

jacoblee93 commented 2 months ago

Still on the TODO list!

ladrians commented 2 months ago

I updated to 0.1.36, and at least I can now modify the first part of the prompt. It is not perfect, but it goes in the needed direction:

selfQueryVectorstoreInstance.lc_kwargs.queryConstructor.first.prefix = "desired value here";
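
Because `lc_kwargs.queryConstructor.first.prefix` looks like an internal path rather than a documented option, a guarded version of the assignment may be safer if the structure changes between releases. This is a minimal sketch: `setSelfQueryPrefix` and `isRecord` are hypothetical helpers, and the only thing taken from the line above is the property path itself.

```typescript
type UnknownRecord = Record<string, unknown>;

// Narrow an unknown value to a plain indexable object.
function isRecord(value: unknown): value is UnknownRecord {
  return typeof value === "object" && value !== null;
}

// Walk retriever.lc_kwargs.queryConstructor.first and replace its `prefix`
// only if every step exists and `prefix` is currently a string.
// Returns true when the prefix was replaced, false otherwise.
function setSelfQueryPrefix(retriever: unknown, prefix: string): boolean {
  let node: unknown = retriever;
  for (const key of ["lc_kwargs", "queryConstructor", "first"]) {
    if (!isRecord(node)) return false;
    node = node[key];
  }
  if (!isRecord(node) || typeof node["prefix"] !== "string") return false;
  node["prefix"] = prefix;
  return true;
}
```

Calling `setSelfQueryPrefix(selfQueryVectorstoreInstance, "desired value here")` and checking the returned boolean shows whether the path still exists after an upgrade, instead of failing with a TypeError.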