langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
91.37k stars 14.54k forks

Self-querying with Chroma bug - Got invalid return object. Expected markdown code snippet with JSON object, but got ... #5552

Closed Oliver-Douz closed 8 months ago

Oliver-Douz commented 1 year ago

System Info

Traceback (most recent call last):

  /opt/conda/lib/python3.10/site-packages/langchain/chains/query_constructor/base.py:36 in parse

      33     def parse(self, text: str) -> StructuredQuery:
      34         try:
      35             expected_keys = ["query", "filter"]
    ❱ 36             parsed = parse_json_markdown(text, expected_keys)
      37             if len(parsed["query"]) == 0:
      38                 parsed["query"] = " "
      39             if parsed["filter"] == "NO_FILTER" or not parsed["filter"]:

  /opt/conda/lib/python3.10/site-packages/langchain/output_parsers/structured.py:27 in parse_json_markdown

      24
      25 def parse_json_markdown(text: str, expected_keys: List[str]) -> Any:
      26     if "```json" not in text:
    ❱ 27         raise OutputParserException(
      28             f"Got invalid return object. Expected markdown code snippet with JSON "
      29             f"object, but got:\n{text}"
      30         )

OutputParserException: Got invalid return object. Expected markdown code snippet with JSON object, but got:

{
    "query": "chatbot refinement",
    "filter": "NO_FILTER"
}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  /tmp/ipykernel_28206/2038672913.py:1 in
  [Errno 2] No such file or directory: '/tmp/ipykernel_28206/2038672913.py'

  /opt/conda/lib/python3.10/site-packages/langchain/retrievers/self_query/base.py:73 in get_relevant_documents

      70         """
      71         inputs = self.llm_chain.prep_inputs({"query": query})
      72         structured_query = cast(
    ❱ 73             StructuredQuery, self.llm_chain.predict_and_parse(callbacks=None, **inputs)
      74         )
      75         if self.verbose:
      76             print(structured_query)

  /opt/conda/lib/python3.10/site-packages/langchain/chains/llm.py:238 in predict_and_parse

      235         """Call predict and then parse the results."""
      236         result = self.predict(callbacks=callbacks, **kwargs)
      237         if self.prompt.output_parser is not None:
    ❱ 238             return self.prompt.output_parser.parse(result)
      239         else:
      240             return result
      241

  /opt/conda/lib/python3.10/site-packages/langchain/chains/query_constructor/base.py:49 in parse

      46                 limit=parsed.get("limit"),
      47             )
      48         except Exception as e:
    ❱ 49             raise OutputParserException(
      50                 f"Parsing text\n{text}\n raised following error:\n{e}"
      51             )
      52

OutputParserException: Parsing text

{
    "query": "chatbot refinement",
    "filter": "NO_FILTER"
}

raised following error: Got invalid return object. Expected markdown code snippet with JSON object, but got:

{
    "query": "chatbot refinement",
    "filter": "NO_FILTER"
}
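For context: the parser fails because, per the traceback, `parse_json_markdown` in `langchain/output_parsers/structured.py` rejects any LLM output that lacks a literal ```json fence, even when the text itself is valid JSON (as it is here). A minimal sketch of that behaviour plus an illustrative lenient fallback (the function names mirror the traceback; the fallback is a workaround idea, not part of LangChain):

```python
import json

# The markers the parser looks for, built indirectly so this snippet can
# itself sit inside a markdown code fence.
JSON_FENCE = "`" * 3 + "json"
FENCE = "`" * 3

def parse_json_markdown_strict(text: str) -> dict:
    # Mirrors the check in the traceback: bare JSON is rejected outright
    # because the literal fenced-json marker is missing.
    if JSON_FENCE not in text:
        raise ValueError(
            "Got invalid return object. Expected markdown code snippet "
            f"with JSON object, but got:\n{text}"
        )
    json_str = text.split(JSON_FENCE)[1].split(FENCE)[0]
    return json.loads(json_str)

def parse_json_markdown_lenient(text: str) -> dict:
    # Illustrative fallback: accept bare JSON when the fence is missing.
    try:
        return parse_json_markdown_strict(text)
    except ValueError:
        return json.loads(text.strip())

# The exact output from the traceback parses fine once the fence
# requirement is relaxed.
llm_output = '{\n    "query": "chatbot refinement",\n    "filter": "NO_FILTER"\n}'
```

So the JSON itself is well-formed; only the missing fence trips the parser, which is why the failure depends on how the model happens to format its answer.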

Who can help?

No response

Information

Related Components

Reproduction

import torch
from time import sleep
from typing import Any, List, Mapping, Optional

from langchain.llms.base import LLM
from transformers import pipeline


class Dolly(LLM):

    history_data: Optional[List] = []
    chatbot: Optional[Any] = None  # lazily created transformers pipeline
    conversation: Optional[str] = ""

    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        if stop is not None:
            pass
            # raise ValueError("stop kwargs are not permitted.")
        if self.chatbot is None:
            if self.conversation == "":
                self.chatbot = pipeline(
                    model="databricks/dolly-v2-12b",
                    torch_dtype=torch.bfloat16,
                    trust_remote_code=True,
                    device_map="auto",
                )
            else:
                raise ValueError("Something went wrong")

        sleep(2)
        data = self.chatbot(prompt)[0]["generated_text"]

        # keep a record of every prompt/response pair
        self.history_data.append({"prompt": prompt, "response": data})

        return data

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"model": "DollyCHAT"}

llm = Dolly()

Then I followed the instructions in https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/chroma_self_query.html

and got the error above. It sometimes works, but it fails intermittently.

Expected behavior

It should not raise an error; it should behave as before and return the related documents.

haizhiguang commented 1 year ago

Meeting the same issue now...

mmtmr commented 1 year ago

Same issue

shubham184 commented 1 year ago

+1

pabloski0000 commented 1 year ago

The same error is thrown but using OpenAI LLM (Large Language Model) and subquestions functionality. If I just create a basic query engine and make a normal query everything goes right. The issue arises when subquestions functionality is used

97k commented 1 year ago

Facing the same issue! However, it worked for me when I explicitly instructed the LLM by appending the following to the end of the prompt:

<< OUTPUT (remember to include the ```json)>> Format the output as markdown code snippet with JSON object
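A minimal sketch of how such a reminder could be appended before the prompt is sent to the model (the `REMINDER` text is the suffix quoted above; the helper name `with_json_reminder` is hypothetical, not a LangChain API):

```python
# Hypothetical helper: append the reminder suffix to a query-construction
# prompt so the LLM wraps its JSON answer in a fenced code block.
# The backticks are built indirectly so this snippet survives markdown.
REMINDER = (
    "<< OUTPUT (remember to include the " + "`" * 3 + "json)>> "
    "Format the output as markdown code snippet with JSON object"
)

def with_json_reminder(prompt: str) -> str:
    # Keep the original prompt intact; add the reminder on its own line.
    return prompt.rstrip() + "\n\n" + REMINDER

prompt = with_json_reminder(
    "Structure the user's query.\nUser query: chatbot refinement"
)
```

This does not fix the parser itself; it just nudges the model toward the fenced output format that `parse_json_markdown` insists on, which matches the intermittent nature of the failure.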

JonasWells commented 1 year ago

Same issue

lauris-tw commented 1 year ago

Being less nice than @97k worked for me:

<< OUTPUT (must include the ```json markup)>>
ddealwis09 commented 11 months ago

> The same error is thrown but using OpenAI LLM (Large Language Model) and subquestions functionality. If I just create a basic query engine and make a normal query everything goes right. The issue arises when subquestions functionality is used

Did you ever get this working?

dosubot[bot] commented 8 months ago

Hi, @Oliver-Douz

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, the issue pertains to a bug in self-querying with Chroma, where the expected markdown code snippet with a JSON object is not being returned. There have been discussions and workarounds shared by the community, but the issue remains unresolved.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!