run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: JSONQueryEngine Exception - Parse Error at $s near token #7747

Closed thanh-cnguyen closed 7 months ago

thanh-cnguyen commented 1 year ago

Bug Description

The llm_output parameter of the default_output_processor function can be a string that is not a valid JSONPath expression (e.g. a prose answer from the LLM), in which case parsing fails with:

Exception: jsonpath_ng: Exception: Parse error at 1:5 near token task (ID)

Version

0.8.14

Steps to Reproduce

  1. Setup JSONQueryEngine as usual with json_value, json_schema, and service context
  2. Perform querying using .query()

The bug doesn't occur often, so reproducing it may take some time. Pay attention to the llm_output passed to default_output_processor.

Note: I have to use a custom output processor that mimics the default function to see the issue.
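Since a custom output processor is needed to observe the bad llm_output, a minimal sketch of the guard such a processor could run before parsing might look like this (looks_like_json_path is a hypothetical helper, not part of llama_index):

```python
# Hypothetical pre-parse guard for a custom output processor.
# A JSONPath expression conventionally starts with '$'; the prose
# answers that trigger this bug do not.
def looks_like_json_path(llm_output: str) -> bool:
    """Return True if llm_output plausibly is a JSONPath expression."""
    return llm_output.strip().startswith("$")
```

Logging llm_output whenever this returns False makes the intermittent failure easy to capture.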

Relevant Logs/Tracebacks

  File "/virtualenv/texasfile/lib/python3.8/site-packages/llama_index/tools/query_engine.py", line 54, in __call__
    response = self._query_engine.query(query_str)
  File "/virtualenv/texasfile/lib/python3.8/site-packages/llama_index/indices/query/base.py", line 23, in query
    response = self._query(str_or_query_bundle)
  File "/virtualenv/texasfile/lib/python3.8/site-packages/llama_index/query_engine/transform_query_engine.py", line 79, in _query
    return self._query_engine.query(query_bundle)
  File "/virtualenv/texasfile/lib/python3.8/site-packages/llama_index/indices/query/base.py", line 23, in query
    response = self._query(str_or_query_bundle)
  File "/virtualenv/texasfile/lib/python3.8/site-packages/llama_index/indices/struct_store/json_query.py", line 116, in _query
    json_path_output = self._output_processor(
  File "/virtualenv/texasfile/lib/python3.8/site-packages/llama_index/indices/struct_store/json_query.py", line 48, in default_output_processor
    datum: List[DatumInContext] = parse(llm_output).find(json_value)
  File "/virtualenv/texasfile/lib/python3.8/site-packages/jsonpath_ng/ext/parser.py", line 172, in parse
    return ExtentedJsonPathParser(debug=debug).parse(path)
  File "/virtualenv/texasfile/lib/python3.8/site-packages/jsonpath_ng/parser.py", line 45, in parse
    return self.parse_token_stream(lexer.tokenize(string))
  File "/virtualenv/texasfile/lib/python3.8/site-packages/jsonpath_ng/parser.py", line 69, in parse_token_stream
    return new_parser.parse(lexer = IteratorToTokenStream(token_iterator))
  File "/virtualenv/texasfile/lib/python3.8/site-packages/ply/yacc.py", line 333, in parse
    return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
  File "/virtualenv/texasfile/lib/python3.8/site-packages/ply/yacc.py", line 1201, in parseopt_notrack
    tok = call_errorfunc(self.errorfunc, errtoken, self)
  File "/virtualenv/texasfile/lib/python3.8/site-packages/ply/yacc.py", line 192, in call_errorfunc
    r = errorfunc(token)
  File "/virtualenv/texasfile/lib/python3.8/site-packages/jsonpath_ng/parser.py", line 83, in p_error
    raise JsonPathParserError('Parse error at %s:%s near token %s (%s)'
jsonpath_ng.exceptions.JsonPathParserError: Parse error at 1:5 near token task (ID)

dosubot[bot] commented 1 year ago

🤖

Hello,

Thank you for bringing this issue to our attention. The error you're encountering seems to be due to the llm_output parameter of the default_output_processor function being a string of query instructions or the final response from the engine, rather than the expected JSON Path.

To resolve this issue, you could modify the default_output_processor function to check if the llm_output is a valid JSON Path before attempting to parse it. If it's not, the function could return an error message or handle the situation in a way that suits your application.

Here's a modified version of the default_output_processor function that includes this check:

from typing import List

# Note: JSONType and IMPORT_ERROR_MSG are defined in llama_index's
# indices/struct_store/json_query.py module.
def default_output_processor(llm_output: str, json_value: JSONType) -> JSONType:
    """Default output processor that executes the JSON Path query."""
    try:
        from jsonpath_ng.ext import parse
        from jsonpath_ng.jsonpath import DatumInContext
    except ImportError as exc:
        raise ImportError(IMPORT_ERROR_MSG) from exc

    # Check that llm_output looks like a JSONPath expression before parsing
    if not isinstance(llm_output, str) or not llm_output.startswith("$"):
        raise ValueError(
            "Invalid JSON Path: llm_output should be a string starting with '$'"
        )

    datum: List[DatumInContext] = parse(llm_output).find(json_value)
    return [d.value for d in datum]

This version of the function raises a ValueError if llm_output is not a string starting with '$', the conventional root of a JSONPath expression. You can adjust this check to suit the specific format of JSONPath expressions used in your application.

Please note that this is a workaround and might not cover all edge cases. If the issue persists, please provide more details about the specific queries that are causing the problem.
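If default_output_processor is patched to raise ValueError as described above, a caller could catch it and fall back to returning the raw LLM text instead of letting the exception propagate through the query engine. This wrapper is an illustrative sketch, not llama_index API:

```python
def safe_output_processor(llm_output, json_value, processor):
    """Run `processor`; fall back to the raw LLM text on a bad JSONPath."""
    try:
        return processor(llm_output, json_value)
    except ValueError:
        # The LLM answered in prose instead of a JSONPath expression;
        # surface its text rather than raising through the query engine.
        return llm_output
```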

Best, Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


wodka commented 11 months ago

I am experiencing the same error occasionally. Digging into the llm_output, the error seems to be a hint that the stated problem is simply a bad match for the JSON query engine.

llm_output=The task provided seems to be more related to data processing or machine learning, which involves creating new categories (codes) based on the provided cells. This is not something that can be achieved using JSONPath.

JSONPath is a query language for JSON, similar to XPath for XML, and it's used to retrieve data from a JSON document, not to process or generate new data. ...
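For outputs like the one above, where the LLM wraps (or replaces) the path with prose, a best-effort salvage step could try to pull a '$'-rooted expression out of the text before giving up. extract_json_path is a hypothetical helper, not part of llama_index or jsonpath_ng:

```python
import re

def extract_json_path(llm_output: str):
    """Best-effort: pull the first '$'-rooted expression out of prose.

    Returns None when no candidate is found, signalling that the
    question was likely a bad match for the JSON query engine.
    """
    match = re.search(r"\$[.\[][^\s`]*", llm_output)
    return match.group(0) if match else None
```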

dosubot[bot] commented 7 months ago

Hi, @thanh-cnguyen,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. The issue you opened relates to a bug in the JSONQueryEngine's default_output_processor function, causing a parse error when the llm_output parameter contains unexpected data. There have been discussions about potential workarounds, such as modifying the function to check the validity of the llm_output before parsing it. Another user, @wodka, has also shared insights into a similar error.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation. If you have any further questions or updates, feel free to reach out.