h2non / jsonpath-ng

Finally, a JSONPath implementation for Python that aims to be standard compliant. That's all. Enjoy!
Apache License 2.0

BUG: Unable to parse the JSON schema for LlamaIndex when there is a space in dictionary key #150

Open prabhupant opened 7 months ago

prabhupant commented 7 months ago

The schema below throws an error when it is passed to LlamaIndex's JSONQueryEngine. The error occurs because the key "Issue id" contains a space. When I replaced the space with an underscore, it worked.

json_schema = {
    "description": "Schema defining the jira tickets and their related data",
    "type": "object",
    "properties": {
        "issues": {
            "description": "List of Jira tickets with their related data",
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "Summary": {
                        "description": "Summary of an issue",
                        "type": "string"
                    },
                    "Issue id": {
                        "description": "Issue id of an issue",
                        "type": "integer"
                    }
                }, 
            }
        }
    },
}

michaelmior commented 7 months ago

@prabhupant It would help if you could provide a code sample not using LlamaIndex which exhibits this problem.

prabhupant commented 7 months ago

@michaelmior I don't have a code sample without LlamaIndex right now; I only came across jsonpath-ng through LlamaIndex. In case it helps, here is the error I got from LlamaIndex when this library was called:

Doing this string -  
$.tech_issue[?(@.Status == 'Open')].* | $.tech_issue[?(@.Status == 'Closed')].* | $.tech_issue[?(@.Status == 'In Progress')].* | $.tech_issue[?(@.Status == 'Resolved')].* | count(@)
---------------------------------------------------------------------------
JsonPathParserError                       Traceback (most recent call last)
Cell In[43], line 1
----> 1 nl_response = nl_query_engine.query(
      2 "Group the issues according to the status field and give count of issues in each group. Also print the entire json data that you processed."
      3 )
      5 nl_response

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/indices/query/base.py:23, in BaseQueryEngine.query(self, str_or_query_bundle)
     21 if isinstance(str_or_query_bundle, str):
     22     str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 23 response = self._query(str_or_query_bundle)
     24 return response

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/token_counter/token_counter.py:78, in llm_token_counter.<locals>.wrap.<locals>.wrapped_llm_predict(_self, *args, **kwargs)
     76 def wrapped_llm_predict(_self: Any, *args: Any, **kwargs: Any) -> Any:
     77     with wrapper_logic(_self):
---> 78         f_return_val = f(_self, *args, **kwargs)
     80     return f_return_val

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/indices/struct_store/json_query.py:120, in JSONQueryEngine._query(self, query_bundle)
    115     print_text(f"> JSONPath Prompt: {formatted_prompt}\n")
    116     print_text(
    117         f"> JSONPath Instructions:\n" f"```\n{json_path_response_str}\n```\n"
    118     )
--> 120 json_path_output = self._output_processor(
    121     json_path_response_str,
    122     self._json_value,
    123     **self._output_kwargs,
    124 )
    126 if self._verbose:
    127     print_text(f"> JSONPath Output: {json_path_output}\n")

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/indices/struct_store/json_query.py:47, in default_output_processor(llm_output, json_value)
     44 except ImportError as exc:
     45     raise ImportError(IMPORT_ERROR_MSG) from exc
---> 47 datum: List[DatumInContext] = parse(llm_output).find(json_value)
     48 return [d.value for d in datum]

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/jsonpath_ng/ext/parser.py:172, in parse(path, debug)
    171 def parse(path, debug=False):
--> 172     return ExtentedJsonPathParser(debug=debug).parse(path)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/jsonpath_ng/parser.py:46, in JsonPathParser.parse(self, string, lexer)
     44 lexer = lexer or self.lexer_class()
     45 print("Doing this string - ", string)
---> 46 return self.parse_token_stream(lexer.tokenize(string))

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/jsonpath_ng/parser.py:70, in JsonPathParser.parse_token_stream(self, token_iterator, start_symbol)
     60 # And we regenerate the parse table every time;
     61 # it doesn't actually take that long!
     62 new_parser = ply.yacc.yacc(module=self,
     63                            debug=self.debug,
     64                            tabmodule = parsing_table_module,
   (...)
     67                            start = start_symbol,
     68                            errorlog = logger)
---> 70 return new_parser.parse(lexer = IteratorToTokenStream(token_iterator))

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/ply/yacc.py:333, in LRParser.parse(self, input, lexer, debug, tracking, tokenfunc)
    331     return self.parseopt(input, lexer, debug, tracking, tokenfunc)
    332 else:
--> 333     return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/ply/yacc.py:1201, in LRParser.parseopt_notrack(self, input, lexer, debug, tracking, tokenfunc)
   1199     errtoken.lexer = lexer
   1200 self.state = state
-> 1201 tok = call_errorfunc(self.errorfunc, errtoken, self)
   1202 if self.errorok:
   1203     # User must have done some kind of panic
   1204     # mode recovery on their own.  The
   1205     # returned token is the next lookahead
   1206     lookahead = tok

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/ply/yacc.py:192, in call_errorfunc(errorfunc, token, parser)
    190 _token = parser.token
    191 _restart = parser.restart
--> 192 r = errorfunc(token)
    193 try:
    194     del _errok, _token, _restart

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/jsonpath_ng/parser.py:84, in JsonPathParser.p_error(self, t)
     83 def p_error(self, t):
---> 84     raise JsonPathParserError('Parse error at %s:%s near token %s (%s)'
     85                               % (t.lineno, t.col, t.value, t.type))

JsonPathParserError: Parse error at 2:179 near token ( (()
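
To read the position in that message, the sketch below rebuilds the query string printed after "Doing this string -" and looks up the character at line 2, column 179. It uses only the standard library and does not call jsonpath-ng. Two things are assumptions here, not facts from the thread: that the parsed string began with a newline (the path is printed on its own line, and the error reports line 2), and that the reported column is 1-based within that line. The helper names `char_at` and `context` are hypothetical.

```python
# Rebuild the query string from the log above. The leading "\n" is an
# assumption: it makes the path line 2 of the input, as the error reports.
query = "\n" + " | ".join([
    "$.tech_issue[?(@.Status == 'Open')].*",
    "$.tech_issue[?(@.Status == 'Closed')].*",
    "$.tech_issue[?(@.Status == 'In Progress')].*",
    "$.tech_issue[?(@.Status == 'Resolved')].*",
    "count(@)",
])


def char_at(text, line, col):
    """Return the character at a 1-based (line, col) position (assumed convention)."""
    return text.split("\n")[line - 1][col - 1]


def context(text, line, col, width=8):
    """Return up to `width` characters on each side of the position, for orientation."""
    row = text.split("\n")[line - 1]
    start = max(0, col - 1 - width)
    return row[start:col - 1 + width]


# Position taken from "Parse error at 2:179 near token ( (()".
print(repr(char_at(query, 2, 179)))
print(repr(context(query, 2, 179)))
```

Under those assumptions the position falls on the opening parenthesis that follows `count`, which matches the "near token (" part of the message. That may suggest the parser is rejecting the trailing `count(@)` segment of the generated path, and that the space in "Issue id" is not involved in this particular error. A reproduction that calls jsonpath-ng directly would still be needed to confirm it.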