kennknowles / python-jsonpath-rw

A robust and significantly extended implementation of JSONPath for Python, with a clear AST for metaprogramming.
Apache License 2.0

Union syntax doesn't work for complex case #49

Closed. liketic closed this issue 7 years ago.

liketic commented 7 years ago

Hi,

I want to extract values from a JSON document using multiple JSONPath expressions at the same time. For example, given this JSON:

{
"a": "123",
"b": "456"
}

I want to get the values of both a and b: ["123", "456"].

I found the following syntax in the reference doc: https://pypi.python.org/pypi/jsonpath-rw

jsonpath1 | jsonpath2 Any nodes matching the union of jsonpath1 and jsonpath2

But unfortunately, it doesn't work for me:

from jsonpath_rw import parse

if __name__ == '__main__':
    x = {
        "a": [
            "1",
            "2"
        ],
        "b": {
            "xxx": ["57"]
        },
    }

    expression = parse('a[-1]  | b.xxx')
    print([match.value for match in expression.find(x)])

The result is ['57'], but I expected ['2', '57']. Did I miss anything?
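A plain-Python sketch of why "2" can go missing. It assumes (this is not confirmed in the thread) that the parser groups `|` more tightly than `.`, so `a[-1] | b.xxx` is read as `(a[-1] | b).xxx`. Under that reading, the left branch's "2" has no key xxx and is dropped, so only the value of b.xxx survives. The helpers `field`, `index`, `child` and `union` are hypothetical stand-ins that mimic match semantics; they are not jsonpath-rw's API.

```python
# Hypothetical helpers that mimic JSONPath matching on plain Python data.
# Each one maps a list of values to the list of values it matches.
# They are illustrative stand-ins, not jsonpath-rw's classes.


def field(name):
    # Value of key `name` in each dict; non-dicts and missing keys are skipped.
    return lambda values: [v[name] for v in values
                           if isinstance(v, dict) and name in v]


def index(i):
    # Element `i` of each list; non-lists and out-of-range indices are skipped.
    return lambda values: [v[i] for v in values
                           if isinstance(v, list) and -len(v) <= i < len(v)]


def child(left, right):
    # left.right: apply `right` to every match of `left`.
    return lambda values: right(left(values))


def union(left, right):
    # left | right: matches of `left` followed by matches of `right`.
    return lambda values: left(values) + right(values)


x = {"a": ["1", "2"], "b": {"xxx": ["57"]}}
a_last = child(field("a"), index(-1))  # a[-1]

# Assumed reading of 'a[-1] | b.xxx': (a[-1] | b).xxx
implicit = child(union(a_last, field("b")), field("xxx"))
# Reading the question intended, as in '(a[-1]) | (b.xxx)'
intended = union(a_last, child(field("b"), field("xxx")))

print(implicit([x]))  # [['57']]: one match, whose value is the list ['57']
print(intended([x]))  # ['2', ['57']]
```

The second grouping produces the same shape as the output reported for the parenthesized expression later in the thread.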

Thanks.

danbaehr commented 7 years ago

It seems to be related to how indices are handled in unions. I hit something similar recently and have been trying, so far unsuccessfully, to work through it.

When using the following JSON:

r = {
        "targets": [
            {
                "a": 1,
                "b": 2,
                "c": [
                    3,
                    4
                ],
            },
            {
                "a": 5,
                "b": 6,
                "c": [
                    7,
                    8
                ],
            },
        ],
}

jsonpath targets[*].a|c gives: [1, [3, 4], 5, [7, 8]]

and jsonpath targets[*].c[0] gives: [3, 7]

however, jsonpath targets[*].a|c[0] throws an error:

Traceback (most recent call last):
  File "myfile.py", line 56, in <module>
    matched_value = [match.value for match in parsedjpath.find(r)]
  File "/Library/Python/2.7/site-packages/jsonpath_rw/jsonpath.py", line 227, in find
    for submatch in self.right.find(subdata)]
  File "/Library/Python/2.7/site-packages/jsonpath_rw/jsonpath.py", line 443, in find
    if len(datum.value) > self.index:
TypeError: object of type 'int' has no len()

and jsonpath targets[*].a|'c[0]' gives: [1, 5]

instead of what I would expect: [1, 3, 5, 7]
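A plain-Python sketch of the suspected failure. It assumes (not confirmed in this thread) that `targets[*].a|c[0]` is grouped as `(targets[*].(a|c))[0]`, so the index is applied to every `a|c` match, including the integer 1. The helper `index0` is hypothetical; it only copies the `len(datum.value) > self.index` check visible in the traceback.

```python
# Plain-Python sketch, not jsonpath-rw's code; the grouping is an assumption.
r = {"targets": [{"a": 1, "b": 2, "c": [3, 4]},
                 {"a": 5, "b": 6, "c": [7, 8]}]}


def index0(value):
    # Same test as the traceback's `len(datum.value) > self.index`,
    # so calling it on an int raises TypeError.
    if len(value) > 0:
        return [value[0]]
    return []


# Matches of targets[*].a|c, in the order reported above.
a_or_c = [v for t in r["targets"] for v in (t["a"], t["c"])]
print(a_or_c)  # [1, [3, 4], 5, [7, 8]]

# Suspected reading of targets[*].a|c[0]: [0] applied to every a|c match.
try:
    suspected = [m for v in a_or_c for m in index0(v)]
except TypeError as err:
    suspected = None
    print("TypeError:", err)  # TypeError: object of type 'int' has no len()

# Reading of targets[*].a|(c[0]): a, then c[0], for each target.
wanted = [m for t in r["targets"] for m in [t["a"]] + index0(t["c"])]
print(wanted)  # [1, 3, 5, 7]
```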

I started walking through the code but quickly got lost in the lexer and parser modules. My debug output suggested that the problem might lie in ply.yacc's token handling rather than in jsonpath-rw itself.

liketic commented 7 years ago

@danbaehr You're right. It seems the union syntax is not well supported.

dhivyasa commented 7 years ago

Using parentheses in the JSONPath expression resolves this issue:

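import json
import unittest

from jsonpath_rw import parser


# Hypothetical test-case class; the original snippet showed only the method.
class TestUnion(unittest.TestCase):
    def test_nested_index(self):
        r = """{
            "targets": [{"a": "1", "b": "2", "c": ["3", "4"]},
                        {"a": "5", "b": "6", "c": ["7", "8"]}]
        }"""
        # Parentheses make c[0] the right-hand operand of the union.
        jpath = parser.parse('targets[*].a|(c[0])')
        self.assertEqual([match.value for match in jpath.find(json.loads(r))],
                         ['1', '3', '5', '7'])
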
liketic commented 7 years ago

@dhivyasa Thanks for your reply, but it still doesn't work for my example.

aplamada commented 7 years ago

@likel

expression = parse('(a[-1])  | (b.xxx)')
print([match.value for match in expression.find(x)])

['2', ['57']]
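The parenthesized expression returns the value of b.xxx as a nested list. If the flat ['2', '57'] from the original question is what is wanted, one option (not proposed in the thread) is to flatten the matched values by one level in Python afterwards. The helper name `flatten_one_level` is hypothetical.

```python
def flatten_one_level(values):
    """Splice the elements of list values into the result; keep other values as-is."""
    flat = []
    for v in values:
        if isinstance(v, list):
            flat.extend(v)
        else:
            flat.append(v)
    return flat


print(flatten_one_level(['2', ['57']]))  # ['2', '57']
```

This only removes one level of nesting, so a match whose value is a list of lists would still contain lists.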