h2non / jsonpath-ng

Finally, a JSONPath implementation for Python that aims to be standard compliant. That's all. Enjoy!
Apache License 2.0

Writing custom extensions, `length` function #171

Open joepatol opened 1 month ago

joepatol commented 1 month ago

Hi,

I have a use-case that doesn't seem to be possible with the current implementation.

Suppose I have some data looking like this:

data = {
    "Items": [
        {"val": "foo", "filter": "a"},
        {"val": "bar", "filter": "a"},
        {"val": "baz", "filter": "b"},
    ]
}

Now I want to count the Items where filter == "a" (expected result: 2). I tried the `len` operator, but it returns the length of each matched object rather than the length of the array of matches. Each matched object has two keys, so I get 2 once per match.

from jsonpath_ng.ext import parse
from jsonpath_ng.jsonpath import JSONPath

query = '$.Items[?filter == "a"].`len`'
jsonpath_expr: JSONPath = parse(query)
result = jsonpath_expr.find(data)

print(result)

# [DatumInContext(value=2, path=Len(), context=None), DatumInContext(value=2, path=Len(), context=None)]

After some research, this seems to be the expected behavior of JSONPath. Some other implementations provide a length() function that would do what I want, but this library does not seem to support it.

Is there a way to achieve what I want with the current implementation in this library?

The readme states: "More generally, this syntax allows "named operators" to extend JSONPath is arbitrary ways", which makes me think I should be able to extend the implementation with my own function. That'd also work for me, however I can't seem to find any documentation on how to write extensions for the library.

What is the recommended way, if any, to write custom extensions (like a length function) for this library?

Thanks for helping!

jg-rp commented 1 month ago

Even with a custom named operator (like .`len`), I think you'll struggle to address filter (like [?filter == "a"]) results as a single sequence to be able to count them. Internally, the find() method of each selector is called once for each datum (analogous to "Value" or "Node" from RFC 9535), without the option to reference a list of intermediate results (like "Nodelist" in RFC 9535).
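
To illustrate the per-datum behavior described above, here is a toy model in plain Python. It is only a sketch: `field`, `filter_eq`, `length` and `run` are hypothetical helpers written for this example and are not part of jsonpath-ng. Each selector is called once per datum, so a length selector placed after the filter only ever sees one matched object at a time.

```python
# Toy model only: these helpers are hypothetical and are NOT jsonpath-ng's API.

def field(name):
    """Selector: yield the value of `name` if the datum is a dict that has it."""
    def select(datum):
        if isinstance(datum, dict) and name in datum:
            return [datum[name]]
        return []
    return select


def filter_eq(key, expected):
    """Selector: from one list datum, yield each matching element separately."""
    def select(datum):
        if not isinstance(datum, list):
            return []
        return [d for d in datum if isinstance(d, dict) and d.get(key) == expected]
    return select


def length(datum):
    """Selector: yield the length of the single datum it is handed."""
    return [len(datum)]


def run(selectors, data):
    """Call each selector once per datum and flatten the results."""
    current = [data]
    for select in selectors:
        current = [out for datum in current for out in select(datum)]
    return current


data = {
    "Items": [
        {"val": "foo", "filter": "a"},
        {"val": "bar", "filter": "a"},
        {"val": "baz", "filter": "b"},
    ]
}

# `length` runs once per matched dict; each dict has two keys.
per_datum = run([field("Items"), filter_eq("filter", "a"), length], data)
print(per_datum)  # [2, 2]

# Counting the matches has to happen outside the selector chain.
matched = run([field("Items"), filter_eq("filter", "a")], data)
print(len(matched))  # 2
```

With this sample data both numbers happen to be 2, but `per_datum` holds one length per matched object, while `len(matched)` is the number of matches.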

The following - somewhat hacky - example works around this by defining a "flat filter" operator (?* instead of ?), then uses len as normal.

from jsonpath_ng.ext.parser import ExtentedJsonPathParser as ExtendedJsonPathParser
from jsonpath_ng.ext.parser import ExtendedJsonPathLexer
from jsonpath_ng.ext.filter import Filter
from jsonpath_ng.jsonpath import DatumInContext

class MyJSONPathLexer(ExtendedJsonPathLexer):
    """An extended lexer with a "flat" filter operator."""

    tokens = ["FLAT_FILTER"] + ExtendedJsonPathLexer.tokens
    t_FLAT_FILTER = r"\?\*?"

class MyJSONPathParser(ExtendedJsonPathParser):
    """An extended parser with a "flat" filter operator."""

    tokens = MyJSONPathLexer.tokens

    def __init__(self, debug=False):
        super().__init__(debug, MyJSONPathLexer)

    def p_filter(self, p):
        "filter : FLAT_FILTER expressions"
        if p[1] == "?":
            p[0] = Filter(p[2])
        else:
            p[0] = FlatFilter(p[2])

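class FlatFilter(Filter):
    """A filter that returns all matches as one list-valued datum."""

    def find(self, data):
        # Wrap the matched values in a single datum so `len` sees the whole list.
        return [DatumInContext([d.value for d in super().find(data)])]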

def parse(path, debug=False):
    return MyJSONPathParser(debug=debug).parse(path)

if __name__ == "__main__":
    data = {
        "Items": [
            {"val": "foo", "filter": "a"},
            {"val": "bar", "filter": "a"},
            {"val": "baz", "filter": "b"},
        ]
    }

    query = '$.Items[?*filter == "a"].`len`'
    jsonpath_expr = parse(query)
    result = jsonpath_expr.find(data)
    print([r.value for r in result])  # [2]

Note that RFC 9535 does not handle this use case either. You'd need to use Python's len() function on the query results to retrieve the number of values matched by the filter.