kennknowles / python-jsonpath-rw

A robust and significantly extended implementation of JSONPath for Python, with a clear AST for metaprogramming.
Apache License 2.0
603 stars 195 forks source link

Feature request: regex-style capture groups. #32

Open brianthelion opened 9 years ago

brianthelion commented 9 years ago

If my (very limited) understanding of jsonpath and jsonpath-rw is correct -- BIG "if" there -- a .find() returns the entire path through the object, root to leaf, that satisfies the given expression. So with an "un-rooted" expression likefield1.field2 on a complicated object, the thing you get back is yet another complicated object. You know that field1.field2 is in there somewhere, but you have no idea where. Retrieving, say, "everything above field1.field2" means doing guess-and-check on the returned path. This is repeated work.

regex engines solve a similar problem by capturing parts of matched string during their search. "Capture groups" (also sometimes called "match groups") refer to the parts of the string that should be kept if matched. There are various mechanisms for then referencing the match groups on success, including by name and by number.

Something similar to regex capture groups would be highly useful in jsonpath.

(Apologies in advance if I'm a moron and a capture capability is already present.)

brianthelion commented 9 years ago

Real life use-case that I'm dealing with: $.paths.*.get,put,post,options,delete,head,patch where I would like to capture the * and the get,put,post,options,delete,head,patch matches.

Possible capture syntax for this use-case using {} and #:

$.paths.{*}#foo.{get,put,post,options,delete,head,patch}#bar

where capture groups populate match.captured['foo'] and match.captured['bar'] after .find(...).

brianthelion commented 9 years ago

Started trying to implement this myself in earnest. So far I have the lexer accepting '{' and '}' correctly. Then in jsonpath.py I defined:

class Capture(JSONPath):
    def __init__(self, subexpr, callback):
        self._subexpr = subexpr
        self._callback = callback

    def find(self, datum):
        results = self._subexpr.find(datum)
        self._callback(results)
        return results

    def __str__(self):
        return '{%s}' % self._subexpr

    def __repr__(self):
        return '%s(subexpr=%r)' % (self.__class__.__name__, self._subexpr)

The callback gets set on the parser as follows:

class JsonPathParser(object):
    ...
    def p_jsonpath_braces(self, p):
        "jsonpath : '{' jsonpath '}'"
        p[0] = Capture(p[2], self._register_capture)

I think this all makes sense so far but what exactly Capture.find is supposed to do is a little fuzzy in my mind. Am I on the right track?