rnag / dataclass-wizard

A simple, yet elegant, set of wizarding tools for interacting with Python dataclasses.

Question/feature request: Map a (nested) JSON/dict Key to a Field #60

Open iwconfig opened 2 years ago

iwconfig commented 2 years ago

It is possible to map JSON keys to fields with different names. However, I would like to map nested JSON/dict values by key path, preferably as dot-delimited strings

'parent.child.grandchild[1].sibling'

or as lists of keys

['parent', 'child', 'grandchild', 1, 'sibling']

So it would look something like

class Meta(JSONSerializable.Meta):
    json_key_to_field = {
        'parent.child.grandchild[1].sibling': 'my_str'
    }

or

json_field(
    ('myJSONKey',
     'parent.child.grandchild[1].sibling',
     ('parent2', 'child', 'grandchild', 4, 'sibling'),
     'myField')
)

As of yet, I have not been able to come up with a way to accomplish this with dataclass-wizard. Do you know if this is currently possible? If not, is it something you would consider implementing?
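
To make the request concrete, here is a rough sketch of the intended behavior using only the standard library. It is not dataclass-wizard API: `get_path`, `from_nested`, `PATHS`, and `MyClass` are hypothetical names made up for illustration, and the paths are given as key lists so no string parsing is involved.

```python
from dataclasses import dataclass, fields


def get_path(data, path):
    """Walk nested dicts/lists by following each key or index in `path`."""
    for key in path:
        data = data[key]
    return data


@dataclass
class MyClass:
    my_str: str
    my_int: int


# Hypothetical mapping of field name -> key path into the nested input.
PATHS = {
    'my_str': ('parent', 'child', 'grandchild', 1, 'sibling'),
    'my_int': ('parent', 'count'),
}


def from_nested(cls, data, paths):
    """Build `cls` by looking up each mapped field at its key path."""
    kwargs = {
        f.name: get_path(data, paths[f.name])
        for f in fields(cls)
        if f.name in paths
    }
    return cls(**kwargs)


payload = {
    'parent': {
        'count': 3,
        'child': {'grandchild': [{'sibling': 'no'}, {'sibling': 'hello'}]},
    }
}

obj = from_nested(MyClass, payload, PATHS)
print(obj)
```

Missing keys or out-of-range indexes simply raise here; a real implementation would also need defaults and type conversion.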

Thank you for a terrific module btw!

rnag commented 2 years ago

Hi @iwconfig, thanks for opening this issue! I agree that nested JSON path traversal is an interesting feature, and one that is worth supporting. I actually added a milestone a while back to help track it, and I definitely plan to implement it in one of the upcoming releases.

I will keep this thread updated as I make more progress towards the request. In any case, I've also added a 'help wanted' label to this issue in case anyone wants to take a stab at implementing the feature as well.

rnag commented 2 years ago

Here is some initial work I've been able to put together so far. This was inspired in part by this post which was posted on SO. It could use some slight modifications, but I am glad that it's at least working so far.

from functools import reduce

# pip install fnc pydash
import fnc
import pydash

class JsonPath:

    @classmethod
    def get(cls, data, path):
        for p in path:
            data = data[p]

        return data

    @classmethod
    def get_v2(cls, data, path):
        """Same as `get`, but via `functools.reduce`. Slower than the `for` loop, likely due to the extra `_get_item` call per step."""
        return reduce(cls._get_item, path, data)

    @classmethod
    def _get_item(cls, current_data, current_path):
        return current_data[current_path]

    @classmethod
    def get_safe(cls, data, path, default=None):
        """Same as `get` but handles cases where key is missing, or index is out of bounds."""
        current_data = data
        p = path  # to avoid "unbound local variable" warnings

        try:
            for p in path:
                current_data = current_data[p]

            return current_data

        # IndexError -
        #   raised when `data` is a `list`, and we access an index that is "out of bounds"
        # KeyError -
        #   raised when `data` is a `dict`, and we access a key that is not present
        # AttributeError -
        #   caught defensively; note that subscripting `None` raises `TypeError`, not this
        except (IndexError, KeyError, AttributeError):
            return default

        # TypeError -
        #   raised when `data` is a `list` but we index it with a `str` key,
        #   or when `data` is not subscriptable at all (such as `None`)
        except TypeError as e:
            raise TypeError(f'Invalid path\n  '
                            f'data={data}\n  '
                            f'path={path}\n  '
                            f'current_data={current_data}\n  '
                            f'current_path={p!r}\n  '
                            f'error={e}') from None

if __name__ == '__main__':
    from timeit import timeit

    d = {"a": {"b": [1, {"c": ["d"]}, 2, ["hello world"]]}}

    assert JsonPath.get_safe(d, ['z']) is None
    assert JsonPath.get(d, ['a', 'b', 3, -1]) == 'hello world'

    data_path = ['a', 'b', 1, 'c', 0]
    data_path_invalid = ['a', 'b', 1, 'c', 321]
    get_fn = lambda x: x['a']['b'][1]['c'][0]

    assert JsonPath.get(d, data_path) == fnc.get(data_path, d) == pydash.get(d, data_path) == 'd'
    assert JsonPath.get_safe(d, data_path_invalid, 112233) == 112233
    assert get_fn(d) == 'd'

    n = 100_000

    print(f'get (no loop):       {timeit("get_fn(d)", globals=globals(), number=n):.3f}')
    print(f'get_fnc:             {timeit("fnc.get(data_path, d)", globals=globals(), number=n):.3f}')
    print(f'get_pydash:          {timeit("pydash.get(d, data_path)", globals=globals(), number=n):.3f}')
    print(f'get:                 {timeit("JsonPath.get(d, data_path)", globals=globals(), number=n):.3f}')
    print(f'get (reduce):        {timeit("JsonPath.get_v2(d, data_path)", globals=globals(), number=n):.3f}')
    print(f'get_safe:            {timeit("JsonPath.get_safe(d, data_path, 112233)", globals=globals(), number=n):.3f}')
    print(f'get_safe (invalid):  {timeit("JsonPath.get_safe(d, data_path_invalid, 112233)", globals=globals(), number=n):.3f}')

Results on a Mac running Python 3.12 (total seconds for 100,000 calls each):

get (no loop):       0.007
get_fnc:             0.134
get_pydash:          0.442
get:                 0.017
get (reduce):        0.044
get_safe:            0.019
get_safe (invalid):  0.035

iwconfig commented 2 years ago

Oh, I didn't see that milestone, sorry. That's great!

Wow, awesome! What you've got there is just for paths already separated into a list, right? I know fnc and pydash (same author) also offer this functionality, and can handle dot-delimited paths as well. Maybe you'll find something interesting in how they do it. pydash is more mature, but fnc is faster due to its generator-based approach.

Thank you for this! 🥇

rnag commented 1 week ago

@iwconfig sorry for the delay in moving this forward - got busy with life.

Thanks for the suggestions! I updated my benchmark code to test with fnc and pydash as mentioned. Also you are correct, those are just for paths already separated into a list. For parsing a dot-delimited string into a list, I might need another library for that (or just create one 🤔 ).
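
As a starting point for the parsing step, here is a minimal sketch that turns a dot-delimited string into a list of keys using only the standard library. The name `parse_path` and the token regex are my own assumptions, not anything from dataclass-wizard, fnc, or pydash. It assumes keys never contain dots or brackets, and that integers appear only inside `[...]`.

```python
import re

# A token is either a bare key (no dots or brackets) or a bracketed integer index.
_TOKEN = re.compile(r'([^.\[\]]+)|\[(-?\d+)\]')


def parse_path(path: str) -> list:
    """Split 'a.b[1].c' into ['a', 'b', 1, 'c'] (hypothetical helper)."""
    keys = []
    pos = 0
    while pos < len(path):
        if path[pos] == '.':
            # Dots only separate tokens, so skip over them.
            pos += 1
            continue
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f'Invalid path {path!r} at position {pos}')
        name, index = match.groups()
        # Bracketed tokens become `int` indexes; everything else stays a `str` key.
        keys.append(name if name is not None else int(index))
        pos = match.end()
    return keys


print(parse_path('parent.child.grandchild[1].sibling'))
```

The resulting list can be passed straight to something like `JsonPath.get(d, parse_path('a.b[1].c[0]'))`. This version is lenient about repeated dots and does not support quoted keys, so it would need tightening before going into the library.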

Going to look into this shortly when time allows, as I will have some bandwidth for it due to the holiday season coming up.