Open iwconfig opened 2 years ago
Hi @iwconfig, thanks for opening this issue! I agree that nested JSON path traversal is an interesting feature, and one that is worth supporting. I added a milestone a while back to help track it, and I plan to implement it in one of the upcoming releases.
I will keep this thread updated as I make more progress towards the request. In any case, I've also added a 'help wanted' label to this issue in case anyone wants to take a stab at implementing the feature as well.
Here is some initial work I've put together so far. It was inspired in part by this post on SO. It could use some slight modifications, but I am glad that it's at least working.
```python
from functools import reduce

# pip install fnc pydash
import fnc
import pydash


class JsonPath:

    @classmethod
    def get(cls, data, path):
        for p in path:
            data = data[p]

        return data

    @classmethod
    def get_v2(cls, data, path):
        """For some reason, an approach with `functools.reduce` is slower than a `for` loop"""
        return reduce(cls._get_item, path, data)

    @classmethod
    def _get_item(cls, current_data, current_path):
        return current_data[current_path]

    @classmethod
    def get_safe(cls, data, path, default=None):
        """Same as `get` but handles cases where key is missing, or index is out of bounds."""
        current_data = data
        p = path  # to avoid "unbound local variable" warnings

        try:
            for p in path:
                current_data = current_data[p]

            return current_data

        # IndexError -
        #   raised when `data` is a `list`, and we access an index that is "out of bounds"
        # KeyError -
        #   raised when `data` is a `dict`, and we access a key that is not present
        # AttributeError -
        #   defensive catch; subscripting built-in types does not normally raise it
        except (IndexError, KeyError, AttributeError):
            return default

        # TypeError -
        #   raised when `data` is a `list`, but we try to use it like a `dict`,
        #   or when `data` is not subscriptable at all, such as a `None`
        except TypeError as e:
            raise TypeError(f'Invalid path\n '
                            f'data={data}\n '
                            f'path={path}\n '
                            f'current_data={current_data}\n '
                            f'current_path={p!r}\n '
                            f'error={e}') from None


if __name__ == '__main__':
    from timeit import timeit

    d = {"a": {"b": [1, {"c": ["d"]}, 2, ["hello world"]]}}

    assert JsonPath.get_safe(d, ['z']) is None
    assert JsonPath.get(d, ['a', 'b', 3, -1]) == 'hello world'

    data_path = ['a', 'b', 1, 'c', 0]
    data_path_invalid = ['a', 'b', 1, 'c', 321]

    get_fn = lambda x: x['a']['b'][1]['c'][0]

    assert JsonPath.get(d, data_path) == fnc.get(data_path, d) == pydash.get(d, data_path) == 'd'
    assert JsonPath.get_safe(d, data_path_invalid, 112233) == 112233
    assert get_fn(d) == 'd'

    n = 100_000

    print(f'get (no loop): {timeit("get_fn(d)", globals=globals(), number=n):.3f}')
    print(f'get_fnc: {timeit("fnc.get(data_path, d)", globals=globals(), number=n):.3f}')
    print(f'get_pydash: {timeit("pydash.get(d, data_path)", globals=globals(), number=n):.3f}')
    print(f'get: {timeit("JsonPath.get(d, data_path)", globals=globals(), number=n):.3f}')
    print(f'get (reduce): {timeit("JsonPath.get_v2(d, data_path)", globals=globals(), number=n):.3f}')
    print(f'get_safe: {timeit("JsonPath.get_safe(d, data_path, 112233)", globals=globals(), number=n):.3f}')
    print(f'get_safe (invalid): {timeit("JsonPath.get_safe(d, data_path_invalid, 112233)", globals=globals(), number=n):.3f}')
```
Results on Mac running Python 3.12:
```
get (no loop):       0.007
get_fnc:             0.134
get_pydash:          0.442
get:                 0.017
get (reduce):        0.044
get_safe:            0.019
get_safe (invalid):  0.035
```
Oh, I didn't see that milestone, sorry. That's great!
Wow, awesome! What you've got there is just for paths separated into a list, right? I know `fnc` and `pydash` (same author) also offer this functionality, and can handle dot-delimited paths as well. Maybe you'll find something interesting in how they do it. `pydash` is more mature, but `fnc` is faster due to its generator-based approach.
Thank you for this! :1st_place_medal:
@iwconfig sorry for the delay in moving this forward - got busy with life.
Thanks for the suggestions! I updated my benchmark code to test with `fnc` and `pydash` as mentioned. Also, you are correct: those are just for paths already separated into a list. For parsing a dot-delimited string into a list, I might need another library for that (or just create one 🤔).
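For the dot-delimited case, here is a minimal sketch of what such a parser could look like using only the standard library. The names `parse_path` and `get_by_path` are hypothetical, not part of any library mentioned in this thread. It assumes that integer-looking segments (such as `1` or `-1`) are list indices, which means a dict key that is itself a numeric string could not be reached this way.

```python
def parse_path(path, sep='.'):
    """Split a delimited path string into a list of keys and list indices.

    Hypothetical helper: segments that parse as integers are converted
    to `int` so that they can be used to index into lists.
    """
    if not isinstance(path, str):
        return list(path)  # already a sequence of keys

    keys = []
    for part in path.split(sep):
        try:
            keys.append(int(part))
        except ValueError:
            keys.append(part)
    return keys


def get_by_path(data, path, default=None):
    """Look up `path` in nested `data`, returning `default` on any miss."""
    current = data
    for key in parse_path(path):
        try:
            current = current[key]
        except (IndexError, KeyError, TypeError):
            return default
    return current


d = {"a": {"b": [1, {"c": ["d"]}, 2, ["hello world"]]}}
print(parse_path('a.b.1.c.0'))
print(get_by_path(d, 'a.b.1.c.0'))
```

Unlike `get_safe` above, this sketch also returns the default on a `TypeError` instead of re-raising it; whether an invalid path should fail loudly is a design choice left open here.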
Going to look into this shortly when time allows, as I will have some bandwidth for it due to the holiday season coming up.
It is possible to map JSON keys to fields with different names. However, I would like to map nested JSON/dict by key path, preferably by dot-delimited strings or as lists of keys.
So it would look something like
or
As of yet, I have not been able to come up with a way to accomplish this with `dataclass-wizard`. Do you know if this is currently possible? If not, is it something you would consider implementing?

Thank you for a terrific module btw!
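As a rough illustration of the kind of mapping being asked for, the sketch below uses only the standard-library `dataclasses` module. It is not `dataclass-wizard`'s API: `key_path` and `from_nested` are hypothetical helpers, and the `Person` class and its sample data are made up for the example. Each field stores its key path in `metadata`, either as a dot-delimited string or as a list of keys.

```python
from dataclasses import dataclass, field, fields


def key_path(path):
    """Hypothetical helper: attach a key path to a field via `metadata`."""
    return field(metadata={'key_path': path})


def from_nested(cls, data):
    """Build `cls` from nested `data` by following each field's key path."""
    kwargs = {}
    for f in fields(cls):
        # Fields without a key path fall back to their own name.
        path = f.metadata.get('key_path', f.name)
        keys = path.split('.') if isinstance(path, str) else path
        value = data
        for key in keys:
            value = value[key]  # raises KeyError if the path is missing
        kwargs[f.name] = value
    return cls(**kwargs)


@dataclass
class Person:
    name: str                                      # top-level key "name"
    city: str = key_path('address.city')           # dot-delimited string
    zip_code: str = key_path(['address', 'zip'])   # list of keys


data = {'name': 'Ada', 'address': {'city': 'London', 'zip': 'N1'}}
person = from_nested(Person, data)
print(person)
```

This only covers loading; it does no type conversion, and serializing back to the nested shape would need the reverse mapping.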