ianlini / flatten-dict

A flexible utility for flattening and unflattening dict-like objects in Python.
MIT License

unflatten with lists #8

Open benbowen opened 5 years ago

benbowen commented 5 years ago

Flattening a nested dict that contains lists works great, but unflatten creates dicts instead of lists when a key is a list index. I rewrote part of your lib to unflatten for my needs and thought you might want to integrate it into your unflatten.

I'm worried that my changes aren't generic enough to work for all kinds of mixed lists and dicts.

Here is how I did the unflattening. The only function I changed is this one:

def nested_set_dict(d, keys, value):
    """Set a value to a sequence of nested keys

    Parameters
    ----------
    d : Mapping
    keys : Sequence[str]
    value : Any
    """
    assert keys
    key = keys[0]
    if len(keys) == 1:
        if type(d) == list:
            d.append(value)
        else:
            d[key] = value
        return

    # The next key is an int, so the child container must be a list.
    if type(keys[1]) == int:
        if key not in d:
            d[key] = []
        d = d[key]
    # The current key is a list index; append a dict when that index is new.
    elif type(key) == int:
        if (key + 1) > len(d):
            d.append({})
        d = d[key]
    else:
        d = d.setdefault(key, {})
    nested_set_dict(d, keys[1:], value)

Testing it out:

d1 = {'a':{'b':[{'c1':'nested1!','d1':[{'e1':'so_nested1!!!'}]},
               {'c2':'nested2!','d2':[{'e2':'so_nested2!!!'}]},
               {'c3':'nested3!','d3':[{'e3':'so_nested3!!!'}]},
               {'c4':'nested4!','d4':[{'e4':'so_nested4a!!!'},
                                      {'e4':'so_nested4b!!!'},
                                      {'e4':'so_nested4c!!!'},
                                      {'e4':'so_nested4d!!!'},
                                      {'e4':'so_nested4e!!!'}]}]}}    

Flatten works great for this out of the box

df = mzm.flatten(d1,enumerate_types=(list,))
kv = sorted([(k,v) for (k,v) in df.items()])

(('a', 'b', 0, 'c1'), 'nested1!')
(('a', 'b', 0, 'd1', 0, 'e1'), 'so_nested1!!!')
(('a', 'b', 1, 'c2'), 'nested2!')
(('a', 'b', 1, 'd2', 0, 'e2'), 'so_nested2!!!')
(('a', 'b', 2, 'c3'), 'nested3!')
(('a', 'b', 2, 'd3', 0, 'e3'), 'so_nested3!!!')
(('a', 'b', 3, 'c4'), 'nested4!')
(('a', 'b', 3, 'd4', 0, 'e4'), 'so_nested4a!!!')
(('a', 'b', 3, 'd4', 1, 'e4'), 'so_nested4b!!!')
(('a', 'b', 3, 'd4', 2, 'e4'), 'so_nested4c!!!')
(('a', 'b', 3, 'd4', 3, 'e4'), 'so_nested4d!!!')
(('a', 'b', 3, 'd4', 4, 'e4'), 'so_nested4e!!!')

d2 = {}
for key_value in kv:
    k = key_value[0]
    v = key_value[1]
    nested_set_dict(d2,k,v)

Gives

d1 =

{'a': {'b': [{'c1': 'nested1!', 'd1': [{'e1': 'so_nested1!!!'}]}, {'c2': 'nested2!', 'd2': [{'e2': 'so_nested2!!!'}]}, {'c3': 'nested3!', 'd3': [{'e3': 'so_nested3!!!'}]}, {'d4': [{'e4': 'so_nested4a!!!'}, {'e4': 'so_nested4b!!!'}, {'e4': 'so_nested4c!!!'}, {'e4': 'so_nested4d!!!'}, {'e4': 'so_nested4e!!!'}], 'c4': 'nested4!'}]}}

d2 =

{'a': {'b': [{'c1': 'nested1!', 'd1': [{'e1': 'so_nested1!!!'}]}, {'c2': 'nested2!', 'd2': [{'e2': 'so_nested2!!!'}]}, {'c3': 'nested3!', 'd3': [{'e3': 'so_nested3!!!'}]}, {'d4': [{'e4': 'so_nested4a!!!'}, {'e4': 'so_nested4b!!!'}, {'e4': 'so_nested4c!!!'}, {'e4': 'so_nested4d!!!'}, {'e4': 'so_nested4e!!!'}], 'c4': 'nested4!'}]}}
ianlini commented 5 years ago

Thanks for the advice. This is doable, but we need some design work to make it general and intuitive enough. My first thought is to add a parameter list_index_types that defines when to create a list. If the splitter function generates a tuple containing an element whose type is in list_index_types, then we can treat that element as a list index. If an index i doesn't exist but some index bigger than i does, I think filling element i with None is the better choice.
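The list_index_types idea can be sketched roughly as follows. This is only an illustration under assumptions, not the library's implementation: unflatten_with_lists, _assign and _get_or_create are hypothetical names, the input is assumed to already use tuple keys, and mixed paths (a string key applied to a list) are not handled.

```python
def unflatten_with_lists(flat, list_index_types=(int,)):
    """Hypothetical helper: rebuild nested dicts/lists from tuple keys."""
    root = {}
    for keys, value in flat.items():
        node = root
        for key, next_key in zip(keys, keys[1:]):
            # The type of the *next* key decides which container to create.
            child_type = list if isinstance(next_key, list_index_types) else dict
            node = _get_or_create(node, key, child_type)
        _assign(node, keys[-1], value)
    return root


def _assign(node, key, value):
    if isinstance(node, list):
        # Pad skipped indices with None, as proposed in the comment above.
        node.extend([None] * (key + 1 - len(node)))
    node[key] = value


def _get_or_create(node, key, child_type):
    if isinstance(node, list):
        # Create the child if the slot is missing or was only padded.
        if key >= len(node) or node[key] is None:
            _assign(node, key, child_type())
        return node[key]
    return node.setdefault(key, child_type())


flat = {("a", "b", 0, "c"): 1, ("a", "b", 2): "x"}
print(unflatten_with_lists(flat))
# {'a': {'b': [{'c': 1}, None, 'x']}}
```

Index 1 never appears in the input, so it is filled with None. Passing list_index_types=() would fall back to creating dicts everywhere.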

KoreyPeters commented 4 years ago

This would be a useful feature to me as well. It seems natural that the flatten/unflatten process should produce the same output as input, but that is not the case now.

ysfchn commented 4 years ago

Sorry for reviving the issue, but are there any updates on it? I'm also flattening a dictionary that contains arrays, but unflattening it turns the list indices into dictionary keys.

ianlini commented 4 years ago

Because there is no further feedback about the design, and it seems to be the most requested feature, I will implement it according to my last comment. Not sure about the timing, maybe in 1 or 2 months.

I would like to emphasize this again: I'm not expecting flatten() and unflatten() to be invertible. I really want to make them invertible, but I couldn't figure out a way. If you think it's possible, please share your idea. Otherwise, you should expect that a may not equal unflatten(flatten(a)) unless a satisfies some constraints and you pass the correct arguments to flatten() and unflatten().
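One way to see why invertibility is hard: once list indices are written as plain ints, a list and a dict with int keys flatten to the same result. The sketch below uses toy_flatten, a hypothetical stand-in written for this illustration, not the library's flatten().

```python
def toy_flatten(obj, prefix=()):
    """Toy stand-in for flatten(); lists are enumerated by index."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        flat.update(toy_flatten(value, prefix + (key,)))
    return flat


with_list = {"a": ["x", "y"]}
with_int_keys = {"a": {0: "x", 1: "y"}}

# Both inputs collapse to {('a', 0): 'x', ('a', 1): 'y'}.
print(toy_flatten(with_list) == toy_flatten(with_int_keys))  # True
```

Because the two flattened results are identical, no unflatten function can recover both originals without extra information, such as arguments describing which keys are list indices.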

ysfchn commented 4 years ago

/test/0/example

If one of the keys in the path contains only a number, it can be treated as a list index and the object inserted at that index (in this case 0), but yes, it could also be a dictionary key. Then maybe in the flatten() method you could wrap list indices in a distinguishing character, like this: /test/[0]/example, so that unflatten() can tell whether it is a list index or not. But this will also affect keys that contain [ and/or ].

Then the only choice would be to turn flatten() and unflatten() into class-based objects that work with keypath objects (each holding the key path, with its own properties and methods such as getvalue() to get the value at that key path). That might make it easier for you to implement new features, because objects with their own properties would make the flatten and unflatten operations easier to follow and more readable, for us and for you.

Sorry, I'm not very experienced with nested dictionaries and recursion, and I know how hard they are to deal with, so this is all I can offer.

ianlini commented 4 years ago

@ysfchn, thanks for your suggestion. I have considered making the key a special object. It is one of the most feasible ideas I have in mind, and it's great to see that you have a similar idea.

I have also considered making flatten() and unflatten() methods of some class. I think that is unrelated to the keypath idea, because the two can be done separately. The benefit of a class is that we don't need to worry about passing matching arguments when calling unflatten() after flatten().

Anyway, one of the difficulties is that I don't really know how people use this library. I guess people use it very differently, and I only use it in some simple ways myself.

For example, if we make the key a special object, then {"test/0": 1} cannot be unflattened to {"test": [1]} because "test/0" is not our special object. Users would have to transform the dict into something like {KeyPath("test", ListIndex(0)): 1}. This design is very useful when the dict we want to unflatten() is always generated by flatten(), but I don't know whether unflattening {"test/0": 1} is also important. To be a general library, we might need to support both ways without making things complicated.
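A minimal sketch of the special-key idea, assuming a marker class for list positions. ListIndex here is a hypothetical dataclass, and flatten_marked, unflatten_marked and _set_default are illustrative names, not part of flatten-dict. Plain tuples stand in for KeyPath, and the sketch assumes a non-empty top-level dict whose flattened keys are processed in the order flatten_marked produced them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListIndex:
    """Hypothetical marker: this path element is a list position."""
    index: int


def flatten_marked(obj, path=()):
    """Flatten to tuple keys, wrapping list positions in ListIndex."""
    if isinstance(obj, dict) and obj:
        pairs = list(obj.items())
    elif isinstance(obj, list) and obj:
        pairs = [(ListIndex(i), v) for i, v in enumerate(obj)]
    else:
        return {path: obj}  # leaf value, or an empty container kept as-is
    flat = {}
    for key, value in pairs:
        flat.update(flatten_marked(value, path + (key,)))
    return flat


def unflatten_marked(flat):
    """Inverse of flatten_marked for a non-empty top-level dict."""
    root = {}
    for path, value in flat.items():
        node = root
        for key, next_key in zip(path, path[1:]):
            child = [] if isinstance(next_key, ListIndex) else {}
            node = _set_default(node, key, child)
        _set_default(node, path[-1], value)
    return root


def _set_default(node, key, default):
    if isinstance(key, ListIndex):
        # Indices arrive in ascending order, so appending is enough.
        if key.index == len(node):
            node.append(default)
        return node[key.index]
    return node.setdefault(key, default)


data = {"test": [1, {0: "int key", "x": [[], {}]}], 0: "top"}
print(unflatten_marked(flatten_marked(data)) == data)  # True
```

The trade-off raised in the comment above still applies: a plain string key such as "test/0" carries no marker, so it would need a separate splitter that decides which parts are list indices.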

aneuway2 commented 4 years ago

Hi! I'm investigating switching to using this project from another dict flattening library and this is one of the missing features that I would need.

I was able to easily switch this out using the code that @benbowen provided, but it looks like 2 other use cases are missing:

def nested_set_dict(d, keys, value):
    # https://github.com/ianlini/flatten-dict/issues/8
    """Set a value to a sequence of nested keys

    Parameters
    ----------
    d : Mapping
    keys : Sequence[str]
    value : Any
    """
    assert keys
    key = keys[0]
    if len(keys) == 1:
        if type(d) == list:
            d.append(value)
        else:
            d[key] = value
        return

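    # Convert the next key to an int if it is a digit string
    # (copy to a list first so that tuple keys can be modified).
    if isinstance(keys[1], str) and keys[1].isdigit():
        keys = list(keys)
        keys[1] = int(keys[1])

    # The next key is an int, so the child container must be a list.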
    if type(keys[1]) == int:
        if key in d:
            pass
        elif type(d) == list and type(key) == int:
            if not d:
                d.append([])
            if key == len(d):
                d.append([])
        else:
            d[key] = []
        d = d[key]
    elif type(key) == int:
        if (key + 1) > len(d):
            d.append({})
        d = d[key]
    else:
        d = d.setdefault(key, {})
    nested_set_dict(d, keys[1:], value)
flatten_dict.nested_set_dict = nested_set_dict

Example to unflatten:

{
    "hello.world.0.item.inside.0.0.again": False,
    "hello.world.0.item.inside.0.1.again": True,
    "hello.world.0.item.inside.1.0.andagain": 1,
    "hello.world.0.item.inside.1.1.andagain": 2,
}

Example to flatten:

{
    "data": [
        {
            "active": True,
            "conditions": {
                "field": "segment_group",
                "operator": "and",
                "value": [
                    [
                        {
                            "action": "include",
                            "segment_id": 94427
                        },
                        {
                            "action": "include",
                            "segment_id": 94431
                        }
                    ]
                ]
            },
        }
    ]
}
benbowen commented 4 years ago

I just saw this in my email because of the at-mention. I'm so sorry for the year of silence, but I'm excited that others are looking at and working on this. The flatten/unflatten step is essential in a pipeline that I have to run. I'm transferring the "battle code" I wrote for this pipeline about a year ago to another developer, and by then, if not sooner, we will check out your commits and others' suggestions.

ianlini commented 4 years ago

@benbowen I'm planning to implement this next week.

whardier commented 3 years ago

May I propose using Ellipsis (Python 2.7) or ... (Python 3+) in place of array indexes? That would indicate (at least with tuple keys) that the element is part of a list.

{'roles': [
    {'uuid': {'$uuid': '55e119ce-3b4f-11eb-adc7-00163e0987ed'}},
    {'uuid': {'$uuid': '55e11cee-3b4f-11eb-adc7-00163e0987ed'}}
]}
[(('roles', ..., 'uuid', '$uuid'), '55e119ce-3b4f-11eb-adc7-00163e0987ed'),
 (('roles', ..., 'uuid', '$uuid'), '55e11cee-3b4f-11eb-adc7-00163e0987ed')]

Or just use a type:

>>> (1,2,3,list,4,5,6)
(1, 2, 3, <class 'list'>, 4, 5, 6)
shivam-gaur-mox commented 3 years ago

Hey! Thanks for creating this library 🥇

I would like to +1 this issue as well. I'm looking into using this library for a project that could benefit from dict flattening and unflattening, but unfortunately I would need this feature (i.e. given a dictionary that contains arrays in one or more values, flattening and then unflattening should not convert the arrays to dictionaries).

ori-levi commented 3 years ago

Hey @ianlini, sorry for bringing this up again. I think I've found a solution inspired by JsonPath.

You can represent the dict keys joined by any delimiter, but when it comes to lists, append the index to the key, something like this:

{
    "data": [
        {
            "active": True,
        },
        {
            "active": False,
        }
    ],
    "another-dict": {
        1: "a",
        2: "b"
    }
}

Flattened dict:

{
    "data[0].active": True,
    "data[1].active": False,
    "another-dict.1": "a",
    "another-dict.2": "b",
}

What do you think of this solution?

I might implement this later today and open a pull request to your library.
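The format proposed above could be sketched like this. It is a standalone illustration, not the code from any pull request: flatten_path, split_path, unflatten_path and _child are hypothetical names, and the sketch assumes a non-empty top-level dict and keys that contain no ".", "[" or "]".

```python
import re

# Matches either a bracketed index like [0] or a plain key segment.
_TOKEN = re.compile(r"\[(\d+)\]|([^.\[\]]+)")


def flatten_path(obj, prefix=""):
    """Flatten to 'data[0].active'-style keys (hypothetical helper)."""
    if isinstance(obj, dict) and obj:
        pairs = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in obj.items()]
    elif isinstance(obj, list) and obj:
        pairs = [(f"{prefix}[{i}]", v) for i, v in enumerate(obj)]
    else:
        return {prefix: obj}
    flat = {}
    for path, value in pairs:
        flat.update(flatten_path(value, path))
    return flat


def split_path(path):
    """'data[1].active' -> ['data', 1, 'active']; ints mark list indices."""
    return [int(i) if i else name for i, name in _TOKEN.findall(path)]


def unflatten_path(flat):
    root = {}
    for path, value in flat.items():
        keys = split_path(path)
        node = root
        for key, next_key in zip(keys, keys[1:]):
            child = [] if isinstance(next_key, int) else {}
            node = _child(node, key, child)
        _child(node, keys[-1], value)
    return root


def _child(node, key, default):
    if isinstance(node, list):
        # Indices are assumed to arrive in ascending order.
        if key == len(node):
            node.append(default)
        return node[key]
    return node.setdefault(key, default)


data = {"data": [{"active": True}, {"active": False}], "another-dict": {1: "a", 2: "b"}}
flat = flatten_path(data)
print(flat)
print(unflatten_path(flat))
```

With this sketch the lists come back as lists, but the int keys 1 and 2 of "another-dict" come back as the strings "1" and "2", because a string path cannot record the original key type.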

ianlini commented 3 years ago

Thanks @ori-levi. This might be a good starting point.

I tried to implement a general version for all kinds of splitters a few months ago, but I found that there are many edge cases and different behaviors to decide on. The edge cases make the behavior less intuitive and less general no matter how I design it. After thinking a lot about those cases, I had a concrete idea of the requirements, but I became very busy before finishing the implementation.

I have known JsonPath for a long time and use it a lot, but I seldom use flatten-dict. I am actually very curious why people don't simply use JsonPath to access their dict if they only want a string key to access it, so I didn't think in that direction. Anyway, I would be very happy if we could first have a reducer and splitter pair that can flatten a dict into your JsonPath format and unflatten it back. I will look into making it more general or customizable in the future.

ori-levi commented 3 years ago

@ianlini I just finished developing the suggested solution with JSONPath. Note that only the JSONPath format is reversible.

I will reformat my code, write some tests, and open a pull request for this. I hope to do this before Sunday.

HoernchenJ commented 3 years ago

@ianlini & @ori-levi Hello, are there any updates on this planned feature?

transfluxus commented 7 months ago

Is there a fork that does this?