The result_func from group_by is acting up

haliaga commented 4 years ago

The code below prints the correct result when a breakpoint is set on "func":

({'value': 1}, {'key': "{'id': 1}", 'enumerable': '[1, 1, 1]'})
({'value': 2}, {'key': "{'id': 2}", 'enumerable': '[2, 2]'})
({'value': 3}, {'key': "{'id': 3}", 'enumerable': '[3]'})
({'value': 0}, {'key': "{'id': 0}", 'enumerable': '[]'})

If no breakpoint is set, or, better, if r is used directly instead of func(r), the "enumerable" comes back empty.

code:

def func(res):
    return res

def _010_group_join():
    e1 = Enumerable([
        {'value': 1},
        {'value': 2},
        {'value': 3},
        {'value': 0}
    ])
    e2 = Enumerable([1, 2, 3, 1, 2, 1])
    res = e1.group_join(e2,
                        outer_key=lambda x: x['value'],
                        inner_key=lambda y: y,
                        result_func=lambda r: func(r)).to_list()
    for e in res:
        print(e)
    print('end')

viralogic commented 4 years ago

Thank you for the feedback. I will try to replicate this issue when I get a chance.

viralogic commented 3 years ago

@haliaga Apologies for the long delay. Just investigating this issue now.

viralogic commented 3 years ago

@haliaga

First of all, I wanted to say thank you for submitting this issue. Investigating it identified a bug in the group_join method that I had failed to notice earlier, and I have incorporated your code as a unit test in this project.

The issue with your code is in the result_func lambda passed to the group_join method. group_join hands result_func a tuple of the form (element, Grouping), where element is an element from the outer collection and Grouping is an Enumerable that wraps an iterator over the inner collection elements satisfying the inner_key(inner_element) == outer_key(outer_element) predicate. In your case, your code should look like this unit test I recently added:

e1 = Enumerable([{"value": 1}, {"value": 2}, {"value": 3}, {"value": 0}])
e2 = Enumerable([1, 2, 3, 1, 2, 1])
res = e1.group_join(
    e2,
    outer_key=lambda x: x["value"],
    inner_key=lambda y: y,
    result_func=lambda r: (r[0], r[1].to_list()),
)

self.assertListEqual(
    [
        ({"value": 1}, [1, 1, 1]),
        ({"value": 2}, [2, 2]),
        ({"value": 3}, [3]),
    ],
    res.to_list(),
)

As you can see, the result_func in the above code explicitly calls to_list() on the Grouping instance in the tuple passed to the result_func lambda. This is necessary when using result_func because of how the groupby iterator is implemented in Python's itertools module. To quote:

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list

Another example of this is available in the documentation.
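
For anyone who wants to see the itertools behavior in isolation, here is a minimal standalone sketch. It uses only itertools.groupby, not py-enumerable itself, and the names data, lazy_result and eager_result are made up for illustration. It is only meant to show why a group has to be turned into a list before the groupby iterator moves on, which is roughly what calling to_list() inside result_func achieves.

```python
from itertools import groupby

# Input is already sorted, which groupby() requires to form whole groups.
data = [1, 1, 1, 2, 2, 3]

# Pitfall: keep the group iterators and read them only after the
# groupby() object has been fully advanced. On CPython 3.7+ an earlier
# group is exhausted as soon as groupby() advances past it.
stored = [(key, group) for key, group in groupby(data)]
lazy_result = [(key, list(group)) for key, group in stored]

# Fix: materialize each group as a list before groupby() advances.
eager_result = [(key, list(group)) for key, group in groupby(data)]

print(lazy_result)   # the groups have lost their elements
print(eager_result)  # [(1, [1, 1, 1]), (2, [2, 2]), (3, [3])]
```

The keys survive in both cases; only the group contents are lost when they are read too late.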

viralogic / py-enumerable

The result_func from group_by is acting up #52