viralogic / py-enumerable

A Python module used for interacting with collections of objects using LINQ syntax
MIT License

"distinct" should NOT be immediately executed #29

Closed: dagoltz closed this issue 5 years ago

dagoltz commented 5 years ago

Currently, "distinct" returns an Enumerable, but its implementation calls "to_list" on the collection, which means it executes immediately. This breaks LINQ's deferred-execution promise: "distinct" should only capture its state and should not enumerate anything until its result is itself enumerated.
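To make the difference concrete, here is a minimal plain-Python sketch (not py-linq's code) contrasting an eager distinct, which materializes the source as soon as it is called, with a deferred one written as a generator. The names `eager_distinct`, `lazy_distinct`, `counting_source`, and `pulled` are hypothetical, and both versions assume hashable elements so seen values can be tracked in a `set`.

```python
from typing import Hashable, Iterable, Iterator, List, TypeVar

T = TypeVar("T", bound=Hashable)

# Hypothetical log of every element pulled from the source.
pulled: List[int] = []


def counting_source(items: Iterable[int]) -> Iterator[int]:
    """Yield items one at a time, recording each pull in `pulled`."""
    for item in items:
        pulled.append(item)
        yield item


def eager_distinct(source: Iterable[T]) -> List[T]:
    """Immediate execution: the whole source is consumed on call."""
    materialized = list(source)  # analogous to calling to_list() up front
    seen = set()
    result = []
    for item in materialized:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def lazy_distinct(source: Iterable[T]) -> Iterator[T]:
    """Deferred execution: nothing is read until the result is iterated."""
    seen = set()
    for item in source:
        if item not in seen:
            seen.add(item)
            yield item


eager = eager_distinct(counting_source([1, 2, 2, 3]))
print(eager, len(pulled))  # [1, 2, 3] 4  (source consumed at call time)

pulled.clear()
lazy = lazy_distinct(counting_source([1, 2, 2, 3]))
print(len(pulled))              # 0  (nothing read yet)
print(next(lazy), len(pulled))  # 1 1
print(list(lazy), len(pulled))  # [2, 3] 4
```

The generator version only pulls as many source elements as the caller asks for, which is the behavior the issue expects from "distinct".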

viralogic commented 5 years ago

@dagoltz

The distinct function is implemented using Python's itertools.groupby.

https://docs.python.org/2/library/itertools.html#itertools.groupby

As per this documentation:

"The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list".
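The documented behavior is easy to check in CPython. In this small standard-library sketch (independent of py-linq), advancing the groupby object leaves the earlier group iterator empty, while copying each group into a list, as the documentation suggests, preserves the data.

```python
from itertools import groupby

data = [1, 1, 2, 3, 3]

# Advancing the groupby object invalidates the previous group iterator.
it = groupby(data)
key1, group1 = next(it)
key2, group2 = next(it)
print(key1, list(group1))  # 1 []   (group1 was invalidated by the second next())
print(key2, list(group2))  # 2 [2]

# The documented remedy: store each group as a list before advancing.
stored = [(k, list(g)) for k, g in groupby(data)]
print(stored)  # [(1, [1, 1]), (2, [2]), (3, [3, 3])]
```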

Consequently, the underlying Enumerable.group_by() method, on which distinct relies, also executes immediately. As you can see in the code, it sorts the whole source and stores each group's items in a list.

```python
def group_by(self, key_names=[], key=lambda x: x, result_func=lambda x: x):
    """
    Groups an enumerable on given key selector. Index of key name
    corresponds to index of key lambda function.

    Usage:
        Enumerable([1,2,3]).group_by(key_names=['id'], key=lambda x: x) _
            .to_list() -->
            Enumerable object [
                Grouping object {
                    key.id: 1,
                    _data: [1]
                },
                Grouping object {
                    key.id: 2,
                    _data: [2]
                },
                Grouping object {
                    key.id: 3,
                    _data: [3]
                }
            ]
        Thus the key names for each grouping object can be referenced
        through the key property. Using the above example:

        Enumerable([1,2,3]).group_by(key_names=['id'], key=lambda x: x) _
        .select(lambda g: { 'key': g.key.id, 'count': g.count() })

    :param key_names: list of key names
    :param key: key selector as lambda expression
    :param result_func: transformation function as lambda expression
    :return: Enumerable of grouping objects
    """
    # Method of Enumerable: the module needs `import itertools`, and
    # Key and Grouping are classes defined elsewhere in the library.
    result = []
    # sorted() consumes the entire source as soon as group_by is called
    ordered = sorted(self, key=key)
    grouped = itertools.groupby(ordered, key)
    for k, g in grouped:
        # Parenthesized so the non-empty check applies to lists and tuples alike
        can_enumerate = (isinstance(k, list) or isinstance(k, tuple)) \
            and len(k) > 0
        key_prop = {}
        for i, prop in enumerate(key_names):
            key_prop.setdefault(prop, k[i] if can_enumerate else k)
        key_object = Key(key_prop)
        # Each group is copied into a list before groupby advances
        result.append(Grouping(key_object, list(g)))
    return Enumerable(result).select(result_func)
```

To make distinct non-executing, I will have to change the underlying group_by implementation so that it defers execution as well, which is an added benefit in its own right.
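One way such a change could look is sketched here, under stated assumptions: `LazyEnumerable` is a hypothetical class, not py-linq's Enumerable and not the fix that was actually merged. It stores a factory that rebuilds the source on every enumeration, and its `group_by` and `distinct` only compose new factories, so no element is read until the result is iterated. Sorting still consumes the whole source, but only at that point. Like the group_by quoted above, it assumes sortable keys, so `distinct` returns elements in key order rather than first-occurrence order.

```python
import itertools
from typing import Any, Callable, Iterable, Iterator, List, Tuple


class LazyEnumerable:
    """Hypothetical minimal wrapper that stores a recipe, not results."""

    def __init__(self, factory: Callable[[], Iterable[Any]]) -> None:
        # The factory is called anew on each enumeration; nothing runs here.
        self._factory = factory

    def __iter__(self) -> Iterator[Any]:
        return iter(self._factory())

    def group_by(self, key: Callable[[Any], Any] = lambda x: x) -> "LazyEnumerable":
        def groups() -> Iterator[Tuple[Any, List[Any]]]:
            # Sorting consumes the whole source, but only once iteration starts.
            ordered = sorted(self, key=key)
            for k, g in itertools.groupby(ordered, key):
                # Copy each group before the groupby object advances.
                yield k, list(g)

        return LazyEnumerable(groups)

    def distinct(self, key: Callable[[Any], Any] = lambda x: x) -> "LazyEnumerable":
        # Keep the first element of each group; still deferred.
        return LazyEnumerable(lambda: (g[0] for _, g in self.group_by(key)))

    def to_list(self) -> List[Any]:
        return list(self)


pulls: List[int] = []


def source() -> Iterator[int]:
    """Hypothetical source that records each element it yields."""
    for x in [3, 1, 3, 2, 1]:
        pulls.append(x)
        yield x


query = LazyEnumerable(source).distinct()
print(pulls)            # []  (building the query read nothing)
print(query.to_list())  # [1, 2, 3]
print(len(pulls))       # 5
```

Because the factory is re-invoked on each enumeration, iterating `query` a second time re-reads the source, which mirrors how a LINQ query is re-evaluated every time it is enumerated.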

viralogic commented 5 years ago

merged into master