Closed dagoltz closed 5 years ago
@dagoltz
The distinct function is implemented using Python's itertools.groupby.
https://docs.python.org/2/library/itertools.html#itertools.groupby
As per this documentation:
"The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list".
As such, the underlying Enumerable.group_by() function also executes immediately. As you can see in its implementation, the grouped objects are iterated over and stored in a list.
```python
def group_by(self, key_names=[], key=lambda x: x, result_func=lambda x: x):
    """
    Groups an enumerable on given key selector. Index of key name corresponds to index of key lambda function.

    Usage:
        Enumerable([1,2,3]).group_by(key_names=['id'], key=lambda x: x) _
            .to_list() -->
            Enumerable object [
                Grouping object {
                    key.id: 1,
                    _data: [1]
                },
                Grouping object {
                    key.id: 2,
                    _data: [2]
                },
                Grouping object {
                    key.id: 3,
                    _data: [3]
                }
            ]
    Thus the key names for each grouping object can be referenced
    through the key property. Using the above example:
        Enumerable([1,2,3]).group_by(key_names=['id'], key=lambda x: x) _
            .select(lambda g: { 'key': g.key.id, 'count': g.count() }
    :param key_names: list of key names
    :param key: key selector as lambda expression
    :param result_func: transformation function as lambda expression
    :return: Enumerable of grouping objects
    """
    result = []
    ordered = sorted(self, key=key)
    grouped = itertools.groupby(ordered, key)
    for k, g in grouped:
        can_enumerate = isinstance(k, list) or isinstance(k, tuple) \
            and len(k) > 0
        key_prop = {}
        for i, prop in enumerate(key_names):
            key_prop.setdefault(prop, k[i] if can_enumerate else k)
        key_object = Key(key_prop)
        result.append(Grouping(key_object, list(g)))
    return Enumerable(result).select(result_func)
```
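To show in isolation why the implementation above calls list(g) on every group, here is a small standalone sketch of the shared-iterator behavior quoted from the itertools documentation. It uses only the standard library; the variable names are purely illustrative and are not part of py-linq.

```python
import itertools

data = [1, 1, 2, 2, 3]

# Keep the group iterators without materializing them. By the time they
# are read, groupby() has already advanced past each of them, so their
# contents are no longer the original groups.
stale_groups = [(k, g) for k, g in itertools.groupby(data)]
lost = [(k, list(g)) for k, g in stale_groups]

# Materialize each group while it is still the current one.
kept = [(k, list(g)) for k, g in itertools.groupby(data)]

print(kept)
```

The keys survive in both cases; only the group contents are affected, which is why the grouping data has to be copied into a list before groupby() moves on.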
I will have to change the underlying group_by implementation so that it is non-executing as well. That will in turn make distinct non-executing, which is an added benefit.
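One possible shape for a non-executing group_by, as a rough sketch only: wrapping the sort and grouping in a generator function defers all work until the result is enumerated. The names lazy_group_by and source are hypothetical and this is not the change that was actually merged; it also returns plain (key, list) tuples rather than Grouping objects.

```python
import itertools

def lazy_group_by(iterable, key=lambda x: x):
    """Hypothetical deferred grouping: nothing runs until iteration starts."""
    # This is a generator function, so the body below is not executed when
    # lazy_group_by() is called, only when the result is iterated.
    ordered = sorted(iterable, key=key)
    for k, g in itertools.groupby(ordered, key):
        # Each group is still copied, because groupby() shares its iterator.
        yield k, list(g)

consumed = []

def source():
    for item in [3, 1, 2, 1, 3]:
        consumed.append(item)  # records when the source is actually read
        yield item

groups = lazy_group_by(source())
consumed_before = list(consumed)  # snapshot taken before any enumeration
result = list(groups)             # enumeration triggers the sort and grouping
```

Note that sorting still has to read the whole source once enumeration begins; the sketch only postpones that cost until the first item is requested.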
merged into master
Currently, "distinct" returns an Enumerable, but inside the function implementation, it calls "to_list" on the collection. This means it is immediately executing. This differs from the LINQ promise. The "distinct" function should only keep "state" until enumerated, and should not execute any enumeration before that.
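The behavior requested here could look roughly like the sketch below. It swaps in a different technique from the group_by-based implementation: a generator that tracks already-seen keys in a set, which assumes the keys are hashable. lazy_distinct and numbers are hypothetical names, not py-linq API.

```python
def lazy_distinct(iterable, key=lambda x: x):
    """Hypothetical deferred distinct: yields the first item seen per key."""
    seen = set()  # state exists only while the generator is being enumerated
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item

reads = []

def numbers():
    for n in [1, 2, 2, 3, 1]:
        reads.append(n)  # records each read from the underlying source
        yield n

query = lazy_distinct(numbers())
reads_before = list(reads)       # snapshot before any enumeration
first = next(query)              # pulls exactly one item from the source
reads_after_first = list(reads)
rest = list(query)               # drains the remaining distinct items
```

Unlike a version that calls to_list internally, this reads from the source only as far as each requested item requires.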