Open ereOn opened 8 years ago
You are correct. Locally I get these results:
$ python benchmark.py
0.939733028412 default sets - chained
1.01636219025 default sets - interleaved
7.32243204117 sorted sets - chained
7.37890100479 sorted sets - interleaved
Merging sorted sets appears 7-8x slower. The performance between chained and interleaved values is nearly identical.
Let's look at why. The code for SortedSet.union
is:
def union(self, *iterables):
"""
Return a new SortedSet with elements from the set and all *iterables*.
"""
return self.__class__(chain(iter(self), *iterables), key=self._key, load=self._load)
__or__ = union
__ror__ = __or__
So a new SortedSet is constructed by chaining self
and iterables
together. Iterating self
is very fast but let's measure:
print timeit.timeit(
setup='''
a = set(xrange(0, 5000))
''',
stmt='''
list(a)
''',
number=10000,
), "default sets - iteration"
print timeit.timeit(
setup='''
from sortedcontainers import SortedSet
a = SortedSet(xrange(0, 5000))
''',
stmt='''
list(a)
''',
number=10000,
), "sortedcontainers.SortedSet - iteration"
And that takes:
0.296840906143 default sets - iteration
0.444090127945 sortedcontainers.SortedSet - iteration
So iterating self
and iterables
will take 50% longer than native sets and iterating both will take about 90% of constructing a new native set.
When a new SortedSet
is initialized, a native set is created and then a SortedList
is initialized from that. Since native sets use hashing, they could shuffle the sort order. Let's check:
In [9]: it1 = iter(set(xrange(0, 5000)) | set(xrange(5000, 10000)))
In [10]: it2 = iter(it1)
In [11]: all(first < second for first, second in zip(it1, it2))
Out[11]: True
That's kind of surprising to me. But given the design of sets I guess it's not too surprising. The iterated union of "chained" native sets is sorted. What if the native sets are interleaved:
In [12]: it1 = iter(set(xrange(0, 10000, 2)) | set(xrange(1, 10000, 2)))
In [13]: it2 = iter(it1)
In [14]: all(first < second for first, second in zip(it1, it2))
Out[14]: True
I'm impressed the iteration order is still sorted. The iterated union of "interleaved" native sets is also sorted.
That means the cost of creating a SortedSet
is:
SortedList
with values already in sorted orderSo we must be slower than native set union because SortedSet
calls the native set constructor. That means the majority of slowdown comes from creating SortedList
. The SortedList
constructor is just a call to sorted
followed by list slicing. How much slower is sorted
than constructing a list
?
In [15]: %timeit sorted(xrange(10000))
1000 loops, best of 3: 200 µs per loop
In [16]: %timeit list(xrange(10000))
10000 loops, best of 3: 74.8 µs per loop
Calling sorted
is ~3x slower than calling list
. Slicing the result into list segments will require another iteration. So the sources of slowdown:
SortedSet
is slower than iterating a native set. The larger your sets get then the less this will be the case.SortedSet
has to construct a native set which will take at least as long as the native set union operation.SortedSet
creates a SortedList
which calls sorted
. The sorted
built-in function has to compare values and so calls __lt__
for every value in the list. That operation is slower than __hash__
for historical reasons in Python. (Python's history of rich comparison operators has made the C code fairly complex.)sorted
the result is chopped up into segments which is another linear pass over the data.Before we draw conclusions, let's increase the size of the sets to avoid slowdowns based on one-offs like calling chain
or iter
. For sorted sets 100x larger, here's the performance:
In [21]: %%timeit a = SortedSet(xrange(0, 500000)); b = SortedSet(xrange(500000, 1000000))
a | b
....:
10 loops, best of 3: 107 ms per loop
In [22]: %%timeit a = set(xrange(0, 500000)); b = set(xrange(500000, 1000000))
a | b
....:
10 loops, best of 3: 25.7 ms per loop
Now SortedSet
is ~316% slower than native set, or 3x slower. That 3x slowdown corresponds to three linear passes over the values:
__hash__
once on every value.sorted
built-in: linear pass that calls __lt__
once for every value.sorted
result into segments: linear memcopy-like pass.I would guess that sorted
is the slowest part of the process. To improve performance we would need to call __lt__
less often. And we can do that because we know the other values are already sorted. Here's how we could refactor union
(warning: not thoroughly tested):
print timeit.timeit(
setup='''
from sortedcontainers import SortedSet
class FasterSortedSet(SortedSet):
def union(self, *iterables):
if (self and len(iterables) == 1
and isinstance(iterables[0], SortedSet)
and iterables[0]
and self._list._lists[-1][-1] < iterables[0]._list._lists[0][0]):
that = iterables[0]
result = FasterSortedSet()
_set = result._set
_set.update(self)
_set.update(that)
_list = result._list
_list._lists.extend(list(sublist) for sublist in self._list._lists)
_list._maxes.extend(self._list._maxes)
_list._lists.extend(list(sublist) for sublist in that._list._lists)
_list._maxes.extend(that._list._maxes)
_list._len = len(_set)
return result
else:
return self.__class__(
chain(iter(self), *iterables),
key=self._key,
load=self._load
)
__or__ = union
__ror__ = __or__
a = FasterSortedSet(xrange(0, 5000))
b = FasterSortedSet(xrange(5000, 10000))
''',
stmt='''
c = a | b
''',
number=10000,
), "FastSortedSet - chained"
When I run that the result is:
0.900471925735 default sets - chained
2.85443711281 FasterSortedSet - chained
6.6682741642 SortedSet - chained
I think that's as fast as we can get. FasterSortedSet
is ~2.3x faster than SortedSet
in this scenario but it did so by changing the internals of SortedList
. Changing SortedList
internals introduces tight coupling between the data types which I try to avoid. I'm not convinced this is necessary but I'm interested to hear more about your use case.
It seems to me that this can be written as the general case of performing an operation with an "already sorted by my key" container as an argument. If all sorted containers advertised that they were sorted via a "sorted" attribute containing their key or None, then the internals of the SortedList can merely check that "sorted" attribute, and proceed without re-sorting, and not knowing too much about whether it's part of a set union.
In other words, a minor change to the "update" function in the SortedList, skipping the sort if the hasattr(iterable,"sorted") and iterable.sorted==self.key.
(I've always wanted a "sorted" flag to be a first class container citizen... with the sorted() function automatically setting this flag to the key used, and any unknown operations clearing it.)
The "sorted" attribute is kind of already present given the type. Good catch on checking the key function as well. I could expose the key function as a readonly attribute. Then you could simply check: "isinstance(other, SortedList) and other.key == self.key"
There has also been some discussion of adding Sorted as a kind of collections ABC. If you work on a proposal then I would be glad to contribute. Andrew Barnert also had some thoughts: http://stupidpythonideas.blogspot.com/2013/07/sorted-collections-in-stdlib.html
I was also thinking that a SortedSet.fromset
classmethod could be a more useful API than the current _set
parameter used to initialize sorted sets. Maybe SortedList.fromsorted
is also warranted and would assume the input is sorted.
Couple updates:
1) sorted list, sorted dict, and sorted set all expose a "key" attribute now which could be used for the comparison checking.
2) The sorted set data type now has "_fromset" as described above. But sorted list does not have "_fromsorted". That's still an interesting idea.
I ran a simple benchmark to compare the merge speed between
SortedSet
and a regular (Python 2.7)set
. I expected the merge of sorted sets to be faster but it turns out it's dramatically slower. This seems weird to me, especially in the "chain" case where I'd expect the merge to be a simple concatenation as the min and max of each do not overlap.Here is my benchmark and its results:
Which gives:
Could you confirm/infirm my expectation regarding these results ?
Thanks.