Open ThomasShaw1 opened 2 years ago
Here's another hacky way of doing it. Not sure if it is better in terms of performance. But still, an API would be nice.
def play():
sd = SortedDict({1: 1, 2: 2, 3: 3, 4: 4, 5: 5})
low = sd.bisect_left(2)
high = sd.bisect_left(4)
for key in sd._list[low:high + 1]:
super(SortedDict, sd).pop(key)
del sd._list[low:high + 1]
print(sd)
Having a kind of range_delete makes sense to me. There’s a variety of optimizations that could be done inside the data structure too.
Note that today you can use irange instead of bisect to avoid using the position index.
Here's the best I can come up with today (without internal optimizations):
$ python -m IPython
Python 3.9.1 (v3.9.1:1e5d33e9b9, Dec 7 2020, 12:10:52)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.29.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import sortedcontainers
In [2]: class SortedDict(sortedcontainers.SortedDict):
...: def range_del(self, minimum=None, maximum=None, inclusive=(True, True)):
...: iterator = self.irange(minimum, maximum, inclusive)
...: keys = list(iterator)
...: for key in keys:
...: del self[key]
...:
In [3]: sd = SortedDict({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
In [4]: sd
Out[4]: SortedDict({'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5})
In [5]: sd.range_del('b', 'd')
In [6]: sd
Out[6]: SortedDict({'a': 1, 'e': 5})
This is great in terms of simplicity. But I would still urge you to take a look at the hack implementation I wrote above. The main idea is that slice deletion for SortedList is already internally optimized to O(log n), which is used for keys, so why not just use it as it is and then take care of the dictionary deletion one by one. That way, the time complexity is O(k + log n) instead of O(k log n), which is inline with Sorted Maps in Java and C++ and other languages.
The performance of the two will differ depending on the size of the dictionary and how much is deleted. But I generally agree that an optimized version would be a good addition to the API.
The best improvement to your version would be to avoid building the positional index. If you like at the implementation of SortedList.irange, you can see how to do so.
But how would I achieve slice deletion if I don't have the positional indices? SortedList.__delitem__
only takes index or slice. It is only through slice deletion that we achieve internal optimization.
Hello, I'm currently facing the exact same problem and I was wondering whether any of you (@ThomasShaw1) had found some optimized solution ? I've tried the range_del @grantjenks wrote but it doesn't seem really optimized as it doesn't accelerate the deletion.
Best regards
Range key deletion feature is currently missing for
SortedDict
. If I want to delete multiple keys within an range, I have to do them one by one using a for loop which is very inefficient in terms of time complexity. See my example below.This is very inefficient both in terms of usage and in terms of time complexity. Time complexity for this one is O(k log n) where k is the range size. Other languages such as C++ and Java only needs an time complexity of O(k + log n) to accomplish the job with the API mentioned below.
In C++, you have the option of calling
erase(iterator first, iterator last)
for theirmap
data structure. Similarly, in Java, you can callsubMap(first, last).clear()
for theirTreeMap
data structure. Something similar should be available forSortedDict
in my opinion.