grantjenks / python-sortedcontainers

Python Sorted Container Types: Sorted List, Sorted Dict, and Sorted Set
http://www.grantjenks.com/docs/sortedcontainers/
Other
3.51k stars 203 forks source link

Feature request: parameterize SortedList used in SortedDict through inheritance #204

Open Serpens66 opened 2 years ago

Serpens66 commented 2 years ago

Hi, thanks for your work :)

would it be possible to add a new function to the SortedDict (and therefore also SortedList) which sets a key a new value and returns the index position in one go? Eg.: index = sorteddict.setitem_returnindex(key,value)

Usecase: I want to add a new key,value to the SortedDict and I want to know at which total index this key was sorted in. Since performance is important, I assume that such a combined method would be much faster than sorting it in and then again search the key to get the index.

I assume that when sorting the new key into the SortedDict, we already search for the correct index to insert it. So we should be able to simply return this index and then there is no need to search for it twice. I already read the source code and tried to add it myself, but: 1) sortedcontainers is still frequently updated, I don't want to overwrite eg. the init of SortedDict to make it use a custom SortedList, because this might mess with future updates. 2) I did not fully understand the internal structure with lists and maxes yet :D And it seems bisect "insort" also does not return the index, which makes it more difficult =/ (I really don't understand why "insort" does not return the index, it would be so easy -.- )

grantjenks commented 2 years ago

See #195

Serpens66 commented 2 years ago

I created an issue: https://github.com/python/cpython/issues/96571 Maybe you can vouch for it, to make it more likely to happen.

In the meantime you still may adjust sortedcontainers I guess? Just do sth like: index = insort(...) return index

This will be None if the python source code will not change, just like it currently is, but automatically gets the index as soon python changes it. And in case your add(...) function does not use insort, you are already able to return the index, so this would already be an improvement if the index is 0 or the len().

Serpens66 commented 2 years ago

If you want to add sth to the cpython issue, please let us know.

If not, it seems the devs there convinced me that there is no urgent need to change "insort" to return the index, because doing bisect and then list.insert(...) one after the other would be the same speed... If this is true, then as you wrote in issue 195, the user can add a custom "add" function on their own by inheriting from SortedList. ... But to achieve the wanted result for SortedDict, I still need to exchange the internal SortedList with my custom SortedList. And this is currently only possible by overwriting either the init of SortedDict or replace the self._list after initialization.. while both of it may easily break if some code from SortedDict changes. Do you know a proper solution?

grantjenks commented 2 years ago

In #195, I mentioned "The other problem ..." but the first problem still remains (and is the bigger issue):

"""Getting the index of the item requires the positional index to be built which add() specifically avoids. I think this is a fine example for simply inheriting and implementing the needed functionality in your own code."""

I still need to exchange the internal SortedList with my custom SortedList.

This is a good point. SortedDict is not parameterized on the SortedList type. I don't think I'd want to make it an argument to __init__, but I would consider making it a class var so the reference in __init__ became self.SortedList(...). Then inheriting would mean simply changing the class variable. Something like:

from .sortedlist import SortedList

class SortedDict:
    SortedList = SortedList

    def __init__...:
        self._list = self.SortedList(...)

and your code would do:

class MySortedDict(SortedDict):
    SortedList = MySortedList
Serpens66 commented 2 years ago

Thank you. I never used this "define variables in classes outside of a method" yet, but if it works than this is a good solution, thanks :)

What do you think how much impact it will have to exchange the insort call with bisect_right and list.insert and returning the index in my custom sortedlist? Like 10% slower or more/less? Maybe you already made some tests regarding this and therefore decided against it.