Open pniederw opened 8 years ago
Hi Peter, sets/maps with insertion order are well in scope for capsule. Last summer I started to explore designs of such data structures for another project. At the moment, implementing insertion order is not my top priority, thus it will take some time until I'll add them.
Eventually I'm also going to add lists (likely something similar to RRB Trees).
The next data structures that I'll add to capsule will be sets/maps that can mix primitive elements with reference elements in a memory efficient way. The reasoning behind is to provide the best of specialized primitive collections and of generic collections in a single implementation (planned timeline: end of march).
I'll add milestones for what's missing / planned. Thanks for voicing your interest.
Sounds great! Looking forward to all of this.
https://github.com/usethesource/capsule/blob/master/capsule-experimental/src/main/java/io/usethesource/capsule/experimental/ordered/OrderedTrieMap.java#L150
Interesting idea with sequence ids. What happens when nextSequenceId
overflows?
Hi @ilya-g, thanks for asking. The OrderedTrieMap
implementation doesn't implement overflow handling yet, however I'll sketch the idea: in the constructor you can check if size reaches MAX_INT
, if this happens you need to traverse the tree in insertion order and reassign subsequent IDs to each item (to essentially get rid of holes in the current sequence IDs).
Rewriting the tree for reassigning IDs is a costly operation, however shouldn't occur in practice often. The scenario where you'd encounter it is if you have really large collections and perform plenty of alternating inserts and deletes (because only deletes create holes in the sequence IDs). Other operations such as put
do not update the sequenceId
of a tuple.
If you like to use the OrderedTrieMap
, give it a try. Apart from the missing overflow handling (that I'm willing to add as well) it's a fully functioning ordered map. I implemented a similar version of this data structure for a closed source project some time ago.
Note, the OrderedTrieMap
is currently located in the capsule-experimental
module, but with some integration work I'd be able to move it to the main capsule
module in the near future.
Another option is to use Long
for sequenceId – it hardly overflows in a hundred years. 😀
The thing that concerns me though is that one has to pay O(n∙log(n))
cost for sorting entries before iteration, and in some scenarios it can happen every time iteration is started (i.e. you iterate collection once and then modify it somehow).
Have you considered keeping an immutable sorted map of sequenceId=>entry
and therefore spreading that cost among modification operations?
Sorry for the late response. To answer your questions: I wouldn't like to use Long
because it introduces to much overhead in my opinion.
Regarding sorting before iteration: this is all about implementation choices. To avoid sorting at all, one could use a reversed map as you pointed out correctly as well. The other solution would be to sort lazily upon the first time of iteration (as done now), and from then on update the reverse index incrementally.
Will capsule also provide sets/maps that maintain insertion order, or is this out of scope? And will it provide lists? Sorry for asking these questions here, didn't know where else to ask.