usethesource / capsule

The Capsule Hash Trie Collections Library
BSD 2-Clause "Simplified" License
404 stars 27 forks source link

Introduce `Iterator.seek(int next)` as a basic building block for faster composition and transitive closure #37

Open jurgenvinju opened 10 months ago

jurgenvinju commented 10 months ago

The seek method is described as follows, here https://arxiv.org/abs/1210.0481 :

seek(int seekKey): Position the iterator at a least upper bound for seekKey, i.e. the least key ≥ seekKey, or move to end if no such key exists. The sought key must be ≥ the key at the current position.

Using this building block algorithms for trie compose ("join") and closure can make use of the orderedness of the hash keys:

This seek method can be used in many applications and it abstracts from the internal details of the data-structure. All it depends on is the hashCode/equals contract. Most implementation in capsule will use the structure of the trie to make sure seek is done as quickly as possible. It would be best if we implement seek for all tries in capsule, IMHO.

jurgenvinju commented 10 months ago

Caveat: If our iterators go in the other direction (I don't know) wrt to hash code order, then of course the definition of seek should be mirrored as well.

If our iterators do not respect hash code order, then we have to think carefully about the semantics of seek. Is it still correct to assume we can skip large parts of the trie, even though the order is non-standard?