replikativ / hitchhiker-tree

Functional, persistent, off-heap, high performance data structure
Eclipse Public License 1.0
44 stars 19 forks source link

Fix insertion order bug with IKeyCompare extension #22

Closed ajoberstar closed 3 years ago

ajoberstar commented 3 years ago

As noted in replikativ/datahike#122 and replikativ/datahike#258 there are cases where the insertion order of elements into trees differs from the order used during lookups. This causes issues where items can be present in the tree, but not findable via the lookup functions.

The cause is def'ing empty-sorted-map-by-compare at namespace load. When any later extensions to the IKeyCompare protocol are made, they alter the -compare var (normal behavior of Clojure protocols). This means Clojure code calling -compare sees the new behavior, but the empty-sorted-map-by-compare has already instantiated a PersistentTreeMap with the -compare method as defined at namespace load, ultimately only what was defined in hitchhiker.tree.key-compare.

Datahike specifically extended IKeyCompare to handle vectors, since its indices are vectors of various arrangements of datom components. In particular this behavior is noticeable in the avet index, where you try to locate an entity by attribute and value. The values are highly sensitive to IKeyCompare's ordering, since they have far more variable types than the attributes or entity IDs.

It may make sense for hitchhiker-tree to also handle collection sorting out-of-box, since lack of this is makes sorting of collections with elements IKeyCompare supports to sort differently than the raw values do.

Looking at the upstream hitchhiker-tree [1], they were calling sorted-map-by for each node creation, so I'm guessing it's not too performance sensitive.

[1] https://github.com/datacrypt-project/hitchhiker-tree/blob/f7d0f926541d7cb31fac11c2d2245c5bac451b86/src/hitchhiker/tree/core.cljc#L618

whilo commented 3 years ago

Thanks a lot for providing this! I will take a look.

whilo commented 3 years ago

@mpenet suggested this as a fix (def empty-sorted-map-by-compare (sorted-map-by #'c/-compare)), but I still have a few other things to do and need to check the effect on the benchmarks for this additional deref.

TimoKramer commented 3 years ago

closing this in favor of #24