lacuna / bifurcan

functional, durable data structures
MIT License
967 stars 51 forks source link

Some List RRB trees constructed deeper than necessary #17

Open jafingerhut opened 5 years ago

jafingerhut commented 5 years ago

So far I do not know that this is a correctness issue, only a minor performance issue from evidence I have now. Starting from an empty list and appending elements, some trees are constructed with 1 extra level of depth/height than necessary.

$ clj "-J-Dclojure.server.repl={:port 50505 :accept clojure.core.server/repl}" -Sdeps '{:deps {io.lacuna/bifurcan {:mvn/version "0.2.0-alpha1"}}}'
(import '(io.lacuna.bifurcan List))

;; (capac s) is the maximum number of vector elements that can be
;; descendants beneath a tree node with shift s.
(defn capac [shift]
  (bit-shift-left 1 (+ shift 5)))

;; make private root field accessible
(def list-class (class (List.)))
(def list-root-field (.getDeclaredField list-class "root"))
(.setAccessible list-root-field true)

(defn get-root [v]
  (.get list-root-field v))

(defn listnode? [x]
  (instance? io.lacuna.bifurcan.nodes.ListNodes$Node x))

(defn descendant-depths [n]
  (loop [remaining-nodes [[0 n]]
         depth->count {}]
    (if (zero? (count remaining-nodes))
      depth->count
      (let [[depth node] (peek remaining-nodes)
            depth->count (update depth->count depth (fnil inc 0))
            children (if (listnode? node)
                       (take (.numNodes node) (seq (.nodes node))))]
        (recur (into (pop remaining-nodes)
                     (map (fn [x] [(inc depth) x]) children))
               depth->count)))))

(defn list-n [n]
  (reduce (fn [v e] (.addLast v e))
          (List.)
          (range n)))

(def n1 (+ (capac 15) (- (capac 10)) (capac 5) (capac 0)))
(def pv (list-n n1))
(descendant-depths (get-root pv))
;; {0 1, 1 1, 2 32, 3 994, 4 31777}

For a List with fewer than (capac 15) elements, I would expect a tree with shift 15 at the root, but this is the smallest value of n I found that causes the tree to go to shift 20 at the root, a bit too early.

As far as I can tell so far this is not a correctness issue -- nth returns the correct value for all indices, for example. It is a performance issue.

I have not tried changes to the code yet to modify this behavior, but I believe it is because the condition to add a new level of depth to the tree in method pushLast only checks that the bottom-most node has the maximum branching factor, and the root node does, but not that all of the other nodes on the "right tree edge" have the maximum branching factor. Thus the code can add a new level of depth to the tree when an intermediate depth node on the "right tree edge" has only 1 node.

(def wrong-count (atom 0))
@wrong-count
(dotimes [i n1]
  (when (not= (.nth pv i) i)
    (swap! wrong-count inc)))
@wrong-count
;; 0 - good