hfour / yads

An array-like data structure with a monoidal cache
8 stars 4 forks source link

Fix optimization TODO in TreeUtils::distribute #3

Closed bor0 closed 5 years ago

bor0 commented 6 years ago

Don't use recursion due to lack of TCO.

skid commented 5 years ago

Also - wow. Sorry for not looking at this in such a long time :)

bor0 commented 5 years ago

No worries, but I don't have the fork anymore to do the necessary changes. Can you try this diff?

diff --git a/src/tree-utils.ts b/src/tree-utils.ts
index 7ef74d4..ef4694b 100644
--- a/src/tree-utils.ts
+++ b/src/tree-utils.ts
@@ -10,39 +10,37 @@ export const Size: MonoidObj<number> = Object.freeze({
 });

 /**
- * A not-so-efficient way to create a tree from arrays.
- * TODO: Optimize.
+ * Create a tree from array.
  */
 function distribute<T>(nodes: BaseNode<T>[]): INode<T> {
-  const len = nodes.length;
-
-  if (len === 0) {
+  if (nodes.length === 0) {
     return new INode();
   }

-  if (len === 1) {
-    if (nodes[0] instanceof INode) {
-      return nodes[0] as INode<T>;
-    } else {
-      return new INode(nodes[0]);
-    }
-  }
-
-  const result: INode<T>[] = [];
+  while (1 != nodes.length) {
+    const result: INode<T>[] = [];
+    const len = nodes.length;

-  for (let i = 0; i < len; i += 2) {
-    if (i + 3 === len) {
-      result.push(new INode(nodes[i], nodes[i + 1], nodes[i + 2]));
-      break;
-    } else if (i + 1 === len) {
-      result.push(new INode(nodes[i]));
-      break;
-    } else {
-      result.push(new INode(nodes[i], nodes[i + 1]));
+    for (let i = 0; i < len; i += 2) {
+      if (i + 3 === len) {
+        result.push(new INode(nodes[i], nodes[i + 1], nodes[i + 2]));
+        break;
+      } else if (i + 1 === len) {
+        result.push(new INode(nodes[i]));
+        break;
+      } else {
+        result.push(new INode(nodes[i], nodes[i + 1]));
+      }
     }
+
+    nodes = result;
   }

-  return distribute(result);
+  if (nodes[0] instanceof INode) {
+    return nodes[0] as INode<T>;
+  } else {
+    return new INode(nodes[0]);
+  }
 }

 export function fromArray<T>(data: T[]) {
skid commented 5 years ago

I will try it out, but what's the base file? Your previous pull request, or the original "non efficient" master?

BTW, I thought of an analytical way to construct a tree from an array which is "optimally balanced". By this, I mean, the same tree that you would end up if you just kept pushing nodes one by one. This should be fast because you know upfront how many INodes and Leaves each node in the tree will have only from the array length.

Let N = input array length
Let P = 2^X, where P < N and 2^(X+1) > N (largest power of two smaller than N)
Let H = P/2
Let D = N - P

The depth of the tree will always be X.

By definition D can't be larger than P.
If D = P, then N is an exact power of 2, and the tree is binary.
Id D < P, then:
  if D < H, the root has two children:
    Child A has a total of H leaves
    Child B has a total of H +D leaves
  if D = H, the root has three children, each one with H nodes
  if D > H, the root has three children:
    Child A has a total of H nodes
    Child B has a total of H nodes
    Child C has a total of D nodes

This basically constructs a tree which is skewed to the right. As the rightmost nodes get filled with more leaves they split, and that split works up recursively when enough leaves are added. For example.

Another way to put it is that each INode can have between 2^D and 3^D leaves, where D is the INode's depth. 2^D is ideal because it's sparse and inserts won't trigger many rebalances, and 3^D is too tight and should be avoided. The balancing heuristic is that when an INode gets filled up to 2^(D+1) it splits and the depth of the tree increases. Since 2^(D+1) is always less than 3^D, we never get to the tight situation.

Input = Array(15)
Depth = 3 (2^3 = 8, but 2^4 = 16)

Root has 3 nodes with sizes 4, 4, and 7
The "7" node has 3 nodes: 2, 2 and 3
The "3 node has only leaves.

Now, pushing a node to the tree will split the "3" node, which will lead to splitting the "7" node, which will lead to splitting the root.

Hope this makes sense - i will program it when I got some time. BTW, it's an O(N) complexity - actually 2N

bor0 commented 5 years ago

I believe I based it off of latest master :)

skid commented 5 years ago

@bor0 - here's the changed implementation of fromArray():

https://github.com/hfour/yads/blob/0e4f89023b593eed4a2506fda1758e8afadd0008/src/tree-utils.ts#L12-L40

bor0 commented 5 years ago

@skid I just skimmed through your explanation/code and as far as I can tell it's doing the same thing as previously, with same complexity.

In the previous code we did i += 2 so it was eagerly creating nodes of length 2 until remainder.

So it is the same algorithm, rewritten in a recursive fashion. It is much more readable now, but not suited for a non-TCO supporting language imo.

skid commented 5 years ago

@bor0. Fair. The new one runs quite a bit faster because no local variables are created and no extra assignments are made, but it's still essentially the same code.

I'll try out your diff next.

BTW - not sure this is a case for TCO - I still wait for the recursive calls to return before passing the results to the INode constructor.

bor0 commented 5 years ago

Closing stale PR. It still can be used as a reference if needed.