koka data structures - Githubissues

SchrodingerZhu commented 3 years ago

I am trying to write a finger tree in koka for https://github.com/SchrodingerZhu/koka-collections, following the template of https://github.com/liuxinyu95/AlgoXY/blob/algoxy/datastruct/elementary/sequence/src/FingerTree.hs.

fun tree ( f: list<node<a>>, m: tree<node<node<a>>>, r: list<node<a>> ): div tree<node<a>> {
    match ( f, m, r ) {
        (_, Empty, Nil) -> f.foldr(Empty, cons')
        (Nil, Empty, _) -> r.foldr(Empty, cons')
        (Nil, _, _) -> {
            match ( m.uncons' ) {
                (Just(Branch(_, xs)), m') -> tree(xs, m', r)
                _                         -> Empty // this branch is not valid
            }
        }
        (_, _, Nil) -> {
            match (m.unsnoc') {
                (m', Just(Branch(_, xs))) -> tree(f, m', xs)
                _                         -> Empty // this branch is not valid
            }
        }
        _ -> Tree (f.list-size + m.size + r.list-size, f, m, r)
    }
}

I noticed that some function like the above is derived to have a divergence side effect though if the data structure is properly written, such functions should never diverge. Is it possible to workaround such situations.

BTW, what should I pay especially attention to in order to utilize the FBIP optimization?

venus-as-a-boy commented 3 years ago

Knowing generally if a recursive function is divergent is impossible. That being said for cases where we can prove that it's not convergent we can only get full type safety for all such functions in a language with a proof system (i.e. idris). As for whether this function can be rewritten to not be divergent, I'm not 100% sure, but my guess would be no.

daanx commented 3 years ago

Interesting -- make sure to share your results :-)
The compiler can only prove a direct recursive function can terminate if you match explicitly on an inductive datatype; I think it does not work here because you use functions like uncons and unsnoc so the compiler does not "see" that size(m') < size(m). Koka is on purpose not "too smart" about proving non-divergence as we want it to be predictable (like type checking). (However, you can use unsafe-decreasing(m') to assert that m' is smaller in the recursive calls... but better be sure it is correct :-))

SchrodingerZhu commented 3 years ago

Move to #104

type digit<a> {
    con Zero
    con One   (a: a)
    con Two   (a: a, b: a)
    con Three (a: a, b: a, c: a)
    con Four  (a: a, b: a, c: a, d: a)
}
type split<c, a> {
    con NotFound
    con Split(front: c<a>, mid: a, rear: c<a>)
}
fun split(digit: digit<a>, index: i): split<digit, a> {
    match ((digit, index)) {
        (One(x), 0) -> Split(Zero, x, Zero)
        (Two(x, y), 0) -> Split(Zero, x, One(y))
        (Two(x, y), 1) -> Split(One(x), y, Zero)
        (Three(x, y, z), 0) -> Split(Zero, x, Two(y, z))
        (Three(x, y, z), 1) -> Split(One(x), y, One(z))
        (Three(x, y, z), 2) -> Split(Two(x, y), z, Zero)
        (Four(a, b, c, d), 0) -> Split(Zero, a, Three(b, c, d))
        (Four(a, b, c, d), 1) -> Split(One(a), b, Two(c, d))
        (Four(a, b, c, d), 2) -> Split(Two(a, b), c, One(d))
        (Four(a, b, c, d), 3) -> Split(Three(a, b, c), d, Zero)
        _ -> NotFound
    }
}

Yields me

(version 2.0.12, Dec  2 2020, libc 64-bit (gcc))
failure while running c:
  *** internal compiler error: Backend.C.ParcReuse.patAddNames

daanx commented 3 years ago

Fixed in #104 :-)

SchrodingerZhu commented 3 years ago

@daanx Hi, I have finished the fingertree experiment; but the results aren't satisfactory: Koka:

heading 1000000:  0.076s
tailing 1000000:  0.085s
access 10000000:  19.981s

Scala:

heading 1000000: 0.006597249
tailing 1000000: 0.051602835
access 10000000: 0.871750356

Haskell

Computation time: 2.281 sec

Scala uses a radix balanced fingertree (a very special data structure, may take a while to implement: https://github.com/scala/scala/pull/8534); therefore it is expected to be fast; In Haskell, thou I didn't apply random access, the time difference is already significant enough.

This result is little bit wield to me: yes, the tree is implemented in strict environment, but look-up operation should even be faster it the tree is strict evaluated. Perhaps I will need some external helps to figure out what is going on here.

The code follows:

import collections/fingertree
import std/num/random
import std/time/timer

fun create(n: int, k: seq<int>): <div, ndet> seq<int> {
    if (n <= 0) 
    then k
    else create(n - 1, k.cons(n))
}

fun heading(n: int, k: seq<int>, last: int) {
    if (n > 0) {
        heading(n - 1, k, k.head.mbint + last)
    } else {
        last
    }
}

fun tailing(n: int, acc: seq<int>) {
    if (n > 0) {
        tailing(n - 1, acc.tail)
    } else {
        acc
    }
}

fun access(n: int, s: seq<int>, acc: int): <div, ndet> int {
   if (n >= 0) {
       access(n - 1, s, s.at(n).mbint + acc)
   } else { acc }
}

fun main() {
    val c = create(10000000, new())
    print-elapsed({ heading(1000000, c, 0) }, "heading 1000000: ")
    print-elapsed({ tailing(1000000, c) }, "tailing 1000000: ")
    print-elapsed({ access(10000000 - 1, c, 0) }, "access 10000000: ")
    c.head.mbint.show.println()
}

Scala:

import scala.collection.immutable.Vector
import scala.util.Random

object Test {
    val rand = new Random()
    def create(cnt: Int, acc: Vector[Int]): Vector[Int] = {
        if (cnt <= 0) {
            acc
        } else {
            create(cnt - 1, cnt +: acc)
        }
    }
    def heading(cnt: Int, x: Vector[Int], y: Int): Int ={
        if (cnt > 0) {
            heading(cnt - 1, x, x.head + y)
        } else {
            y
        }
    }
    def tailing(cnt: Int, x: Vector[Int]): Vector[Int] = { 
        if (cnt > 0) { tailing(cnt - 1, x.tail)  } else { x }
    }
    def access(cnt: Int, x: Vector[Int], acc: Int): Int = { 
        if (cnt >= 0) { access(cnt - 1, x, acc + x(rand.nextInt(x.length))) } else { acc }
    }
}

object Main {
    def run(name: String)(f: => Unit): Unit = {
       val t1 = System.nanoTime
       val res = f
       val delta = (System.nanoTime - t1) / 1e9d
       println(name + delta.toString)
       return f
    }
    def main(args: Array[String]): Unit = {
        val c = Test.create(10000000, Vector.empty)
        run ( "heading 1000000: " ) { Test.heading(1000000, c, 0) }
        run ( "tailing 1000000: " ) { System.err.println(Test.tailing(1000000, c).head) }
        run ( "access 10000000: " ) { System.err.println(Test.access(10000000 - 1, c, 0)) }
        System.err.println(c.head) // keep it, don't gc
    }
}

Haskell

import Text.Printf
import Control.Exception
import System.CPUTime
import Data.Sequence
import System.Random
import Control.DeepSeq

time :: NFData t => t -> IO ()
time a = do
    start <- getCPUTime
    end   <- a `deepseq` getCPUTime
    let diff = (fromIntegral (end - start)) / (10^12)
    printf "Computation time: %0.3f sec\n" (diff :: Double)

testData = fromList [0..10000000]

access :: Int -> Seq Int -> Int -> Int
access (-1) _ acc = acc
access x s acc = access (x - 1) s (acc + s `index` x) 

main = do
  time $ access (10000000 - 1) testData 0

daanx commented 3 years ago

Ouch -- that is no good :-( I am going to look into this to see what is the cause -- in principle there should be no such performance surprises. (Might be a mistake in Koka with the arbitrary precision integers causing allocations, or perhaps failed inlining causing allocation, or ???) .

daanx commented 3 years ago

Great to see your nice code! You are a koka expert already :-)

I have been playing with the sample and made it about 3x faster (see next post). When I look at the code it looks actually pretty good without doing any allocation in access. That makes me suspicious that perhaps you are not benchmarking correctly against Haskell/Scala? It could also be that here the reference counting is not doing well since Koka does not do "borrowing" inference yet which means there are lots of redundant ref counts (maybe not though, in the benchmarks like rbtree-ck this is also the case but koka still does great there -- we would need to profile it).

(Here are some things to consider: the Scala bench does random access instead of sequential; the Scala one uses a different algorithm .. as you remarked that can be a huge difference in these sort of benchmarks; the Haskell one may be a different algorithm as well? You need to check if it is exactly the same; The amount of work is not the same: the Haskell one also constructs the tree in the time it accesses it so that is unfair to Haskell; but also it may construct it lazily while accessing and thus allocate a lot less; you need to construct first, deepseq, and then do the access loop.)

daanx commented 3 years ago

With regard to the Koka code; I tried:

koka is not yet a good optimizing compiler (except for refcounting optimization). For example, it does not do CSE (yet) and thus expressions like x.size are re-evaluated on every occurrence. I manually val bind those when these are shared.
there was allocation going on for the maybe<(int,node<a>)> types: the tuples are value types and need to become boxed to fit in the maybe type (see docs). So I created a new type for such return values:
```
type element<a> {
con None
con Some{ cnt : int; elem : a }
}
```
which is now a full value type without needing allocation (and passed and returned in registers). (this saves 200MiB allocations)

With that I created a fresh at implementation and call the new test accessx:

heading 1000000:  0.045s
tailing 1000000:  0.063s
access 10000000:  12.017s
50000004999999
accessx 10000000:  4.181s
50000004999999

(I also created a version using just 32-bit integers to see if this is the culprit and that dropped from 4.18s to 3.3s so there is future work to do there :-) )

Since the code is now allocation free, I am not sure how to get this faster -- some "size" calls on the digit will match the structures twice and perhaps Scala/Haskell get rid of that? There is also a lot size recalculation on the nodes. This is why I am suspicious now that there are algorithmic differences as well but not sure. TBC :-)

The "optimized" code is:

type element<a> {
  con None
  con Some{ cnt : int; elem : a }
}

fun lookupd(digit: ndigit<a>, index: int): element<node<a>> {
  match (digit) {
    One(x) -> Some(0,x)
    Two(x, y) -> {
      val xn = x.size
      if (xn > index) then Some(0,x) else Some(xn, y)
    }
    Three(x, y, z) -> {
      val xn = x.size
      if (xn > index) return Some(0, x)
      val xyn = xn + y.size
      if (xyn > index) then Some(xn, y) else Some(xyn, z)
    }
    Four(x, y, z, d) -> {
      val xn = x.size
      if (xn > index) return Some(0, x)
      val xyn = xn + y.size
      if (xyn > index) return Some(xn, y)
      val n  = xyn + z.size
      if (n > index) then Some(xyn, z) else Some(n, d)
    }
    Zero -> None
  }
}

fun lookupn(node: node<node<a>>, index: int): element<node<a>> {
  match (node) {
    Node1(_, x) -> Some(0, x)
    Node2(_, x, y) -> {
      val xn = x.size
      if ( xn > index ) then Some(0, x) else Some(xn, y)
    }
    Node3(_, x, y, z) -> {
      val xn = x.size
      if (xn > index) return Some(0, x)
      val xyn = xn + y.size
      if (xyn > index) then Some(xn, y) else Some(xyn, z)
    }
  }
}

fun add(e : element<a>, adjust : int): element<a> {
  match (e) {
    Some(cnt,x) -> Some(cnt+adjust,x)
    _  -> None
  }
}

fun lookupt(tree: ntree<a>, index: int): element<node<a>> {
  match (tree) {
    Empty      -> None
    Leaf(node) -> Some(0,node)
    Deep(_, dl, t, dr) -> {
      val ln = dl.size
      if (ln > index) return dl.lookupd(index)
      val i  = index - ln
      val tn = t.size
      if (tn > i) then {
        match (t.lookupt(i)) {
          Some(cnt, node) -> {
            val j = ln + cnt
            node.lookupn(index - j).add(j)
          }
          _ -> None
        }
      }
      else {
        val j = ln + tn 
        dr.lookupd(index - j).add(j) 
      }
    }
  } 
}

public fun atx(seq: seq<a>, index: int) : maybe<a> {
  if (index.is-neg || seq.tree.size <= index) 
    then Nothing 
    else match (seq.tree.lookupt(index)) {
      Some(_, a) -> Just(a.x)
      _          -> Nothing
    }  
}

daanx commented 3 years ago

Very nice code btw. Surprised to see such large complex structure implemented in Koka :-) It worries me too -- fingertrees seem quite complex :-)

Some things I noticed:

The idea is to not prefix module with public as that is default, and also to not use private for type/function declarations as that is default as well -- only use public for things that are public.
you can use abstract on the seq<a> type: this makes the type public but the constructors private to the module
I use 2-space indentation everywhere but I can see that can be a personal preference.
judicious use of return is nice to set up guards, like if (test()) return result. I am thinking about creating perhaps a keyword for that to avoid the use of return; not sure, perhaps like guard(!test()) else result
not sure why you marked some functions as div or ndet: atx in my code is total?

SchrodingerZhu commented 3 years ago

So after the optimization, koka's implementation seems to only have a constant penalty compared with Haskell in lookup. There is a difference between the implementation in my repo and Data.Sequence; that is Data.Sequence uses adding indices in each level to locate while I used subtracting indices in each level. Not quite sure whether this will result in more calculation of the size (thou it should be in O(1)). At least, they are at almost the same level (I have tested other functions like cons/snoc before and it was a draw).

Refined and added two more tests with scala:

fun merge(n: int, m: seq<a>, acc: seq<a>): div int {
   if (n >= 0) {
       merge(n - 1, m, acc.merge(m))
   } else { acc.size }
}

fun split(acc: seq<a>) : div int {
    if (acc.size >= 100) {
        split(acc.split(100).snd)
    } else {
        acc.size
    }
}

Scala scala -J-Xmx10g

    def merge(cnt: Int, x: Vector[Int], acc: Vector[Int]): Int = { 
        if (cnt >= 0) { merge(cnt - 1, x, acc ++ x) } else { acc.length }
    }
    def split(acc: Vector[Int]) : Int = {
        if (acc.length >= 100) {
            split(acc.splitAt(100)._2)
        } else { acc.length }
    }

Results

- Scala
  heading 1000000: 0.014681623
  tailing 1000000: 0.067306418
  access 10000000: 0.830828113
  split 10000000/100: 0.067702266
  # (GC was called explicitly brefore the next part)
  merge 10000*10000: 0.705405871 

- Koka
  heading 1000000:  0.068s
  tailing 1000000:  0.09s
  access 10000000:  8.918s
  split 10000000/100:  0.127s 
  # notice that split is also refined 
  # to cache the size without calculating them for multiple times
  merge 10000*10000:  0.011s

Koka seems to win a lot in merge since it just need to do in-place update here (maybe it is unfair for scala xD). Given that the compiler optimization facilities are not complete, this result is okay for the current stage?

koka-lang / koka

koka data structures #102

Move to #104