Open SchrodingerZhu opened 3 years ago
Knowing generally if a recursive function is divergent is impossible. That being said for cases where we can prove that it's not convergent we can only get full type safety for all such functions in a language with a proof system (i.e. idris). As for whether this function can be rewritten to not be divergent, I'm not 100% sure, but my guess would be no.
Interesting -- make sure to share your results :-)
The compiler can only prove a direct recursive function can terminate if you match explicitly on an inductive datatype; I think it does not work here because you use functions like uncons
and unsnoc
so the compiler does not "see" that size(m') < size(m)
.
Koka is on purpose not "too smart" about proving non-divergence as we want it to be predictable (like type checking).
(However, you can use unsafe-decreasing(m')
to assert that m'
is smaller in the recursive calls... but better be sure it is correct :-))
type digit<a> {
con Zero
con One (a: a)
con Two (a: a, b: a)
con Three (a: a, b: a, c: a)
con Four (a: a, b: a, c: a, d: a)
}
type split<c, a> {
con NotFound
con Split(front: c<a>, mid: a, rear: c<a>)
}
fun split(digit: digit<a>, index: i): split<digit, a> {
match ((digit, index)) {
(One(x), 0) -> Split(Zero, x, Zero)
(Two(x, y), 0) -> Split(Zero, x, One(y))
(Two(x, y), 1) -> Split(One(x), y, Zero)
(Three(x, y, z), 0) -> Split(Zero, x, Two(y, z))
(Three(x, y, z), 1) -> Split(One(x), y, One(z))
(Three(x, y, z), 2) -> Split(Two(x, y), z, Zero)
(Four(a, b, c, d), 0) -> Split(Zero, a, Three(b, c, d))
(Four(a, b, c, d), 1) -> Split(One(a), b, Two(c, d))
(Four(a, b, c, d), 2) -> Split(Two(a, b), c, One(d))
(Four(a, b, c, d), 3) -> Split(Three(a, b, c), d, Zero)
_ -> NotFound
}
}
Yields me
(version 2.0.12, Dec 2 2020, libc 64-bit (gcc))
failure while running c:
*** internal compiler error: Backend.C.ParcReuse.patAddNames
Fixed in #104 :-)
@daanx Hi, I have finished the fingertree experiment; but the results aren't satisfactory: Koka:
heading 1000000: 0.076s
tailing 1000000: 0.085s
access 10000000: 19.981s
Scala:
heading 1000000: 0.006597249
tailing 1000000: 0.051602835
access 10000000: 0.871750356
Haskell
Computation time: 2.281 sec
Scala uses a radix balanced fingertree (a very special data structure, may take a while to implement: https://github.com/scala/scala/pull/8534); therefore it is expected to be fast; In Haskell, thou I didn't apply random access, the time difference is already significant enough.
This result is little bit wield to me: yes, the tree is implemented in strict environment, but look-up operation should even be faster it the tree is strict evaluated. Perhaps I will need some external helps to figure out what is going on here.
The code follows:
import collections/fingertree
import std/num/random
import std/time/timer
fun create(n: int, k: seq<int>): <div, ndet> seq<int> {
if (n <= 0)
then k
else create(n - 1, k.cons(n))
}
fun heading(n: int, k: seq<int>, last: int) {
if (n > 0) {
heading(n - 1, k, k.head.mbint + last)
} else {
last
}
}
fun tailing(n: int, acc: seq<int>) {
if (n > 0) {
tailing(n - 1, acc.tail)
} else {
acc
}
}
fun access(n: int, s: seq<int>, acc: int): <div, ndet> int {
if (n >= 0) {
access(n - 1, s, s.at(n).mbint + acc)
} else { acc }
}
fun main() {
val c = create(10000000, new())
print-elapsed({ heading(1000000, c, 0) }, "heading 1000000: ")
print-elapsed({ tailing(1000000, c) }, "tailing 1000000: ")
print-elapsed({ access(10000000 - 1, c, 0) }, "access 10000000: ")
c.head.mbint.show.println()
}
Scala:
import scala.collection.immutable.Vector
import scala.util.Random
object Test {
val rand = new Random()
def create(cnt: Int, acc: Vector[Int]): Vector[Int] = {
if (cnt <= 0) {
acc
} else {
create(cnt - 1, cnt +: acc)
}
}
def heading(cnt: Int, x: Vector[Int], y: Int): Int ={
if (cnt > 0) {
heading(cnt - 1, x, x.head + y)
} else {
y
}
}
def tailing(cnt: Int, x: Vector[Int]): Vector[Int] = {
if (cnt > 0) { tailing(cnt - 1, x.tail) } else { x }
}
def access(cnt: Int, x: Vector[Int], acc: Int): Int = {
if (cnt >= 0) { access(cnt - 1, x, acc + x(rand.nextInt(x.length))) } else { acc }
}
}
object Main {
def run(name: String)(f: => Unit): Unit = {
val t1 = System.nanoTime
val res = f
val delta = (System.nanoTime - t1) / 1e9d
println(name + delta.toString)
return f
}
def main(args: Array[String]): Unit = {
val c = Test.create(10000000, Vector.empty)
run ( "heading 1000000: " ) { Test.heading(1000000, c, 0) }
run ( "tailing 1000000: " ) { System.err.println(Test.tailing(1000000, c).head) }
run ( "access 10000000: " ) { System.err.println(Test.access(10000000 - 1, c, 0)) }
System.err.println(c.head) // keep it, don't gc
}
}
Haskell
import Text.Printf
import Control.Exception
import System.CPUTime
import Data.Sequence
import System.Random
import Control.DeepSeq
time :: NFData t => t -> IO ()
time a = do
start <- getCPUTime
end <- a `deepseq` getCPUTime
let diff = (fromIntegral (end - start)) / (10^12)
printf "Computation time: %0.3f sec\n" (diff :: Double)
testData = fromList [0..10000000]
access :: Int -> Seq Int -> Int -> Int
access (-1) _ acc = acc
access x s acc = access (x - 1) s (acc + s `index` x)
main = do
time $ access (10000000 - 1) testData 0
Ouch -- that is no good :-( I am going to look into this to see what is the cause -- in principle there should be no such performance surprises. (Might be a mistake in Koka with the arbitrary precision integers causing allocations, or perhaps failed inlining causing allocation, or ???) .
Great to see your nice code! You are a koka expert already :-)
I have been playing with the sample and made it about 3x faster (see next post). When I look at the code it looks actually pretty good without doing any allocation in access. That makes me suspicious that perhaps you are not benchmarking correctly against Haskell/Scala? It could also be that here the reference counting is not doing well since Koka does not do "borrowing" inference yet which means there are lots of redundant ref counts (maybe not though, in the benchmarks like rbtree-ck this is also the case but koka still does great there -- we would need to profile it).
(Here are some things to consider: the Scala bench does random access instead of sequential; the Scala one uses a different algorithm .. as you remarked that can be a huge difference in these sort of benchmarks; the Haskell one may be a different algorithm as well? You need to check if it is exactly the same; The amount of work is not the same: the Haskell one also constructs the tree in the time it accesses it so that is unfair to Haskell; but also it may construct it lazily while accessing and thus allocate a lot less; you need to construct first, deepseq, and then do the access loop.)
With regard to the Koka code; I tried:
x.size
are re-evaluated on every occurrence. I manually val
bind those when these are shared.maybe<(int,node<a>)>
types: the tuples are value types and need to become boxed to fit in the maybe type (see docs). So I created a new type for such return values:
type element<a> {
con None
con Some{ cnt : int; elem : a }
}
which is now a full value type without needing allocation (and passed and returned in registers). (this saves 200MiB allocations)
With that I created a fresh at
implementation and call the new test accessx
:
heading 1000000: 0.045s
tailing 1000000: 0.063s
access 10000000: 12.017s
50000004999999
accessx 10000000: 4.181s
50000004999999
(I also created a version using just 32-bit integers to see if this is the culprit and that dropped from 4.18s
to 3.3s
so there is future work to do there :-) )
Since the code is now allocation free, I am not sure how to get this faster -- some "size" calls on the digit will match the structures twice and perhaps Scala/Haskell get rid of that? There is also a lot size recalculation on the nodes. This is why I am suspicious now that there are algorithmic differences as well but not sure. TBC :-)
The "optimized" code is:
type element<a> {
con None
con Some{ cnt : int; elem : a }
}
fun lookupd(digit: ndigit<a>, index: int): element<node<a>> {
match (digit) {
One(x) -> Some(0,x)
Two(x, y) -> {
val xn = x.size
if (xn > index) then Some(0,x) else Some(xn, y)
}
Three(x, y, z) -> {
val xn = x.size
if (xn > index) return Some(0, x)
val xyn = xn + y.size
if (xyn > index) then Some(xn, y) else Some(xyn, z)
}
Four(x, y, z, d) -> {
val xn = x.size
if (xn > index) return Some(0, x)
val xyn = xn + y.size
if (xyn > index) return Some(xn, y)
val n = xyn + z.size
if (n > index) then Some(xyn, z) else Some(n, d)
}
Zero -> None
}
}
fun lookupn(node: node<node<a>>, index: int): element<node<a>> {
match (node) {
Node1(_, x) -> Some(0, x)
Node2(_, x, y) -> {
val xn = x.size
if ( xn > index ) then Some(0, x) else Some(xn, y)
}
Node3(_, x, y, z) -> {
val xn = x.size
if (xn > index) return Some(0, x)
val xyn = xn + y.size
if (xyn > index) then Some(xn, y) else Some(xyn, z)
}
}
}
fun add(e : element<a>, adjust : int): element<a> {
match (e) {
Some(cnt,x) -> Some(cnt+adjust,x)
_ -> None
}
}
fun lookupt(tree: ntree<a>, index: int): element<node<a>> {
match (tree) {
Empty -> None
Leaf(node) -> Some(0,node)
Deep(_, dl, t, dr) -> {
val ln = dl.size
if (ln > index) return dl.lookupd(index)
val i = index - ln
val tn = t.size
if (tn > i) then {
match (t.lookupt(i)) {
Some(cnt, node) -> {
val j = ln + cnt
node.lookupn(index - j).add(j)
}
_ -> None
}
}
else {
val j = ln + tn
dr.lookupd(index - j).add(j)
}
}
}
}
public fun atx(seq: seq<a>, index: int) : maybe<a> {
if (index.is-neg || seq.tree.size <= index)
then Nothing
else match (seq.tree.lookupt(index)) {
Some(_, a) -> Just(a.x)
_ -> Nothing
}
}
Very nice code btw. Surprised to see such large complex structure implemented in Koka :-) It worries me too -- fingertrees seem quite complex :-)
Some things I noticed:
public
as that is default, and also to not use private
for type/function declarations as that is default as well -- only use public
for things that are public.abstract
on the seq<a>
type: this makes the type public but the constructors private to the modulereturn
is nice to set up guards, like if (test()) return result
. I am thinking about creating perhaps a keyword for that to avoid the use of return
; not sure, perhaps like guard(!test()) else result
div
or ndet
: atx
in my code is total?So after the optimization, koka's implementation seems to only have a constant penalty compared with Haskell in lookup.
There is a difference between the implementation in my repo and Data.Sequence
; that is Data.Sequence
uses adding indices in each level
to locate while I used subtracting indices in each level
. Not quite sure whether this will result in more calculation of the size (thou it should be in O(1)).
At least, they are at almost the same level (I have tested other functions like cons/snoc before and it was a draw).
Refined and added two more tests with scala:
fun merge(n: int, m: seq<a>, acc: seq<a>): div int {
if (n >= 0) {
merge(n - 1, m, acc.merge(m))
} else { acc.size }
}
fun split(acc: seq<a>) : div int {
if (acc.size >= 100) {
split(acc.split(100).snd)
} else {
acc.size
}
}
Scala scala -J-Xmx10g
def merge(cnt: Int, x: Vector[Int], acc: Vector[Int]): Int = {
if (cnt >= 0) { merge(cnt - 1, x, acc ++ x) } else { acc.length }
}
def split(acc: Vector[Int]) : Int = {
if (acc.length >= 100) {
split(acc.splitAt(100)._2)
} else { acc.length }
}
Results
- Scala
heading 1000000: 0.014681623
tailing 1000000: 0.067306418
access 10000000: 0.830828113
split 10000000/100: 0.067702266
# (GC was called explicitly brefore the next part)
merge 10000*10000: 0.705405871
- Koka
heading 1000000: 0.068s
tailing 1000000: 0.09s
access 10000000: 8.918s
split 10000000/100: 0.127s
# notice that split is also refined
# to cache the size without calculating them for multiple times
merge 10000*10000: 0.011s
Koka seems to win a lot in merge since it just need to do in-place update here (maybe it is unfair for scala
xD). Given that the compiler optimization facilities are not complete, this result is okay for the current stage?
I am trying to write a finger tree in koka for https://github.com/SchrodingerZhu/koka-collections, following the template of https://github.com/liuxinyu95/AlgoXY/blob/algoxy/datastruct/elementary/sequence/src/FingerTree.hs.
I noticed that some function like the above is derived to have a divergence side effect though if the data structure is properly written, such functions should never diverge. Is it possible to workaround such situations.
BTW, what should I pay especially attention to in order to utilize the FBIP optimization?