Performance of join-over

So far I'm very disappointed about the performance. But probably I'm just writing inefficient code.

With haskell: main= writeFile "haskelloutput" $ foldr (++) "" [show x ++ "\n" ++ show y ++ "\n"| x <- [0..99], y <-[0..999]]

time runhaskell haskelltest.hs real 0m0.741s

with kona: \t `konaoutput 0: ,/$!100 1000 forever....

Probably what's happening in this case is that Kona is reallocating character vectors at each step of the join-over. What it should do instead, and this is fairly easy to perform, is to identify that the intermediate vectors are going to be reclaimed and reuse them up to the length of allocation. This would yield an O(n) expansion method. If it's doing what I think it's doing, the current method is O(n^2). In that case I would agree with your disappointment about the performance.

See https://github.com/kevinlawler/kona/blob/master/src/vg.c#L341

Identifying vectors that will be reclaimed is possibly as easy as checking for reference count 1.

Another, probably better, way to do this is to build an optimized join-over verb for join and include it in the dispatch table.

I added some debuging output to join, and the reference count is sometimes 2 and sometimes 3. I haven't figured out how the dispach table works, i only found the struct that hold one row, but no element. Where are these added?

The main item is:

TR DT[] = //Dispatch table is append-only. ...

Slightly edited:

$ ack DT  
src/k.c
76:I bk(V p){R (L)p==DT_END_OFFSET;} //break: is ; or \n
102:Z C verbsChar(V p)  {R ((L)p>=DT_VERB_OFFSET && (L)p < DT_SPECIAL_VERB_OFFSET)?vc[((L)p-DT_VERB_OFFSET)/2]:'\0';}
104:Z C adverbsChar(V p){R ((L)p>=DT_ADVERB_OFFSET)?ac[((L)p-DT_ADVERB_OFFSET)%3]:'\0';}
110:  if(q<DT_SIZE)R DT[q].arity;
117:  if (q<DT_SIZE) R DT[q].adverbClass;
244:           DO(a->n, CPMAX str=kS(a)[i]; if(str<(S)DT_SIZE)continue; sl=strlen(str);ss=simpleString(str);
256:        if(q < DT_SIZE && q >= DT_SPECIAL_VERB_OFFSET)
257:        {  s=DT[q].text;
260:           if(s[k-1]==':' && 1==DT[q].arity) O_("%c",':'); //extra colon for monadic 0: verbs
387:TR DT[] =  //Dispatch table is append-only. Reorder/delete/insert breaks backward compatibility with IO & inet
518:L DT_SIZE=0;
519:L DT_END_OFFSET, DT_ADVERB_OFFSET, DT_VERB_OFFSET, DT_SPECIAL_VERB_OFFSET;
520:L DT_OFFSET(V v){I i=0; while(v!=DT[i].func)i++; R i;} //init only

src/k.h
111:L DT_OFFSET(V v);
124:extern L DT_SIZE, DT_END_OFFSET, DT_ADVERB_OFFSET, DT_VERB_OFFSET, DT_SPECIAL_VERB_OFFSET;
125:extern TR DT[];

src/kc.c
90:  DT_SIZE                 = DT_OFFSET(TABLE_END);
91:  DT_END_OFFSET           = DT_OFFSET(end);
92:  DT_ADVERB_OFFSET        = DT_OFFSET(over);
93:  DT_VERB_OFFSET          = DT_OFFSET(flip);
94:  DT_SPECIAL_VERB_OFFSET  = DT_OFFSET(_0m);

Above the dispatch table are good templates such as K times_over(K x,K y)

on a maybe related topic - i've noticed different behaviour between kona / kdb3.2 and k2.8 on join each. as the output of kona and kdb mostly agree (additional round bracket), my guess is that this is actually an issue with k2.8:

/kona K Console - Enter \ for help ,'$!3 (,[,"0";];,[,"1";];,[,"2";])

Other strings to try:

,:'$!3

,/:$!3

,:/:$!3

summarizing all tests in a table including tom's results below

expr	kona	kdb3.2-l32	k2.8-l32/k3.2-w32
,:'$!3	(,,"0" ,,"1" ,,"2")	(,,"0";,,"1";,,"2")	(,,"0" ,,"1" ,,"2")
,/:$!3	(,[,"0";];,[,"1";];,[,"2";])	,/:[(,"0";,"1";,"2")]	valence error ,/:$!3 ^
,:/:$!3	(,,"0" ,,"1" ,,"2")	k){z+xy} 'type 0 ,: )	valence error ,:/:$!3 ^
,'$!3	(,[,"0";];,[,"1";];,[,"2";])	,'[(,"0";,"1";,"2")]	valence error ,'$!3 ^

tavmem commented 9 years ago

Just to complete the picture: k3.2-Windows-version (as opposed to kdb3.2) yields the same results as k2.8.

tavmem commented 9 years ago

What should kona yield? Are the current kona results appropriate?

kevinlawler commented 9 years ago

They are sane. Appropriate, depends. Kona projects, as early K did, in instances where K2 throws a valence error. Possibly this should be toggle-able or off by default---up to you. I don't know what the current thinking is.

kdb3.2 projects at the adverb level which is...just another way of doing things. I would argue less natural and possibly a quirk/side-effect.

bakul commented 9 years ago

What kona does is partial evaluation. When a dyadic verb is given one object; you get a monadic verb and you can apply it again. k3 does the same in the normal case. Thus for instance

  ,["ab"]
,["ab"]
  ,["ab"] "cd"
"abcd"
  (,["ab"])'("cd";"ef")
("abcd"
 "abef")

But k3 reasons that returning an array of monadic verbs is not very useful and it is more likely to be an error so it gives a valence error. k4 is consistent and much like kona in this instance. I think k3 behavior is more helpful while k4/kona behavior is not useful and a bit cryptic. If there was an "each-verb" operator that applied a set of verbs to the same noun, then the more consistent kona/k4 behavior would make sense!

tavmem commented 9 years ago

"Odometer" does not exist in k2.8

  !2 3
type error
!2 3
^
>

What is an "efficient" way of getting the "odometer" result in k2.8?

tavmem commented 9 years ago

This works, but it is not particularly elegant:

  {+(2,x*y)#(,/y#'!x),((x*y)#!y)}[2;3]
(0 0
 0 1
 0 2
 1 0
 1 1
 1 2)

tavmem commented 9 years ago

Not elegant, but it seems to run faster than the Haskell example (k2.8 on 32-bit Linux):

  f:{+(2,x*y)#(,/y#'!x),((x*y)#!y)}
  \t `konaoutput 0: ,/$f[100;1000]
117

Not clear what the Haskell example was run on.

tavmem commented 9 years ago

Dropping the "write-to-file" we get

  f:{+(2,x*y)#(,/y#'!x),(x*y)#!y}
  \t ,/$f[100;1000]
75

Note that the A+ implementation is even faster (on the same 32-bit Linux machine)

     $mode ascii
     g{x;y}:{flip(2,x*y)rho((x rho y)/iota x),(x*y)rho iota y}
     g{2;3}
 0 0
 0 1
 0 2
 1 0
 1 1
 1 2
     time g{100;1000}
 0 0 10

Where time yields a 3-element vector of system, user and elapsed time (in microseconds).

tavmem commented 9 years ago

Oops ... that was not comparable. I left out the formatting in A+ k2.8 is faster. (k2.8 is 75ms, A+ is 120 ms)

     $mode ascii
     g{x;y}:{flip(2,x*y)rho((x rho y)/iota x),(x*y)rho iota y}
     time form g{100;1000}
 120 10 120

bakul commented 9 years ago

On Mon, 29 Jun 2015 07:19:05 PDT Tom Szczesny notifications@github.com wrote:

"Odometer" does not exist in k2.8
  !2 3
type error
!2 3
^
> 
What is an "efficient" way of getting the "odometer" result in k2.8?

For two args:

,/(!x),/:\:!y

But these are not equivalent. kona's ! can take a vector X and returns Nx2 array where N = */X. Though I would've have expected !X to be an array of shape X[0] x X[1] ..., not N x 2.

tavmem commented 9 years ago

But then again, if you leave out the formatting in both languages, ~~A+ wins (by a factor of 7)~~, they are comparable. k2.8

  f
{+(2,x*y)#(,/y#'!x),((x*y)#!y)}
  f[2;3]
(0 0
 0 1
 0 2
 1 0
 1 1
 1 2)
  \t f[100;1000]
11

A+

     $mode ascii
     g
g{x;y}:{flip(2,x*y)rho((x rho y)/iota x),(x*y)rho iota y}
     g{2;3}
 0 0
 0 1
 0 2
 1 0
 1 1
 1 2
     time g{100;1000}
 10 0 10

tavmem commented 9 years ago

@bakul wrote:

For two args: ,/(!x),/::!y

Thanks!

tavmem commented 9 years ago

~~Much more efficient (and elegant)~~ More elegant, but not more efficient. k2.8 on 32-bit Linux:

  h:{,/(!x),/:\:!y}
  h[2;3]
(0 0
 0 1
 0 2
 1 0
 1 1
 1 2)
  \t h[100;1000]
19

tavmem commented 9 years ago

Apologies. I mixed up my k2.8 and my Kona sessions in 2 of the above results. I am editing 2 of the above entries to correct them.

tavmem commented 9 years ago

After fix: in Kona:

  \t `konaoutput 0: ,/$!100 1000
280

in k2.8 (on same machine):

   h:{,/(!x),/:\:!y}
  \t `konaoutput 0: ,/$h[100;1000]
128

This addresses optimization of join-over in a narrow sense. It speeds up processing when all the items to be joined are lists.

ToDo: handle other cases.

kevinlawler / kona

Performance of join-over #307