Closed kevinlawler closed 10 years ago
"Any check should be efficient." This is a deeply thorny issue, especially if we're just using reference counting. It could require an arbitrarily-deep list/dict walk at amend / item-amend time. We may be better off allowing circular references and falling back on a generational mark/sweep garbage collector for lists and dicts that are item-amended to other lists/dicts (the only types that can be transitively self-referential) - other solutions are likely to be equally expensive.
I have no idea how Kx k implements it, but I haven't been able to find satisfactory solutions in any GC/ref-counting research papers I've come across. (I can send a huge bundle of them your way if you want; the Jones/Lins GC book is also highly recommended.) Maybe I'm missing something; list/dict -> self is simple, but the transitive case is much harder.
Printing objects with recursive references could be handled by consulting some sort of already-seen lookup table before printing lists or dicts; that's straightforward.
It may also be worth lazily freeing (http://www.iecc.com/gclist/GC-algorithms.html) those types in cd, rather than doing the entire structure on demand. A very high-dimension matrix stored in a list-of-list-of... can lead to an arbitrarily large freeing pause in the current implementation. (Look for 'Weizenbauming'.)
Thoughts?
When I released Kona I considered circular reference checking to be a glaring omission. My belief now is that the checks are unnecessary. Creating a circular reference (c: .`.k
) should be similar to creating an infinite loop ({1}f/x
). If you really want to, you can, with undesirable results.
For circular reference detection there is little overhead involved in verifying the average-case assignment. However, in the worst-case there can be significant overhead. Because K is functional we can restrict our attention to emendations to the K Tree. A newly added or amended variable on the K Tree must perform a deep check on its assignment for references to any of its parents in the K Tree. So .k.a.b.c: d
must scan all objects in d
at any depth to determine whether they are one of .
. This is reducible to non-empty set intersection of pointers. , .
k, .k.a, or .
k.a.b
The cases that cause the need for this check seem pathological. I have not unintentionally created a circular reference since the release of Kona, and I have not seen any complaints about them appearing with regard to everyday code. You seemingly have to go out of your way to create a cycle.
I imagine the circular reference checking in Kx's version of K was added at the request of or presaged a request of the investment banks. Like unnecessary valence errors, it makes the language safer at a cost to power. For the time being we are getting away with having a purer K, and I think we also can get away with omitting reference checking.
We haven't done much with the K Tree, and we haven't implemented much database technology. If the lack of reference checking turns into a problem later on we can revisit the issue.
Sure. It may also be worth re-visiting it if attempts to manage the effects of circular references end up more expensive (in time, space, responsiveness, or complexity) than just supplementing the ref-counting with a decent mark & sweep GC.
circular reference checking needs to be implemented after ci and cd are updated to act recursively
The precursor language to K was A+, and we have the complete source code. If there is some way to check whether A+ does circular reference checking, then we could try to see how it was done there. We also have the complete source code for J. Same strategy could apply.
There is also a J-users group, and Roger Hui could be contacted, to give us some pointers on where to look in J.
I would be really, really careful about licensing. A+ is GPL'd. Just running it with circular references should be simple enough.
I've got tons of resources about garbage collector / reference counting implementation. The current RC strategy is all over the codebase, though - making even subtle changes to it could be a lot of work.
I am reasonably certain that GC is not needed. Any sharing due to COW shouldn't create a circular reference. Only "leakage" is in the global symtab but that can't be helped. Lisps have the same issue (the global symtab is in the "root" set if you are doing GC -- only the user knows which syms will be referenced).
This reduces to a set of manageable cases.
at_ref
at the very root cases where you are either updating using an integer index (at updateIndex
) or a symbol index (below that with the raw dictionary assignment). It may also happen at the very top of dot_ref
(at -1==s
, which I think is handling ".[`a;();:;1]" style empty-index-list assignment, i.e. "a:1"). I believe that is everywhere such an assignment may occur.Putting such a cycle check in K dot_tetradic_2
before the return may be sufficient. This method was where the copy-on-write code was added.
I would appreciate some case examples that currently fail.
k2.8 and Kona are quite close in results for c:.`.k Which is preferred? Or do we want an error message instead?
tom@xps ~/k $ rlwrap ./k
K 2.8 2000-10-10 Copyright (C) 1993-2000 Kx Systems
Evaluation. Not for commercial use.
\ for help. \ to exit.
. .((
k;;)
(t;-661716687;)) c: .
.k
. .((
k
.,(c .,(
c;;) <==========
)
)
(`t;-661716687;))
tom@xps ~/kona $ rlwrap ./k
K Console - Enter \ for help
. .((
k;;)
(t;-6.617167e+08;)) c: .
.k
()
. .((
k
.,(c;();) <========== ) (
t;-6.617167e+08;))
You get a segfault in Kona if you do something AFTER the c:..k tom@xps ~/konaK $ rlwrap ./k K Console - Enter \ for help c: .
.k
()
a:3
SIGSEGV
Great progress!
I found a few problems:
1. In k:
d:.k d .,(`d;;)
In kona: d:.k; d ()
2. A second d:.k kills kona with SIGSEGV.
3.
d:.+(a
b;(3 3#!9; 10))
d.a[2]:(d;d;d)
Causes SIGSEGV
On Jan 11, 2014, at 9:58 PM, Tom Szczesny notifications@github.com wrote:
You get a segfault in Kona if you do something AFTER the c:.
.k tom@xps ~/konaK $ rlwrap ./k K Console - Enter \ for help c: .
.k () a:3 SIGSEGV— Reply to this email directly or view it on GitHub.
Works:
a:11
.g:.k
.g.a:10
Works:
.g.k:`.
Which strikes me as odd given Bakul's test case.
This gave me a segfault after the previous testing (not sure why):
.k.x.y.a: 11
This worked though:
.k.x.y.a: 11
.k.x.y.a: .k.x
As did:
.k.x.y.a.y.a : .k.x.y.a
x.y.a.y.a : .k.x.y.a
x.y.a.y.a : x.y.a
Seems to me that all test cases listed above now work (and no segfaults on repeats). Please confirm.
Any other cases that fail?
On Tue, 14 Jan 2014 05:34:34 PST Tom Szczesny notifications@github.com wrote:
Seems to me that all test cases listed above now work. Please confirm.
Any other cases that fail?
Try
d:.k;d:.k;d
This should show
.,(d .,(
d;;)
)
Yup ... definitely a problem. Thanks.
Applied fix for case: d:.k;d:.k;d
I would appreciate notification of any other cases that fail.
On Wed, 15 Jan 2014 11:45:10 PST Tom Szczesny notifications@github.com wrote:
Applied fix for case: d:.k;d:.k;d
I would appreciate notification of any other cases that fail.
Related:
1) a:b:.k
Both a and b should be set to
.((b;;) (
a;;))
2)
c:.+(a
b;(.k;.k))
c should be
.((a .,(
c;;)
)
(b .,(
c;;)
))
Thanks!
On the lighter side: I keyed "git git commit" instead of "git commit", and got this inexplicable and surprising result:
tom@xps ~/kona $ git status
#
#
no changes added to commit (use "git add" and/or "git commit -a")
tom@xps ~/kona $ big diff tests.c
bash: big: command not found
tom@xps ~/kona $ git diff tests.c
tom@xps ~/kona $ git add kx.c
tom@xps ~/kona $ git add tests.c
tom@xps ~/kona $ git git commit -m "fix c:.+(a
b;(.k;.k));c"
A+
Copyright (c) 1990-2008 Morgan Stanley. All rights reserved.
This version is Release 4.22
Fixed the last 2 cases.
Please let me know if you find any additional case failures.
A+
How did that happen? Jump to your A repo somehow?
No, it did not jump to the A repo on github.com, ... and the "git git" is a red herring. The quoted string is being interpreted by bash on my computer as a call to A+. Normally, the call is simply "a" (see below). Note that the `a is missing in the bash error message.
tom@xps ~/kona $ "fix c:.+(a
b;(.k;.k));c"
A+
Copyright (c) 1990-2008 Morgan Stanley. All rights reserved.
This version is Release 4.22
$off
bash: fix c:.+( b;(.k;.k));c: command not found
tom@xps ~/kona $ a
A+
Copyright (c) 1990-2008 Morgan Stanley. All rights reserved.
This version is Release 4.22
The `a also got removed by bash from the title of the last commit.
haha
When giving bash a quoted string, it seems that you need to escape any backtics in order for the string to be interpreted correctly.
On Thu, 16 Jan 2014 07:05:03 PST Tom Szczesny notifications@github.com wrote:
Fixed the last 2 cases.
Please let me know if you find any additional case failures.
a:b:c:(.k;.k;.k)
fails. Each of a, b & c should be
(.((c;;) (
b;;)
(a;;)) .((
c;;)
(b;;) (
a;;))
.((c;;) (
b;;)
(`a;;)))
Also note the order. Not sure it is important.
One other thing that bothers me (but may not be related). Consider:
a:b:1 a:b:.k a.a 1 a.b 1 a.c
Now a.c is not defined and I would expect to see an error instead it gets silently added:
a
.((a;1;) (
b;1;)
(`c;;))
Thanks again !
The 2nd point (about c getting silently added) is normal Kona behavior. This is where Kona is significantly different from k, from Q, from J, from A+, and from APL. See below:
tom@xps ~/kona $ rlwrap ./k
K Console - Enter \ for help
c
. .((
k
.,(c;;) ) (
t;-6.613177e+08;))
On Thu, 16 Jan 2014 12:22:52 PST Tom Szczesny notifications@github.com wrote:
The 2nd point (about c getting silently added) is normal Kona behavior. This is where Kona is significantly different from k, from Q, from J, from A+, and from APL.
I understand but this means errors due to misspellings will be harder to catch.
But the stronger reason for me is the following: Conceptually a dictionary and a vector are both /maps/; a dictionary maps symbols to values, a vector maps integer indices to values. Just as v[i] gives an index error if i is not in !(#v), d[s] should give an error if s is not in !d. Or automatically extend a vector when an index value is out of range!
Issue 226 proposes a compiler flag (or flags) that would bring Kona into closer conformance with k3.2. In my mind, this item is a leading candidate.
Fixed last case: a:b:c:(.k;.k;.k)
However, the order of the items in the result is reversed vis-a-vis k3.2 (I'm not the best person to judge whether this is important, or not. I'll leave that to you, to Kevin and the "k-gods".) Let me know.
On Thu, 16 Jan 2014 17:34:54 PST Tom Szczesny notifications@github.com wrote:
Fixed last case: a:b:c:(.k;.k;.k)
You seem to have licked it this time! Now I will test for any memory leaks. Or write a program that writes random k scripts!
Something I noticed which may be a bug or a feature.
a:.+(x
y;1 2)
.k:a
This assignment is not allowed on k-3.2 but kona allows this and not only that, now a is erased, replaced by x & y!
Although this assignment is not allowed on k3.2, it does change the K-Tree:
tom@xps ~/k $ rlwrap ./k K 2.8 2000-10-10 Copyright (C) 1993-2000 Kx Systems Evaluation. Not for commercial use. \ for help. \ to exit.
a:.+(x
y;1 2)
. .((
k
.,(a .((
x;1;)
(y;2;)) ) ) (
t;-661296283;))
.k:a
.k.a
reference error
.k:a
^
\ .
.((
k .,(a .((\
x;.();.()) (y;.();.())) .()) ) (
t;-661296283;))
I don't think that we have this issue licked yet. In the following, the 1st case "works" (if you don't count the reversed order), the 2nd fails:
tom@xps ~/kona $ rlwrap ./k
K Console - Enter \ for help
x:y:z:(.k;.k;.k)
(.((x;;) (
y;;)
(z;;)) .((
x;;)
(y;;) (
z;;))
.((x;;) (
y;;)
(`z;;)))
\
tom@xps ~/kona $ rlwrap ./k K Console - Enter \ for help x:y:(.k;.k) ()
Case x:y:(.k;.k) is fixed.
There are no checks in place to stop circular references in K objects. Circular references are created when objects in the K tree contain a reference to themselves or to one of their ancestors. An ancestor is any object on the path from the root to the object.
Attempting to print objects with circular references results in an infinite loop. Printing should also be changed to return after a given depth.
One way to create a circular reference is to do something like
c: .`.k while you are in the .k directory. This creates an entry inside .k that points back to .k.
Attempts to create a circular reference should result in a ref error. Any checks should be efficient. There are some comment in the code pointing out where modifications might be needed.
K3 had a couple global system variables that could have been involved in this process. The system variable _v held the symbol name of the variable being amended, whether by dot or by at. The variable _i held the indices where _v was being modified. Neither has been implemented. It isn't clear whether these are necessary to implement the check.