nwf / dyna

Dyna2 compiler and REPL
GNU Affero General Public License v3.0
146 stars 20 forks source link

Backpointers #29

Open timvieira opened 11 years ago

timvieira commented 11 years ago

How should we support backpointers for extracting derivations?

jeisner commented 11 years ago

We had a notion that gradient items and gradient-like items could be automatically defined. And perhaps that backpointer items could be, too.

But for present purposes, maybe a version of trace that includes only the most influential aggregand for each item? This shows the parse, though it doesn't make it available as items to other rules of the program.

On Sat, Jun 29, 2013 at 11:26 PM, Tim Vieira notifications@github.comwrote:

How should we support backpointers for extracting derivations?

— Reply to this email directly or view it on GitHubhttps://github.com/nwf/dyna/issues/29 .

timvieira commented 11 years ago

yeah, but the unfortunate thing about doing something like trace is that you can't use the results in other dyna rules.

jeisner commented 11 years ago

Yes, that's what I said at the end of my reply: it's not a proper solution but may be good enough for the LSA class. Do you really want to solve this one properly now? Among proper solutions, one option is to use the argmax form of max= that we discussed before. Another is to automatically define backpointer items.

jeisner commented 11 years ago

We probably ought to get this stuff in place in time for Thursday's lecture and assignment. Thoughts? I am kind of inclined to write the backpointer-following rules by hand for the moment, to be explicit. But this is a bit clunky if we need to do it by side conditions, asking whether a particular aggregand equals the result of max= (and we also have to break ties in that case, which means adding ?= if we insist on avoiding runtime $errors, right?). So it might be really convenient to have that syntactic sugar we talked about, [bestarg,bestval] max= [X,f(X)].
For Dijkstra's, that would be something like [bestpath(V), bestweight(V)] min= [[V|bestpath(U)], bestweight(U)+edge(U,V)]. Now that I look at it, maybe too confusing ... But could also write as [predecessor(V), bestweight(V)] min= [U, bestweight(U)+edge(U,V)]. bestpath(V) = [V | bestpath(predecessor(V))]. It's not too hard to handle the base case.

nwf commented 11 years ago

That syntactic sugar poses all kinds of problems for our current ANF.

jeisner commented 11 years ago

What do you recommend for the time being? Maybe just add ?= (break ties arbitrarily, so it is allowed to write foo ?= bar(X) where := would usually fail) and call it a day?

timvieira commented 11 years ago

Here is my proposal for following backpointers. It uses a definition of argmin= which compares a pair (value, key) which means ties are broken based on the second argument (this is nice predictable behavior).

> winner argmin= [1,"winner"].
> winner argmin= [2,"loser"].

A runtime error is thrown if you try to aggregate anything other than a pair of values.

> should_error argmin= [1,3,4].

Here's how to extract the optimal path in single source shortest path:

% standard single source shortest path
path(start) min= 0.
path(B) min= path(A) + edge(A,B).
goal min= path(end).

% expensive path
edge("a","b") := 1.
edge("b","c") := 1.
edge("c","d") := 1.

% cheap path
edge("a","e") := 1.
edge("e","d") := 1.

% define begining and end.
start := "a".
end := "d".

% use argmin to determine backpointers
b(V) argmin= [path(U) + edge(U,V), U].

% extract cheapest path by following backpointers from each vertex.
bestpath(start) := [start].
bestpath(V) := U is b(V), [V | bestpath(U)].

% the optimal path is the one from the `end`.
optimalpath = reverse(bestpath(end)).

% ---------------------------------------
% list utilities
:- backchain reverse/1.
reverse([]) = [].
reverse([X|Xs]) = append(reverse(Xs), X).

:- backchain append/2.
append([], Y) = [Y].
append([X|Xs], Y) = [X|append(Xs, Y)].
timvieira commented 11 years ago

We probably ought to get this stuff in place in time for Thursday's lecture and assignment. Thoughts?

I have a proposal for argmin=/argmax=. This should be sufficient for teaching. If you need me to extend trace to do "most influential aggregand" let me know. As of now, it's not high priority.

I am kind of inclined to write the backpointer-following rules by hand for the moment, to be explicit. But this is a bit clunky if we need to do it by side conditions, asking whether a particular aggregand equals the result of max= (and we also have to break ties in that case, which means adding ?= if we insist on avoiding runtime $errors, right?).

Should be easier now.

So it might be really convenient to have that syntactic sugar we talked about, [bestarg,bestval] max= [X,f(X)].

This expression is confusing. I would expect it to desugar as:

bestarg max= _ is f(X), X.
bestval max= f(X).

which is not what was intended. You probably want to name this aggregator something else.

bestpath(V) = [V | bestpath(predecessor(V))].

This type of backchaining is not supported yet -- don't have mixed chaining. Backchaining (BC) can only depend on BC -- It's not allowed to look at any forward chained (FC) stuff. FC can call BC routines. In other words BC is in a lower stratum than FC.

timvieira commented 11 years ago

Btw, I'm waiting for feedback before committing these changes. Thanks.

jeisner commented 11 years ago

@timvieira Your proposal is very sensible and straightforward, but some discussion here.

First, a comment about tiebreaking for argmin=/argmax= no matter how we do the syntax: I take your point that using the second arg to break ties in the value is intuitive and gives predictable behavior. However, I am still inclined not to make any guarantee about how ties in the value are broken! It should be up to the system, just like ?= (although we may make some kind of stability guarantee). This is for efficiency:

Second, let's talk about the syntax. Your example is fine except that it has this redundancy:

path(B) min= path(A) + edge(A,B).

for the "forward pass," and

b(V) argmin= [path(U) + edge(U,V), U].

for the backward pass. So the aggregand from the forward pass is repeated in the backward pass. This aggregand is short enough that one might not mind too much, but in longer cases like CKY it will start to get annoying and error-prone. Also, if the forward pass has multiple aggregands on multiple lines, then there's even more to repeat.

This is why I was suggesting a sugared argmin= method that combined the two to avoid the redundancy:

[path(V), b(V)] argmin= [path(U) + edge(U,V), U].  

This corresponds pretty directly to what one would code by hand, i.e., the single combined block

if (path(V) > path(U) + edge(U,V))
then { path(V) = path(U) + edge(U,V); b(V) = U }

which you can think of as aggregating into the ordered pair [path(V),b(V)]. Remember that I regard the pair notation here as essentially just sugar for what you wrote (@nwf seems to be objecting on an email thread; I'll try to respond there soon).

(For the record, in my original proposal, I wrote the above as min= rather than argmin= with the idea of overloading min= to work on pairs, and I swapped the order of the elements of the pair. These are just cosmetic issues, and your more conservative approach to them may be better -- we can discuss.)

jeisner commented 11 years ago

p.s. An more general approach that dates back to Dyna 1 days would be to automatically define all backpointers for all rules. For example, from

path(V) min= path(U) + edge(U,V).   

we derive something like this (note: it really also needs a rule index, to deal with the possibility of multiple identical aggregands from different rules):

$rule(&(path(V) min= Aggregand)) = *Aggregand for Aggregand = &(&path(U) + &edge(U,V)).

For example, if path("BWI")+edge("BWI","DTW") has a non-null value, then the instantiated rule $rule(path("DTW") min= path("BWI")+edge("BWI","DTW")) would exist, and its value would be that value.

One could then follow the optimal backpointers via

b(V) ?= U for path(V) == $rule(&(&path(V) min= &(&path(U) + _))).

Eventually, we probably do want to make all instantiated rules available in some such way, and not only for max=/min= rules either. This allows traversing the hypergraph freely for analysis of parse forests, visualization, etc., etc. But this is advanced programming that we don't support now (because we don't have prefix * yet) and that we shouldn't expect naive users to comprehend.

jeisner commented 11 years ago

@nwf sent this by email. Reposting here on his behalf.

As a different suggestion that might be just as convenient, let me propose that we define argmax= and argmin= which take lists like this but have a "side-effect" in the chart of setting an argm/1 item. That is, if I write

a(1,2) := 3.
a(1,4) := 5.
foo(X) argmin= [a(X,Y), Y]

then the chart contains

foo(1) => 5.
argm(foo(1)) => 4.

This requires the same kind of machinery as I suggested before, with the aggregator fold() function dispatching emit() calls, but I think it slightly less hackish -- there's no need to case analyse the head and behave appropriately, just a different aggregator, and there's no need to worry about improper quantification tradeoff, since the argm/1 item is exactly as quantified as the primal rule.

jeisner commented 11 years ago

@nwf The above is a cute design, thanks, and I think we should probably go with some version of it.
It is a more concise version of the following destructuring definition (per #37):

[foo(X), argm(foo(X)] argmin= [a(X,Y), Y].

in which the name argm(foo(X)) is specified implicitly. (Obviously, argm is declared to not evaluate its arguments.)

I really like the fact that something sensible gets defined automatically. When I tried to do that above by automatically defining $rule, my definitions were ugly as sin because they were too generic and because they were defined off min= Value with the corresponding Arg being specified elsewhere. The crucial thing about your design is that the user writes argmin= [Value,Arg] all in one rule, which is much cleaner. The only downside is that to make my CKY program follow backpointers, I have to change all the min= to argmin=.

You say that this is "slightly less hackish" than the destructuring definitions in #37. I think it is a bit more hackish from the viewpoint of language definition, since it involves some kind of weird special handling for argmin= that reaches into the head of the rule and makes up a new item. Presumably this special handling doesn't happen when argmin= is used as a prefix operator, i.e., one could also write the above as

mypair(X) = (argmin= [a(X,Y), Y]).   % no special handling, result is now a pair

foo(X) = mypair(X).arg(0).
argm(foo(X)) = mypair(X).arg(1).

Given that you have to reach into the head and do special stuff for argmin=, you could handle it differently if the head is a pair. So I wonder if we should (eventually?) allow both your

foo(X) argmin= [a(X,Y), Y].

and my

[foo(X), my_argm(X)] argmin=  [a(X,Y), Y].

where the latter lets me call it my_argm(X) if I don't like the default name argm(foo(X)). (If we have the destructuring definitions from #37, then this handles the second form automatically -- then I'm just saying that the special handling kicks in only when the head is an item, in which case it converts the first form to the second form (and thereby avoids a type failure). That is even a reasonably transparent explanation for users.)

[Added 7/4/13: If you want to block the special handling and actually define foo(X) to be the pair, you'd have to write something like foo(X) = (argmin= [a(X,Y), Y]). Awkward.]

Discussion of naming:

I'm a little conflicted about the aggregator name argmin=. What we're doing is a bit more general than that. The usual notion of argmin is "the function argument X that minimizes f(X)." But our version of argmin= lets us find "the &g(X) such that X minimizes f(X)". It's peculiar to call the result an "argument" to anything, since &g(X) is not what's being passed to f! (Unless we bend over backwards and rephrase our description as "the G of the form g(X) such that G minimizes f(G.arg(0))".)

That is why I previously suggested overloading min= to work on [value,arg] pairs. Although I suppose we may want to instead reserve min= on pairs and other tuples to work lexicographically; that could come in handy.

Even if we do keep argmin=, I see that you've carefully chosen the name argm, but I'm not sure whether it's a good choice:

Other ideas for names?

jeisner commented 11 years ago

[from @timvieira] Will my version of argmax be enough for you lecture tomorrow?

[from @jeisner] What's the current version of argmax?

[from @timviera] The code I post on the ticket is full-functional and should be available.

Glad to hear it.
So, as I said above, what concerns me about the current design is the redundancy where you basically have to write all your rules twice. How hard would it be to switch to either @nwf's argm design or my destructuring design? Both of these crucially eliminate the redundancy.

As I wrote above, I think we should go with "some version" of @nwf's argm design, possibly (eventually) treated as syntactic sugar on top of my destructuring design, so that both versions are supported and so is prefix aggregation.

My only real question about @nwf's design was whether we want to use the names argmin=/argmax=/argm or can come up with something better. For example, we might use $bp (for backpointer) instead of argm, or even some bit of punctuation (a prefix operator that is spiritually similar to the adjoint operator).

Or we could do my destructuring design if you think it's easier. But it probably isn't, since as @nwf points out, it requires a general destructuring definition facility (#37) that is slightly tricky.

timvieira commented 11 years ago

Nwf's thing is pretty nice I've whipped up a version of it. Will push soon. Argmax/argmin is definitely the wrong name. On Jul 4, 2013 4:11 AM, "Jason Eisner" notifications@github.com wrote:

[from @timvieira https://github.com/timvieira] Will my version of argmax be enough for you lecture tomorrow? [from @jeisner https://github.com/jeisner] What's the current version of argmax? [from @timviera] The code I post on the ticket is full-functional and should be available.

Glad to hear it.

So, as I said above, what concerns me about the current design is the redundancy where you basically have to write all your rules twice. How hard would it be to switch to either @nwf https://github.com/nwf's argm design or my destructuring design? Both of these crucially eliminate the redundancy.

As I wrote above, I think we should go with "some version" of @nwfhttps://github.com/nwf's argm design, possibly (eventually) treated as syntactic sugar on top of my destructuring design, so that both versions are supported and so is prefix aggregation.

My only real question about @nwf https://github.com/nwf's design was whether we want to use the names argmin=/argmax=/argm or can come up with something better. For example, we might use $bp (for backpointer) instead of argm, or even some bit of punctuation (a prefix operator that is spiritually similar to the adjoint operator).

Or we could do my destructuring design if you think it's easier. But it probably isn't, since as @nwf https://github.com/nwf points out, it requires a general destructuring definition facility (#37https://github.com/nwf/dyna/issues/37) that is slightly tricky.

— Reply to this email directly or view it on GitHubhttps://github.com/nwf/dyna/issues/29#issuecomment-20464360 .

jeisner commented 11 years ago

Awesome. What's your thinking about names? @nwf, feel free to chime in.

timvieira commented 11 years ago

Best I've got is max2= On Jul 4, 2013 10:03 PM, "Jason Eisner" notifications@github.com wrote:

Awesome. What's your thinking about names? @nwf https://github.com/nwf, feel free to chime in.

— Reply to this email directly or view it on GitHubhttps://github.com/nwf/dyna/issues/29#issuecomment-20498601 .

jeisner commented 11 years ago

A proposal:

It seems to me that the pair is best described as a [val,key] pair, so we could use $key instead of argm. I wondered what y'all thought about overloading max= so that it could take [val,key] pairs, rather than introducing a new name max2=. In fact it wouldn't even have to be overloading. Here is a heterogenous case:

x max= [f(X,Y), X].
x max= g.

Now if x gets its winning value from f(3,4), then $key(x) would be 3. But if x gets its winning value from g, then $key(x) would be null. So you can think of max= g as essentially being how you write max= [g,null] since you can't write null.

Thus, if all of your aggregands have the one argument form, then max= acts as always and $key(x) never gets defined. And if all of your aggregands have the two-argument form, then it acts like what we've been calling argmax=.

timvieira commented 11 years ago

I don't like this. I'd expect x to be a list (if the first rule wins). Pretty awkward.

I prefer $key or argm

timvieira commented 11 years ago

We can call it maxWithKey=

On Thu, Jul 4, 2013 at 10:37 PM, Tim Vieira tim.f.vieira@gmail.com wrote:

I don't like this. I'd expect x to be a list (if the first rule wins). Pretty awkward.

I prefer $key or argm

timvieira commented 11 years ago

It's called argmin= in the interm use $key don't forget to quote the argument. Used in examples/ptb.dyna and examples/dijkstra-backpointers.dyna

To run ptb (it's not actually the ptb yet) $ ./dyna examples/ptb.dyna --load 'tree = sexpr("test/repl/english.par")' -i

jeisner commented 11 years ago

Thanks for implementing! I now have a proposed change to the syntax that might be workable.

don't forget to quote the argument

Hmm, I was assuming that $key autoquoted its argument. Do we have a dispos declaration or other special sauce for that? That would be handy.

We can call it maxWithKey=

Urgh. Too verbose for something that will be used all the time. And too hard to change max= to maxWithKey= given that I think such changes will be common. You know, first you write the max= version of the program and then you put the backpointers in.

One advantage to using max= in both cases is that you don't have to change the aggregator when you do this. The other advantage is that it gives you an option for the key to be null as explained in my comment above; I have an instinct that this might be useful.

I don't like this. I'd expect x to be a list (if the first rule wins).

Well, you might expect that even if the operator were called argmax=. Either way, the arguments are lists, and internally the aggregator is producing a list (which you can see when using it as a prefix operator). It's just that we pull some special handling tomfoolery to assign that list's elements to x and $key(x) instead of assigning the list as a whole to x.

But I think you're saying that when the operator is called max= you expect to be comparing entire lists with >. Really? What is the ordering of lists under >? (Lexicographic?) It seems to me that using a list is just a way of giving an extra argument (the key).

However, I am not wedded to using a list to give this extra argument -- we could specify the key in some other way. For example, something verbose and dedicated like this:

x max= f(X,Y) with key X.   % defines $key(x) to be X

which also gives a hint that $key(&x) will be X since "key" is used in both places. The other advantage to this is that it makes the line easier to parse visually. In examples/ptb.dyna, I find that the fact that the key is itself a list makes these lines hard to eyeball:

phrase(X,I,K) argmax= [phrase(Y,I,J) * phrase(Z,J,K) * p(X,Y,Z), [[Y,I,J], [Z,J,K]]].
phrase(X,I,K) argmax= [phrase(Y,I,K) * p(X,Y), [[Y,I,K]]].
phrase(X,I,I+1) argmax= [1, X] for [I,X] in enumerate(sentence).

and maybe it would be easier as

phrase(X,I,K) max= phrase(Y,I,J) * phrase(Z,J,K) * p(X,Y,Z) with key [[Y,I,J], [Z,J,K]].
phrase(X,I,K) max= phrase(Y,I,K) * p(X,Y) with key [[Y,I,K]].
phrase(X,I,I+1) max= 1 with key X for [I,X] in enumerate(sentence).

which simplifies the aggregator to max=, removes the outermost [ ] , and makes it easy to identify that there is a key and where the key is. I think this looks pretty good!

Syntactically, with key is an infix pairing operator that binds loosely (just as for does). So a sample aggregand would be 0.000123 with key [[&np,3,5],[&vp,5,7]]. The result of aggregation would also have this form.

Remark: If "with key" seems too much like COBOL and we wanted something briefer, a bit of punctuation would do, preferably related punctuation in both places. Something like

x max= f(X,Y) :: X.    % defines ::x to be X. 

But I'm getting kind of attached to something verbal that breaks up the line.

Below is how my earlier proposals play out with this syntax, just to show that we could do it in future.


Allowing unspecified (null) keys:

x max= f(X,Y) with key X.
x max= g.

Now $key(&x) is either defined or not, depending on who won.

We don't have to support this. The alternative is to say that max= barfs an $error if it tries to aggregate two items and one has a key and the other doesn't. (But eventually, I think it's nicer to do something reasonable when there is a reasonable interpretation.)


Prefix aggregator:

mypair = (max= f(X,Y) with key X).

Now mypair gets a value like f(3,4) with key 3.


User-specified key name (requires destructuring definition facility #37):

x with key k  max=  f(X,Y) with key X.

Now x gets a value f(3,4) and k gets a value 3. The common form x max= ... is desugared into x with key $key(&x) max= .... But this desugaring is suppressed if the head is already a with key term.

You might wonder what happens if we do

x with key k   max=  f(X,Y) with key X.
x with key k2  max=  g with key 0.

According to the email I just sent on the destructuring definitions thread, we end up with two different and unrelated argmaxes each with its own result. In other words, these two lines are separate aggregations. If both are non-null, then x will be in $error. However, if it's impossible for both f and g to be defined at the same time, then we're just fine: x will get a value properly, and at most one of k and k2 will get defined.

I suppose that under this destructuring story,

x max= f(X,Y) with key X.
x max= g.

must desugar into

x with key $key(&x) max= f(X,Y) with key X.
x with key $key(&x) max= g with key $null.

The explicit $null in the second body ensures that the result of aggregation will be either something like f(3,4) with key 3 or g with key $null. It therefore always matches the with key form that is expected by the head.

Note that writing the following is usually a bad idea. It desugars into something analogous to the k/k2 example above:

x with key k   max= f(X,Y) with key X.
x     max= g.    % implicit key of $key(&x)
timvieira commented 11 years ago

don't forget to quote the argument

Hmm, I was assuming that $key autoquoted its argument. Do we have a dispos declaration or other special sauce for that? That would be handy.

Yup. Will fix. In the mean time, you can use :- dispos *$key(&).

We can call it maxWithKey=

Urgh. Too verbose for something that will be used all the time.

Agreed.

One advantage to using max= in both cases is that you don't have to change the aggregator when you do this. The other advantage is that it gives you an option for the key to be null as explained in my comment above; I have an instinct that this might be useful.

I don't like this. I'd expect x to be a list (if the first rule wins).

Well, you might expect that even if the operator were called argmax=. Either way, the arguments are lists, and internally the aggregator is producing a list (which you can see when using it as a prefix operator). It's just that we pull some special handling tomfoolery to assign that list's elements to x and $key(x) instead of assigning the list as a whole to x.

But I think you're saying that when the operator is called max= you expect to be comparing entire lists with >. Really? What is the ordering of lists under >? (Lexicographic?)

Yup. I rely of lex ordering of tuples all the time.

It seems to me that using a list is just a way of giving an extra argument (the key).

Yes. It's a means to an ends in this case. I'd much rather not use a list.

However, I am not wedded to using a list to give this extra argument --

we could specify the key in some other way. For example, something verbose and dedicated like this:

x max= f(X,Y) with key X. % defines $key(x) to be X

which simplifies the aggregator to max=, removes the outermost [ ] , and makes it easy to identify that there is a key and where the key is. I think this looks pretty good!

This is nice!

Syntactically, with key is an infix pairing operator that binds loosely (just as for does). So a sample aggregand would be 0.000123 with key [[&np,3,5],[&vp,5,7]]. The result of aggregation would also have this form.

For the time being we should name this with_key or 'with key' (using quotes) so that we don't have to hack this into the parser.

Ok, so this creates item 'with key'(0.000123, [[&np,3,5],[&vp,5,7]]) which has all of the ordering operation defined (so max, min, <=, etc work). When an item with one of these guys as a value pops: the key and value get processed separately: value goes to the head and the key goes to $key(&head). (Eventually, we'll use a program transform to do this more efficiently.)

And just to be clear:

> x = 0.000123 with_key [[&np,3,5],[&vp,5,7]].

Does not result in x = &'with key'(0.000123, [[&np,3,5],[&vp,5,7]]). It results in

> sol
x = 0.000123.
$key(x) = [[&np,3,5],[&vp,5,7]].

x max= f(X,Y) :: X. % defines ::x to be X. This one is fine too, I like with key. Users can alias :: and with key to mean the same thing.

Allowing unspecified (null) keys:

Yup, much nicer to not barf. null is the right thing here.

_Prefix aggregator & _User-specified key name (requires destructuring definition facility #37 https://github.com/nwf/dyna/issues/37):

Seems simple enough!

timvieira commented 11 years ago

I'll willing to bet that head destructuring will eventually replace with_key. Since we have a ticket for that I'm going to close this issue. If we find that with_key isn't satisfactory, we can reopen it.

jeisner commented 11 years ago

Thanks for implementing this: I'll put it (and :-) on the parsing assignment! I agree with closing this ticket. But what do you mean by "head destructuring will eventually replace with_key"?

(My proposal was that we have both and they work together. Desugaring turns the head and the body of max= into with_key terms unless the user has already given them that form, and then max= aggregates the with_key terms from the body and delivers them to the with_key head via destructuring definition.) (Of course, for efficiency, we can compile to plain vanilla max= where we can statically tell that the key will always be null.)

timvieira commented 11 years ago

It seems like the with_key trick is subsumed by head destructing. By eventually replace, I mean that users are likely to prefer to only learn to use one of them (probably the more powerful one: for example, Pythonistas use list comprehension over list function such as map, filter, cross production, etc because they are more general and readable).

jeisner commented 11 years ago

But how would you even write this aggregation without the with_key keyword?

g max= f(X,Y) with_key X.

or for that matter

g with_key k   max=  f(X,Y) with_key X.

In the proposed design, you still need that keyword, because max= treats it specially. The rest of the design just happens to be implemented as a special case of head destructuring. Head destructuring is a more general facility that lets you write things like

[real,imag] *= [real(N),imag(N)].   % product of complex numbers

but I thought max= worked specifically with pairs of the form Val with_key Key, so that keyword can't go away.

On Sat, Jul 6, 2013 at 1:27 PM, Tim Vieira notifications@github.com wrote:

It seems like the with_key trick is subsumed by head destructing. By eventually replace, I mean that users are likely to prefer to only learn to use one of them (probably the more powerful one: for example, Pythonistas use list comprehension over list function such as map, filter, cross production, etc because they are more general and readable).

— Reply to this email directly or view it on GitHubhttps://github.com/nwf/dyna/issues/29#issuecomment-20557731 .

timvieira commented 11 years ago

It's not a "keyword" it's a term 'with_key'(f(X,Y), X).

Right now, max= doesn't do anything special. The agenda loop does all of the special handling -- when an update to an item has a with_key value the update forks "destructures" into an update to g and an update to $key(f(X,Y)). This is exactly what the following destructured head syntax (DHS) does:

[g, $key(f(X,Y))] max= [f(X,Y), X].

With DHS, max= will need to treat the second argument specially because it's aggregated with = (or ?= if we don't deterministically break ties.

Are you saying that max= needs to know that the value is a with_key instance in order to fork the update correctly (e.g, if it were a list the 2nd arg in the head would be aggregated with max= instead of =)?

jeisner commented 11 years ago

It's not a "keyword" it's a term 'with_key'(f(X,Y), X).

Right, it's syntactically an infix functor. By "keyword" I meant that it is hard-coded to have a special meaning or to trigger special behavior, not that it has a special syntax.

Are you saying that max= needs to know that the value is a with_key instance in order to fork the update correctly (e.g, if it were a list the 2nd arg in the head would be aggregated with max= instead of =)?

Are you asking what would happen with something like [foo,bar] max= [f(X),b(X)].? That would just compare the lists using <= (however that's defined on lists), yielding whichever list is considered to be maximal, and assigning its elements to foo and bar via DHS. That is not the same as writing foo max= f(X). bar max= b(X). separately.

In your design, if I understand correctly, the agenda loop breaks off the key part and only hands the value to the aggregator. I'll discuss that later in this comment; we can talk about which design is better.

Jason's design (as already discussed upthread)

My design was that the with_key part really does get handed in a normal way to the aggregator or indeed to other operators. To be concrete, here was my intended definition of the binary max operator. You can see that it does know about with_key.
(You can consider the fragment below to be pseudocode that happens to be written in Dyna; I don't care whether we actually implement it in Dyna.)

% not commutative, so the key that wins is order-dependent for max, and implementation-dependent for max=.
% we could also get weird stuff happening if <= isn't a total ordering.

V max V2 := V maxPlain V2.  
V max V2 := V maxWithKey V2.   % this definition overrides whenever it applies

V maxPlain V2 := V.
V maxPlain V2 := V2 for V <= V2.  % override.  .

(V with_key K) maxWithKey (V2 with_key K2) := V with_key K.
(V with_key K) maxWithKey (V2 with_key K2) := V2 with_key K2 for V <= V2.   % override

Note that maxPlain and maxWithKey could be defined more concisely by using the immediate if operator, e.g.,

V maxPlain V2 := if (V <= V2) then V2 else V.

The point is that now max can take two (val,key) pairs and return one of them. This design allows the user to use keys when writing infix binary max and prefix max=, not only with the usual binary max= that appears in rules.
(Your design would have to be extended a bit to allow infix binary max -- you'd have to break off the with_key part of any argument to any operator, not just arguments to an aggregator.)

Notice that the values can be any type of item on which <= is defined. The user is free to define <= on lists or other compound terms or dynabases. ( Remark: I defined max in terms of <= rather than < because it is more likely to be defined -- since defining < is not enough to define a partial order with ties -- and if <= was not initially defined because there were no ties, then it can be easily defined from < and ==.)

What should happen if we try to take the max of a keyed value and a plain value, (V with_key K) max V2 where V <= V2? Reasonable results for that case include

What is the right thing to happen in that case? Well, this is strongly analogous to coercion of ints to reals. Once we have some static typing, we can probably say that just as there are two kinds of + or += (one produces an int from ints, the other produces a real from reals and so coerces any int argument to real), we also have two kinds of max: maxPlain produces a val from vals, and maxWithKey produces a (val,key) pair from (val,key) pairs. If we can distinguish statically which one is being used, then we can keep them separate, and replace maxWithKey with a version that coerces any input V to V with_key $null unless it already has that form:

VK maxWithKeyCoercing VK2 = addkey(VK) max addKey(VK2).
addkey(V) := V with_key $null.
addkey(VK) := VK if (VK = _ with_key _).  % override

Then (V with_key K) max V2 can return V2 with_key $null in our example above, so that null keys can be omitted.

I like this design because it guarantees a consistent type for the result of aggregation, regardless of what the aggregands are. E.g., it says that in this case

foo  max=  this.
foo  max=  that.
foo  max=  -999999 with_key "Inconceivable!" for very_unlikely_condition.

we can statically identify the aggregator as being maxWithKeyCoercing=. This guarantees that the result will be a with_key term (one of this with_key $null, that with_key $null, tinynum with_key "Inconceivable!"), even in the common cases where the third rule provides no aggregand or a losing aggregand.

But until we have static typing that can split max= into maxPlain and maxWithKeyCoercing, $error seems a better choice to ensure that foo has a consistent type (although this means that null keys unfortunately cannot be omitted). The user can explicitly write

foo  max=  this with_key $null.
foo  max=  that with_key $null.
foo  max=  -999999 with_key "Inconceivable!" for very_unlikely_condition.

Now let's talk specifically about infix max=, which is where the magically defined $key items come in. Of course a with_key result can be assigned to a with_key head using DHS:

val with_key key = 3 with_key 4.

so we can write

val with_key key  max=  f(X) with_key X.

But what happens if the user just writes this, omitting the key?

val  max=  f(X) with_key X.

We want it be interpreted as

val with_key $key(val)  max=  f(X) with_key X.

Upthread, I was saying that we need some special syntactic sugar that fills in a missing key for infix max=. If we could distinguish between maxPlain= and maxWithKeyCoercing=, then we could fill in a key for the latter case only. As I proposed upthread, until we have static typing or other heuristics to identify cases of maxPlain=, we should always be conservative and interpret infix max= as meaning maxWithKeyCoercing=, which goes along with filling a default key into the head.

Important: We wouldn't want to always interpret prefix max= or binary max as maxWithKeyCoercing, because then something like (3 max 8)+1 would fail: it would turn into (8 with_key $null)+1. But in the case of infix max=, we are safe, because we can guarantee via desugaring that the head has with_key form so that DHS will work! In short, we can always slip a key item into the head, whether we need it or not, because we are also slipping keys into the aggregands, whether we need them or not.

For now, prefix max= and binary max should continue to use the heterogenous definition that I gave at the top of this comment, which allows both kinds of aggregation but gives an $error if they're mixed. Once we can do static typing, we can make that be no longer an error (and be more efficient, too).

Tim's design (currently implemented)

Ok, let's move on to your design. You write:

Right now, max= doesn't do anything special. The agenda loop does all of the special handling -- when an update to an item has a with_key value the update forks "destructures" into an update to g and an update to $key(f(X,Y)).

Hmm, so you're saying that you have handled this in a way that doesn't need to know which aggregator is being used? Let me guess: is the idea that during aggregation, you update $key to the current key whenever g changes (or possibly, whenever it changes to equal the current aggregand)?

I guess that's a general principle that covers both max= and min=, but I can't figure out whether it's useful beyond those two aggregators. Let's see: with +=, the resulting value of $key would come from an arbitrary nonzero aggregand if any (in practice the last one), else from a zero aggregand. And with mean=, since the internal value of g is probably a (sum,count) pair, it is changed by any aggregand, so the resulting value of $key would just come from an arbitrary aggregand.

Ok, I can see that it would be useful for other aggregators that pick a single aggregand. One example is ?=. Another is a random sampling aggregator that chooses among non-negative real aggregands in proportion to their value. By attaching a key to each aggregand, you would get

uprob  rand=  uprob(X)  with_key  X.   
randval  =  $key(uprob).

although I am worried that your particular implementation wouldn't get this right in the case of two X's with the same uprob(X). Or maybe not right at all because the natural implementation of rand= (by eviction) has to maintain a running total (like mean=) that changes with every nonzero aggregand, and not just a current selection. In any case, I would find it significantly more convenient to turn rand= around and write it using with_val rather than with_key, i.e.,

randval  rand=  X  with_val  uprob(X).

which also has the nice property that an omitted val can default to 1, making uniform sampling a snap.

The big questions

So in summary, let's say that with_key is useful for some aggregators like max=,min=,?=. Do we want to follow Tim's idea of implementing it in a general way that also makes it do something (weird) for other aggregators like +=? Or do we want to follow Jason's idea of having each aggregator decide whether and how to handle it?

Are there going to be other cases like with_key in which extra values get defined and may or may not get consumed? (For example, with_val as above?)

If so, do we need anything like Lisp's multiple-values feature? (I had previously assumed not, since Lisp uses tis feature largely to get efficiencies that Dyna gets for free -- e.g., instead of having separate quotient and remainder functions, the quotient function just returns the remainder as an extra value, which shares work. In Dyna, we can give names to the shared items and then they won't have to be computed twice. However, sometimes it's quite nice to have anonymous expressions, such as the aggregands and result of max=, and sometimes such expressions are compound terms where the parts don't have names. Dropping the extra parts when you use it in a context that only wants the simplest part -- seems like a handy bit of sugar. It's basically coercing an object to a simpler type, as opposed to the more typical coercions that embed an object in a more complicated type (e.g., int to real).

jeisner commented 11 years ago

Sorry, I hit something that made that comment post before I was done with it. Let me fix it before you read it.

Ok, you can read it now. :-)

jeisner commented 11 years ago

Syntactic problems with current with_key implementation are noted on #15. It should evaluate its second argument, and should be able to take an unparenthesized expressions for that argument.