nwf / dyna

Dyna2 compiler and REPL
GNU Affero General Public License v3.0
147 stars 20 forks source link

implement || and && (and other special forms involving booleans) #34

Open jeisner opened 11 years ago

jeisner commented 11 years ago

| and & are strict boolean operators. (No, not bitwise integer operators, I don't think.) Like other normal operators, they return null if they have any null argument, and otherwise $error if they have any $error argument.

But we also need lenient versions that will treat null arguments sort of like false (three-valued logic). I just fell into the trap myself by using a strict operator in

contains(Sentence, brown(Sentence,Position)).
query contains(S,"who") | contains(S,"what")

This returned only sentences S that contained both "who" and "what", because the | expression fails if either argument is null. I should have written the query as

query contains(S,"who") || contains(S,"what")

so that null values would not cause the expression to fail.

|| and ||= behave like this:

And && and &&= are the same but upside-down:

However, nulls are never passed to aggregators (an aggregator receives a set of non-null values). Hence the test "if any arguments are null" cannot succeed with &&=, which only returns null if all of its arguments are null.

Thus, ||= is just like |= except that true beats $error rather than vice-versa. And &&= is just like &= except that false beats $error rather than vice-versa. The fact that errors can be masked is similar to the way that lazy || and && are often used in C and its descendants, e.g.,

if (eof() || getchar()==' ') ...
if (!eof() && getchar==' ') ...

except that our version doesn't care about the order of the arguments.

Note that && is a normal operator in that null arguments cause it to fail and return null. However, || is crucially not normal in this way! The right way to implement || is to first define ||= and then say that foo || bar is syntactic sugar for the temp item defined by

temp ||= foo.
temp ||= bar.

[5/29/17: Or I think we could just say that foo || bar is the same as (||= foo; bar), couldn't we? Note again that foo && bar can't be written in this way. These are however the same as (|= foo; bar) and foo & bar except that those have stricter error handling in this design, and it's not totally clear that we ever really want that strict error handling.]

jeisner commented 11 years ago

p.s. I really like this design and this presentation of it. I got close to it in inference2.tex, where I played around with a couple of variants. But this version feels like the One True Path.

timvieira commented 11 years ago

Priority?

On Tue, Jul 2, 2013 at 12:04 AM, Jason Eisner notifications@github.comwrote:

p.s. I really like this design and this presentation of it. I got close to it in inference2.tex, where I played around with a couple of variants. But this version feels like the One True Path.

— Reply to this email directly or view it on GitHubhttps://github.com/nwf/dyna/issues/34#issuecomment-20325236 .

jeisner commented 11 years ago

The right way to implement || is ...

Or, just define it as a built-in with non-standard behavior. We already have ?foo as such a non-standard built-in: like foo || bar, it doesn't necessarily fail on null arguments. Another non-standard built-in is expression-if, if x then y else z (perhaps also expressible as x ? y : z), which is equivalent to

temp = y if x.
temp = z if !x.
jeisner commented 11 years ago

Priority?

How hard is it to implement? It would be nice for the tutorial, and for giving students more leeway to play around without getting into trouble the way I did. (We'd tell them about the lenient versions and not about the strict versions.) However, I am not sure it is crucial for anything in the LI class.

nwf commented 11 years ago

This is really not easy to fix. Our ANF has obliterated whatever structure we had, so "foo && bar" is now "F is foo, B is bar, R is F && B, R", and that of course turns into loops which bail out if either foo or bar is null. Null is not plumbed through the expression as a value in its own right and is not available to primitives. This has, AFAIK, always been our understanding of expressions -- null is an annihilator; thus ? and other "N+1-state operators" operators need to avoid evaluation within the current rule.

Thus, we would need to define &&/2 and ||/2 to quote their arguments (just as with ?/1, which we don't have either yet, for the same reason; see below) and then write (or inline)

X && Y := false.
X && Y := true for *X & *Y.

making use of indirect evaluation, which we cannot do right now. There's a reserved word in DOpAMine for it, but that's about as far as we've gotten down this path.

(?/1's definition is ?(X) := false. ?(X) := true for _ is *X. Note that for _ is X does not work, since variables never autoevaluate, so is placing its rhs in such a context does nobody any good.)

nwf commented 11 years ago

(There's also something of a mixed bag here for our future type system -- presumably we are required to pass only terms which are statically known to be evaluable to such operators? Or maybe we just have to statically know whether the arguments are evaluable and can then inline the definition so that we do not have to actually do the evaluation? Dynamic dispatch seems like it would require some interesting encoding tricks in the run-time system.)

jeisner commented 11 years ago

So let's wait until we are farther along on indirect evaluation (or special forms) until we try this. The alternative, as I mentioned, is to transform || out of the program and replace it with a temporary variable aggregated with ||=. But that would screw up the way that we report expressions involving || in queries.
I like your more principled approach better, namely for || to quote its arguments and then we can define

(X || Y) ||= *X.
(X || Y) ||= *Y.

(@nwf, it is || that is the hard one. Your translation of && is wrong. In fact && is identical to & except for error handling.)

jeisner commented 11 years ago

Other operators with special handling of nulls and errors

The if-then-else operator. The "branch not taken" is allowed to be null or $error without corrupting the branch that is taken. I think we should pull the same trick as with || above, and define

(if X then Y else Z) = *Y for X.
(if X then Y else Z) = *Z for !X.

where the if-then-else operator suppresses autoevaluation of its Y and Z arguments. Note that this definition piggybacks on the comma definition below (via for which is sugar for comma with args reversed).

Using the comma definition below implies that if X itself is $error, then the result is always $error unless Y and Z are both null. It is conceivable, however, that we should change this so that if Y and Z have the same value, then the result is that value even if X is in error.

Note that this apparently equivalent definition wouldn't work, because then (if $error then 0=1 else 123) will give 123 instead of $error:

(if X then Y else Z) := *Z.
(if X then Y else Z) := *Y for X.

Remark: The simpler (if X then Y) (assuming we allow it) should be interpreted as just the first rule above, or equivalently as (if X then Y else 0=1). This makes it yet another synonym for X,Y or Y for X.

The = and := aggregators. Remember that if the winning aggregand is $null then we say that the result is null. This is different from the other cases on this ticket, because it is about null output rather than null input.

Other operators with special handling of errors (but normal handling of nulls)

Annihilations? We've considered the possibility that 0 * $error should give 0, that infinity max $error should give infinity, etc.

Certain aggregations.

The comma operator. The comma operator is left-to-right lenient with respect to errors:

false,$error just returns null because we never get to the error value. (For example, foo(X) = X > 0, 1/X is explicitly guarding the division to avoid an error, and shouldn't be poisoned by the error that would result from actually doing the division.)

But conversely, $error,X always returns $error (because we "don't know" whether the test succeeded or not). ... except when X is null in which case I think it returns null (so that if an aggregand X doesn't even exist we don't have to worry about whether its guards evaluate sensibly).

So comma is not commutative because false,$error is more lenient than $error,false. In this way, comma differs from && (in which false always beats $error) as well as from & (in which $error always beats false).

(In fact, it is a lot like left-to-right && in C, Perl, etc.)

A comma sequence A,B,C,D,E should be parsed as (A,(B,(C,(D,E)))). This turns out to be equivalent to (A && B && C && D), E, which is good because it means that it treats null and false equivalently in the guards, as we expect. That is, it behaves like this:

So we see that comma is not associative either. We just saw that A,B,C means A,(B,C) which is equivalent to (A && B),C which is not equivalent to (A,B),C. Specifically, $error,false,123 is null, but ($error,false),123 gives $error.

(What's happening with ($error,false),123 is that we "don't know" the value of the guard ($error,false) -- it comes out as $error which pollutes the whole thing. We've "lost track" of the fact that (X,false) is always going to be false or null so that it will nix the 123 regardless -- once it's in error, it might as well be true for all we know. This does interfere with optimization, or a future fine-grained error system where if (X,false) is tightly typed as having type :false, then ($error,false) gives an error value of type :false which might be good enough to annihilate the whole expression ($error,false),123.)

jeisner commented 11 years ago

Remark: Some versions of LISP used basically the trick that we are using here: that some functions (defined with nlambda instead of lambda) suppress the usual automatic evaluation of their arguments, but are free to call eval on them.

Common Lisp instead decided to distinguish between functions and macros. It does have a few "special forms" -- lists (func arg1 arg2 ...) that are not evaluated by the usual mechanism of evaluating all of arg1, arg2, etc. and calling the function named by func on the list of results. These special forms are the cases where func is one of the symbols quote, if, catch, let, declare, and a number of others. All other special evaluation stuff in Common Lisp is done, often in terms of these special forms, via macros (which we arguably don't need in Dyna because we don't need to manage side effects or inlining). From the Common Lisp manual:

The set of special forms is fixed in Common Lisp; no way is provided for the user to define more. The user can create new syntactic constructs, however, by defining macros.
The set of special forms Common Lisp is purposely kept very small because any program-analyzing program must have special knowledge about every type of special form. Such a program needs no special knowledge about macros because it is simple to expand the macro and operate on the result expansion. It goes on to say that as a matter of implementation convenience or efficiency, compilers may want to change which forms are handled as special forms vs. macros, provided that other program transformers and analyzers can rely on the definition given here. This is discussed carefully.