true and false shouldn't be parsed as functors

jeisner commented 11 years ago

true and false should be parsed specially as literals, just like 123 would be. They are not 0-ary functors.

Of course, you can make any string into a functor (this is convenient when inventing functors from the header line of a TSV file, for example). So you can do so with these strings too, but you'd have to quote them.

'true'(foo,bar)
'false'(foo,bar)
'123'(foo,bar)

At the moment, they seem to be functors. This means that the boolean true unifies with the 0-ary structure 'true', which it shouldn't, just as the integer 123 shouldn't unify with the 0-ary structure '123'. It also means that trace sometimes prints &true (which is as weird as printing &123) -- see an example on #1.

See also #48, but this is a different issue.

nwf commented 11 years ago

That's moderately surprising to me; I thought true and false were just the two atoms that happened to be included in the set :bool. Other than the printout issue (which arises because the printer and ANF stages have slightly different ideas of quotation rules; that is, it's more of a bug than a semantic issue), can you articulate your reasons for this preference?

jeisner commented 11 years ago

That's moderately surprising to me; I thought true and false were just the two atoms that happened to be included in the set :bool

They're certainly the two values that happen to be included in the set :bool, but that doesn't mean they have to be atoms.

(In the same way, we could define :bit to be the set that contains the two integers 0 and 1, but those aren't atoms either.)

can you articulate your reasons for this preference?

(1) By analogy with other built-in literals such as numbers, strings, and dynabases, which are also disjoint from atoms. Booleans are sufficiently special, common, and built-in that they should get this treatment too for consistency.

(2) We have been careful not to pollute the functor namespace with our own keywords. That is why things like $key and $gensym and $priority and $rule start with $ -- essentially a special space in which we can expand our keyword set as much as we like.

Hence, the user or the TSV importer can use arbitrary strings as functors without fear of collision, as long as they don't start with $ (which would have to be escaped, I guess).

I don't see a reason to complicate that simple principle by making additional exceptions for true and false, when there's another very reasonable design that doesn't have those exceptions. (Nor do I want to call them $true and $false: that's fugly.)

jeisner commented 11 years ago

I don't see a reason to complicate that simple principle

On the other hand, I admit that it complicates Prolog's simple principle that identifiers starting with a lowercase character don't need to be quoted. So, you push it down there and it pops up here. (But not as badly, because at least we have the escape hatch of just quoting all functors. For a collision within a namespace, there's not much recourse.)

nwf commented 11 years ago

Should lists not use nil/0 and cons/2 but rather $nil/0 and $cons/2, too?

jeisner commented 11 years ago

Should lists not use nil/0 and cons/2 but rather $nil/0 and $cons/2, too?

Sounds right.

nwf commented 11 years ago

This is really only a problem for /0 atoms, right? In particular, the fact that we're co-opting initial-lower-alpha-alphanum-strings (ETA: without parens) for for/2 in/2 is/2 new/1 whenever/2 with_key/2 (and maybe others I've left out) is not a problem because these always occur where it's syntactically "obvious" that they are operators?

jeisner commented 11 years ago

This is really only a problem for /0 atoms, right? In particular, the fact that we're co-opting initial-lower-alpha-alphanum-strings (ETA: without parens) for for/2 in/2 is/2 new/1 whenever/2 with_key/2 (and maybe others I've left out) is not a problem because these always occur where it's syntactically "obvious" that they are operators?

Well, a parser is entitled to eat up whatever characters it wants and interpret them however it wants. Special keywords don't necessarily have to map onto functors of the same name.

But most of the things you're listing here, as well as infix + and other operators, do in fact map onto functors that build compound terms whose values can be defined, don't they? And which can be &-quoted?

I wanted to say that by contrast, true is not a functor that builds a term that may have a value. Our integer and string literals that are distinct from the space of compound terms. Why not the same for booleans? It doesn't make sense to &-quote or evaluate 3 or "three", so why would it make sense to &-quote or evaluate true?

Fundamentally I think my objection is that compound terms can serve as the names of items. I am afraid that if true is just an atom whose dispos says not to auto-evaluate it, then *true is null unless someone mischievously defines a value for true, in which case *true could be 17. This seems like it's just asking for trouble. It's an error to write *3 (I think): that is not an expression with result null, nor can anyone define it to be 17.

I guess there is an alternative design where *3 is legal to write, but is unalterably defined to be 3, and *true is unalterably defined to be true. (See further discussion of that design below.) That gets around the objection above, but I'd still prefer booleans to be disjoint from compound terms just as integers are. It's not that we couldn't embed boolean literals in the space of terms. People are likely to do many similar embeddings: e.g., define a bunch of colors &red,&blue, ... or perhaps &color(&red), ... or &color("red"), ... And they could also make those colors do double-duty as unevaluated data and as the names of items. But then they are consciously allocating certain functors for that purpose, whereas the booleans (like the integers) are very basic equipment as I said above, and with many built-in constructions that treat them specially.

Being able to treat expressions like X+Y as quotable compound terms is important if we are going to be able to manipulate code as data. (It's also very handy to be able to quote arithmetic expressions as data so that we can multiply and differentiate polynomials, etc., although we would want a quote-all-the-way down operator for convenience.)

Now, there's a question of exactly what terms are getting built by the functors you ask about:

The simplest story is that &(3+4) and &'+'(3,4) are identical. It's true that this is an incursion on the space of functors of the sort that I was decrying above, since the compound term &'+'(3,4) has a built-in value of 7 -- at least, unless the user prevents that definition of + from being imported (via a declaration or an overriding definition).

So maybe I should just get the bee out of my bonnet and say "fine, in the same way, the atom term true has a built-in value of $true" or ⊤ or something. But in that case, we'd always print it as $true or ⊤ rather than true, which seems ugly and is likely to confuse users ("is it okay to type true? what's the difference between that and the thing that gets printed out?"). Again by analogy, we don't say that 3 is an atom with value $int("3") or [insert representation of Platonic threeness here].

It's also possible that the builtins are not an incursion on the space of functors. We've discussed an import mechanism in which importing functors from another dynabase is really just sugar, as in many languages. It doesn't cause new items to be defined in the current dynabase. Rather, importing cos from the math dynabase means that cos(0) is interpreted as if it were math.cos(0), where the value of math is some dynabase μ.

Assuming that this desugaring (like desugaring of infix operators) happens before quoting, then -- insofar as I can reconstruct our discussions of evaluation -- the functor of &cos(z) or equivalently &(math.cos(z)) is not the string "cos" bur rather the pair (μ,"cos"). Or perhaps an anonymous version of that, representing the function exported from μ under the name "cos" (but the user can't recover μ or "cos" from it). The idea is essentially that math and the lookup that I'll sloppily call μ.cos still get to evaluate and so does z, but the application of the former to the latter is quoted. I don't know whether it's possible to &-quote the "μ.cos" step or to write or &-quote constructions like math.X where X=&cos(0) (maybe there are special operators for this, like math->X) -- I don't remember how far we got on this design.

But anyway, under that design, 3+4 is short for something like builtin.(3+4), and quoting it gives a functor that is not "+" but something more like (β,"+").

When I said that perhaps the builtins are not an incursion on the space of functors, what I meant was that the user might still have some way to construct or receive terms with the ordinary "cos" and "+" functors. for example, by single-quoting them. In that case, the story is that 'cos'(0) is not the same as cos(0) because it blocks the import sugar (seems reasonable) and '+'(3,4) is not the same as 3+4 (so if you want to write + in prefix form, you'd have to do builtin.'+'(3,4)).

And in fact, we could say that true is imported to mean builtin.true (which is defined within builtin to evaluate to itself), whereas if the user writes 'true' then they are getting their local true. That is basically the design I'm suggesting on this ticket in the first place. So it would be a way to keep true within the space of compound terms as you'd prefer, while keeping it disjoint from ordinary compound user terms. This is a nicely extensible design. For example, if in some future version wanted to create a new type of complex numbers that act disjoint from the user's terms, we can make builtin.complex(Real,Imag), which doesn't collide at all with any definitions complex(...) that a user might have already created, as long as the user's definitions block importing.

I wrote above:

I guess there is an alternative design where *3 is legal to write, but is unalterably defined to be 3, and *true is unalterably defined to be true.

In general I haven't liked that design in part because it means that the user's dynabase is full of junk. Dumping the current dynabase contents with query X would tell us a lot of junk like 3 = 3 and true = true, and we wouldn't be able to copy stuff from d up to the current dynabase by writing X := d.X (that would involve illegally redefining 3). It's not as bad if true (and 3?) are defined within builtin and syntactically imported rather than actually being part of the user's dynabase.

I wrote above:

But most of the things you're listing here, as well as infix + and other operators, do in fact map onto functors that build compound terms whose values can be defined, don't they? And which can be &-quoted?

In a few of the cases you list, I'm not sure what compound term should be built, i.e., I'm not sure what the result of &-quoting the construct should be. First, is the for keyword even legal syntax anywhere other than in the right hand side of a top level rule in a dynabase, so that it's okay to write Code=&(result for condition) and then evaluate *Code or unify Code=(Result for Condition)? I think so, but is the value of Code the term &'for'(result,condition) or does it desugar during parsing to &','(condition,result) so that the for is never seen? The former is better for error messages, and the latter makes program transformations simpler.

(Note: The latter is at least an option, since I think user-defined syntax, at least, should probably allow the underlying form of the functor to differ from the surface syntax, e.g., the user can introduce X -> Y,Z as sugar that is interchangeable with rewrites(X,Y,Z) rather than '->'(X,Y,Z). I'm not sure Prolog allows that. But if we don't have it, then prefix and postfix operators with the same spelling would have to map to the same functor, and it's not clear what functor we'd get for multifix operators like if ... then or { ... } (though conceivably something with spaces).)

I don't see a good reason to make the functor be $for rather than for, assuming that we're not doing $+ rather than + for arithmetic expressions. However, if we decide that the functor for 3+4 is not "+" but something more like (β,"+") where β is the builtin dynabase, then I suppose the functor of result for condition should be (β,"for") in the same way.

timvieira commented 11 years ago

The initial issue appears to be resolved. Can we close this ticket?

jeisner commented 11 years ago

Have nil and cons also become distinct from 'nil' and 'cons'? Above nwf suggested $nil and $cons, although I think he told me the other day that now they are just something internal that cannot be printed without list notation.

nwf commented 10 years ago

Whoop, I reverted this without thinking about it as part of the QPT branch, which did away with booleans-as-primitive-values. I'll put those back, I guess.

nwf / dyna

true and false shouldn't be parsed as functors #49