hughperman / pure-lang

Automatically exported from code.google.com/p/pure-lang
0 stars 0 forks source link

`const c = c` behaves unexpectedly #80

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
I swear I'm not deliberately testing all the pathologies I can think of. I'm 
just trying to verify the general statements I'm making as I'm putting together 
some notes/proposed wiki material.

In this case:
> c x = x + 1;
> d x = x + 1;
> nonfix c;
> const c = 1; // we'd expect this to fail, because c is now a literal in the 
lhs pattern, and 1 doesn't match it; and it does fail

> const c = 'c; // we'd expect this to succeed, without conflicting with the 
existing binding for c; and indeed that's what happens

> const c = d; // I'd also expect this to fail; and it does fail

> const c = c; // but I'd expect *this* to fail, too, because c and d have the 
same kind of value, why should that value match the pattern in the one case but 
not the other? In fact this declaration succeeds silently, just as const c = 'c 
does.

Obviously this is no high priority to fix. But I wonder whether the reported 
behavior can really be (cleanly and simply and clearly) motivated. If not, and 
it's not difficult, maybe it'd be better to make `const c = c` fail in this 
case, just like `const c = d` does?

Though the only gain I envisage is just making it easier to explain the 
principles behind what's happening.

What version of the product are you using? On what operating system?
Pure 0.53 on FreeBSD

Original issue reported on code.google.com by dubious...@gmail.com on 17 Jun 2012 at 6:41

GoogleCodeExporter commented 8 years ago
The reason that 'const c = c;' succeeds is that the function value (closure) 
'c' matches itself, whereas 'd' is a different closure and thus doesn't match 
'c'. You can verify this:

> c===c;
1
> c===d;
0

Note that matching of literals (which is done here because 'c' is nonfix) only 
ever verifies syntactical identities (as defined by '===' / 'same').

Anything else would be "matching modulo equational theories" which you can find 
in some languages and theorem provers, but Pure doesn't do that. And that's 
actually a good thing, too. Matching modulo equational theories can be horribly 
inefficient, and in this specific case (equivalence of functions) it's even 
undecidable.

> I swear I'm not deliberately testing all the pathologies I can think of.

Please by all means do! :) But in this case it's not a bug, it works correctly 
as designed, so I'm closing this issue now. If I left anything unclear, I'd 
suggest to continue the discussion on the mailing list.

Original comment by aggraef@gmail.com on 17 Jun 2012 at 10:15

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
[Deleted an earlier version of this comment in order to explain more clearly.]

Sure, I realize the difficulties recognizing the equivalence of c and d. I 
think I didn't convey accurately what was puzzling here. I'm not expecting that 
any pattern which should match c should also match d.

What was tripping me up was that both 'c and c were matching the pattern, and 
yet they're not === each other. I thought, if 'c matches the pattern then the 
pattern must consist of the literal symbol c. Then c (the closure that symbol 
is bound to) shouldn't match the pattern, just as d also doesn't match it. I 
guess it was distracting to bring in d in that way.

Here's a range of cases. Only the nonfixes with a binding of type fun behave 
strangely:
> run
> const c = [1]; let d = [2]; f y = y + 1; // const, var, fun
> nonfix c d f; // all nonfix

> bar x = case x of c = 100; d = 101; f = 102;  _ = 200 end;
> show bar
bar x = case x of [1] = 100; d = 101; f = 102; _ = 200 end;

> bar (c), bar ('c); bar (d), bar ('d); bar (f), bar ('f);
100,200 // because c is a nonfix const, its value is inlined into the pattern, 
and c's value matches the pattern that results; 'c does not
200,101 // because d is a nonfix variable, just the literal symbol d is 
contributed to the pattern, and d's value doesn't match that but 'd does
102,102 // curiously, whatever the nonfix function f contributes to the 
pattern, is matched both by f's value and by 'f, even though:
> f === 'f;
0
And yet f is not contributing a mere variable, either, as evidenced by the fact 
that 'c doesn't match it.

I would have expected at most one of f and 'f to match the pattern f (since 
they aren't syntactically identical to each other). And since f is not a const, 
I would expect it to be contributing the symbol f to the pattern, so it should 
be 'f that matches, just as 'd matches the pattern d.

Fine to move discussion to the list if you prefer.

Original comment by dubious...@gmail.com on 18 Jun 2012 at 4:14

GoogleCodeExporter commented 8 years ago
> What was tripping me up was that both 'c and c were matching the pattern, and 
yet they're not === each other.

You're right, according to the explanation I gave, 'c shouldn't match c, but it 
does. It seems that, in contrast to 'same', pattern-matching only looks at the 
symbol of a closure. I probably did it that way so that a quoted constructor 
term can be matched against a corresponding pattern even if the constructor 
symbol happens to have a definition as a function.

It's a corner case, and there's no obvious right thing here. But in any case, 
it's unfortunate that 'same' and matching work differently here. I'll have to 
think about whether there is a good reason for this, so I'm reopening the issue.

Original comment by aggraef@gmail.com on 18 Jun 2012 at 6:45

GoogleCodeExporter commented 8 years ago
Our discussion of this has made me see it's a more general phenomenon. It 
applies to identifiers in head positions in patterns as well as to nonfixes in 
patterns.

> run
> foo x = case x of max a = 100; _ = 200 end;
> foo (max 2);
100
> foo (('max) 2);
100
> foo ('max 2);
100
> max 2 === ('max) 2;
0
> max 2 === 'max 2;
0
> map (typep function) [(max 2),(('max) 2), ('max 2)];
[1,0,0]

Original comment by dubious...@gmail.com on 18 Jun 2012 at 12:18

GoogleCodeExporter commented 8 years ago
And:
> run
> foo x = case x of f@(max a) = f 3; _ = 0 end;
> foo (max 1); foo (max 5);
3
5
> foo ('max 1); foo ('max 5);
max 1 3
max 5 3

Original comment by dubious...@gmail.com on 18 Jun 2012 at 12:23

GoogleCodeExporter commented 8 years ago
There is a crucial difference here with the way anonymous closures behave.

> run
> succ1 x = x + 1;
> let succ2 = \x -> x + 1;
> foo x = case x of ::succ1 = 1; ::succ2 = 2; _ = 0 end; // the explicit 
namespace is another way to suppress the interpretation of these identifiers as 
pattern variables, additional to their being nonfix or in a head position
> foo succ1, foo ('succ1);
1,1 // as we've already seen
> foo succ2, foo ('succ2);
0,2 // but anonymous closure values don't match identifiers that happen to be 
bound to them

It's not surprising for the introspective facilities to distinguish the named 
closures from the anonymous ones. But this makes their difference intrude into 
ordinary programming in a way I didn't expect. I expected to generally be 
indifferent between defining a (non-recursive) closure with a rule and defining 
it with a lambda. (Issues of extensibility aside.) But this shows that the 
resulting function values pattern match differently.

Original comment by dubious...@gmail.com on 18 Jun 2012 at 1:53

GoogleCodeExporter commented 8 years ago
And the story goes on. :) Here's another dubious example:

> foo x = case x of max a = 100; _ = 200 end;
> foo (max 2) with max x y = x,y end;
100

In contrast:

> bar x = case x of f@_ a = 100 if f === max; _ = 200 end;
> bar (max 2) with max x y = x,y end;
200

So one might argue that the pattern matcher should compare symbols the same way 
that 'same' does. Unfortunately this isn't possible. If a symbol occurs on the 
lhs of an equation, it's simply a symbol there, not a closure. The pattern 
matcher doesn't know or care about the bindings of the symbol which might be in 
effect at the definition site at all.

And that's actually a good thing, at least with global bindings. Consider:

> foo (bar x) = x;
> bar==='bar;
1
> bar x::int = x+1;
> bar==='bar;
0

At the point where foo is defined, bar is just an undefined symbol, so the 
pattern matcher should treat bar and 'bar alike, since they're the same. But 
the subsequent definition of bar changes this. Should foo suddenly change 
accordingly? I don't think so.

So I'd rather leave pattern matching as it is. The question is whether 'same' 
should follow suit, i.e., simply ignore the closure that a symbol may be bound 
to. However, it's clearly a good thing to have some way to distinguish a named 
closure from its quoted symbol, or two different local closures which happen to 
have the same name.

So I tend to just leave things the way they are now, but clearly document these 
discrepancies in the "Caveats and Notes" section of the manual. (The case of 
local closures might call for some more scrutiny, though.) Do you agree?

Original comment by aggraef@gmail.com on 19 Jun 2012 at 7:52

GoogleCodeExporter commented 8 years ago
Re comment 7: It may be awkward, but succ1 and succ2 are really different 
things in Pure. Even if you take the weakest reasonable syntactic equality 
notion, "print out the same in the interpreter," succ2 and 'succ2 are clearly 
not the same in Pure. This continues to hold even if you define succ2 like this:

succ2 = succ1;

So the discrepancies you see there have nothing to do with the pattern matcher 
or lambdas, they will hold under any reasonable notion of syntactical equality.

Original comment by aggraef@gmail.com on 19 Jun 2012 at 8:19

GoogleCodeExporter commented 8 years ago
I think some examples I was just playing with are like yours in 8:

> makef x y = f with f z = x + y + z end;
> let g,h=makef 1, makef 2;
> bar x = case x of f _ = 100; _ = 200 end;
> bar (g 3), bar (h 3); // notice these aren't matched
200,200
> bar (f 3) with f x y z = x + y + z end; // this is like your example in 
comment 8
100

I certainly agree that in your foo/bar case that foo shouldn't change its 
behavior just because bar has been subsequently bound. Given also my 
observation (in the new wiki page) that programs like this:

> case 1,2 of a,b = a end;

seem to rely on the existing behavior, changing the pattern matcher to 
eliminate this doesn't look so smart.

I do agree that === should be left as is, too. So I think I agree it looks best 
to just leave things as they are but document the behavior.

However, I'd suggest a slightly more thoroughgoing re-documentation than just a 
note in the Caveats section. Currently there are various places in the 
documents that suggest that certain tokens in a pattern stand for literal 
symbols. (Nonfixes are mentioned several times but really this is true for all 
operators, also identifiers in head position or with namespace qualifiers.) I 
think it would help make this behavior less surprising if you didn't say that. 
Rather, say that those tokens match either the corresponding symbols, or named 
functions bound to those symbols.

One just has to get the description right, in light of the cases in your 
comment 8 and this comment.

Original comment by dubious...@gmail.com on 19 Jun 2012 at 2:26

GoogleCodeExporter commented 8 years ago
Perhaps it might also be nice to make this:

> foo x = x+1;
> typep fun foo, typep symbol foo, typep symbol ('foo);

return 1,0,1 instead of 1,1,1 as presently.

Original comment by dubious...@gmail.com on 19 Jun 2012 at 4:39

GoogleCodeExporter commented 8 years ago
Re comment 10:

> bar (g 3), bar (h 3); // notice these aren't matched
200,200

Did you expect something different? Note that g and h are bound to partial 
applications of makef here, not f.

Considering the use of the notion "literal" in the documentation: Note that 
"literal symbol" doesn't mean a quoted symbol, the documentation uses this 
notion to describe a symbol in a pattern which stands for itself rather than 
denoting a variable. I really think that "literal" is the most appropriate term 
to use here. But maybe a more careful definition of all the involved terms 
(literal, symbol, function, etc.) is in order.

Re comment 11: Apparently you only want to call a symbol a symbol if it's 
quoted or unbound? I disagree. The way that the term "symbol" is used in Pure 
comes from term rewriting theory. There are function symbols (this also 
includes the operator and nonfix symbols in Pure) and variable symbols in term 
rewriting. The symbol type encompasses them all. So foo *is* a symbol, that 
doesn't change just because it has a closure bound to it.

I think that much of the confusion here actually comes from the wedding of 
notions from term rewriting and functional programming. It's hard to avoid this 
in a language which provides both. But I'll try my best. ;-)

Original comment by aggraef@gmail.com on 20 Jun 2012 at 12:18

GoogleCodeExporter commented 8 years ago
"Given also my observation (in the new wiki page) that programs like this:

> case 1,2 of a,b = a end;

seem to rely on the existing behavior, changing the pattern matcher to 
eliminate this doesn't look so smart."

No, this would most likely break everything. Pondering this some more, I think 
that having a literal function symbol foo matching both foo and 'foo is the 
right thing in Pure, even though they're not really the same. The fact that one 
has a closure attached to it and the other doesn't is really immaterial from 
the point of view of term rewriting.

However, I'm still thinking about the examples involving local functions, such 
as the one I posted in comment 8:

> foo x = case x of max a = 100; _ = 200 end;
> foo (max 2) with max x y = x,y end;
100

I can't help it, this just looks wrong. I must admit that I never though about 
this case before. Maybe it would be better to maintain the illusion that there 
are local symbols (not just local closures) so that the pattern matcher can 
consider the 'max' in the second rule different from the 'max' in the first 
rule and reject that match. That seems to be the right thing to me. But maybe 
we should discuss this on the mailing list so that others can chime in.

Original comment by aggraef@gmail.com on 20 Jun 2012 at 12:47

GoogleCodeExporter commented 8 years ago
I'm inclined to agree with you about all this.

And you're right, I screwed up the makef test. If we make it `makef x = f with 
f y z = x+y+z end`, as we should for the desired test, then `g 3` and `h 3` do 
end up matching.

Re "symbol": so `quote succ` and `succ` are just two syntactically distinct 
symbols? This is a new way of talking for me, but I can get used to it.

But let's document that more explicitly. My thinking was: the symbol type 
matches those expression values named "symbols" in Pure's syntax. And these are 
what `quote succ` and unbound symbols evaluate to. The former expression 
evaluates to a named function, which I assumed was not a symbol. After all it 
can't be bound to any values. But it might also be counted an expression since 
it evaluates to something, namely itself.

You're right, there probably are subtle mismatches between different 
theoretical toolboxes making themelves manifest here. And I'm bringing in yet a 
different set of tools/background assumptions. It's not obviously impossible to 
find a way of talking that navigates this all cleanly. But yeah, it is work.

Original comment by dubious...@gmail.com on 20 Jun 2012 at 1:31

GoogleCodeExporter commented 8 years ago
Ok, the people on the mailing list have voted for keeping things as they are. 
I've also revised the Pattern section in the manual to include a note on the 
issues raised here. So I'm closing this issue report now. Thanks for the report 
and the discussion, it was really helpful to get it sorted out and clearly 
understand the issues!

Original comment by aggraef@gmail.com on 23 Jun 2012 at 10:01