Closed GoogleCodeExporter closed 8 years ago
The reason that 'const c = c;' succeeds is that the function value (closure)
'c' matches itself, whereas 'd' is a different closure and thus doesn't match
'c'. You can verify this:
> c===c;
1
> c===d;
0
Note that matching of literals (which is done here because 'c' is nonfix) only
ever verifies syntactical identities (as defined by '===' / 'same').
Anything else would be "matching modulo equational theories" which you can find
in some languages and theorem provers, but Pure doesn't do that. And that's
actually a good thing, too. Matching modulo equational theories can be horribly
inefficient, and in this specific case (equivalence of functions) it's even
undecidable.
> I swear I'm not deliberately testing all the pathologies I can think of.
Please by all means do! :) But in this case it's not a bug, it works correctly
as designed, so I'm closing this issue now. If I left anything unclear, I'd
suggest to continue the discussion on the mailing list.
Original comment by aggraef@gmail.com
on 17 Jun 2012 at 10:15
[deleted comment]
[Deleted an earlier version of this comment in order to explain more clearly.]
Sure, I realize the difficulties recognizing the equivalence of c and d. I
think I didn't convey accurately what was puzzling here. I'm not expecting that
any pattern which should match c should also match d.
What was tripping me up was that both 'c and c were matching the pattern, and
yet they're not === each other. I thought, if 'c matches the pattern then the
pattern must consist of the literal symbol c. Then c (the closure that symbol
is bound to) shouldn't match the pattern, just as d also doesn't match it. I
guess it was distracting to bring in d in that way.
Here's a range of cases. Only the nonfixes with a binding of type fun behave
strangely:
> run
> const c = [1]; let d = [2]; f y = y + 1; // const, var, fun
> nonfix c d f; // all nonfix
> bar x = case x of c = 100; d = 101; f = 102; _ = 200 end;
> show bar
bar x = case x of [1] = 100; d = 101; f = 102; _ = 200 end;
> bar (c), bar ('c); bar (d), bar ('d); bar (f), bar ('f);
100,200 // because c is a nonfix const, its value is inlined into the pattern,
and c's value matches the pattern that results; 'c does not
200,101 // because d is a nonfix variable, just the literal symbol d is
contributed to the pattern, and d's value doesn't match that but 'd does
102,102 // curiously, whatever the nonfix function f contributes to the
pattern, is matched both by f's value and by 'f, even though:
> f === 'f;
0
And yet f is not contributing a mere variable, either, as evidenced by the fact
that 'c doesn't match it.
I would have expected at most one of f and 'f to match the pattern f (since
they aren't syntactically identical to each other). And since f is not a const,
I would expect it to be contributing the symbol f to the pattern, so it should
be 'f that matches, just as 'd matches the pattern d.
Fine to move discussion to the list if you prefer.
Original comment by dubious...@gmail.com
on 18 Jun 2012 at 4:14
> What was tripping me up was that both 'c and c were matching the pattern, and
yet they're not === each other.
You're right, according to the explanation I gave, 'c shouldn't match c, but it
does. It seems that, in contrast to 'same', pattern-matching only looks at the
symbol of a closure. I probably did it that way so that a quoted constructor
term can be matched against a corresponding pattern even if the constructor
symbol happens to have a definition as a function.
It's a corner case, and there's no obvious right thing here. But in any case,
it's unfortunate that 'same' and matching work differently here. I'll have to
think about whether there is a good reason for this, so I'm reopening the issue.
Original comment by aggraef@gmail.com
on 18 Jun 2012 at 6:45
Our discussion of this has made me see it's a more general phenomenon. It
applies to identifiers in head positions in patterns as well as to nonfixes in
patterns.
> run
> foo x = case x of max a = 100; _ = 200 end;
> foo (max 2);
100
> foo (('max) 2);
100
> foo ('max 2);
100
> max 2 === ('max) 2;
0
> max 2 === 'max 2;
0
> map (typep function) [(max 2),(('max) 2), ('max 2)];
[1,0,0]
Original comment by dubious...@gmail.com
on 18 Jun 2012 at 12:18
And:
> run
> foo x = case x of f@(max a) = f 3; _ = 0 end;
> foo (max 1); foo (max 5);
3
5
> foo ('max 1); foo ('max 5);
max 1 3
max 5 3
Original comment by dubious...@gmail.com
on 18 Jun 2012 at 12:23
There is a crucial difference here with the way anonymous closures behave.
> run
> succ1 x = x + 1;
> let succ2 = \x -> x + 1;
> foo x = case x of ::succ1 = 1; ::succ2 = 2; _ = 0 end; // the explicit
namespace is another way to suppress the interpretation of these identifiers as
pattern variables, additional to their being nonfix or in a head position
> foo succ1, foo ('succ1);
1,1 // as we've already seen
> foo succ2, foo ('succ2);
0,2 // but anonymous closure values don't match identifiers that happen to be
bound to them
It's not surprising for the introspective facilities to distinguish the named
closures from the anonymous ones. But this makes their difference intrude into
ordinary programming in a way I didn't expect. I expected to generally be
indifferent between defining a (non-recursive) closure with a rule and defining
it with a lambda. (Issues of extensibility aside.) But this shows that the
resulting function values pattern match differently.
Original comment by dubious...@gmail.com
on 18 Jun 2012 at 1:53
And the story goes on. :) Here's another dubious example:
> foo x = case x of max a = 100; _ = 200 end;
> foo (max 2) with max x y = x,y end;
100
In contrast:
> bar x = case x of f@_ a = 100 if f === max; _ = 200 end;
> bar (max 2) with max x y = x,y end;
200
So one might argue that the pattern matcher should compare symbols the same way
that 'same' does. Unfortunately this isn't possible. If a symbol occurs on the
lhs of an equation, it's simply a symbol there, not a closure. The pattern
matcher doesn't know or care about the bindings of the symbol which might be in
effect at the definition site at all.
And that's actually a good thing, at least with global bindings. Consider:
> foo (bar x) = x;
> bar==='bar;
1
> bar x::int = x+1;
> bar==='bar;
0
At the point where foo is defined, bar is just an undefined symbol, so the
pattern matcher should treat bar and 'bar alike, since they're the same. But
the subsequent definition of bar changes this. Should foo suddenly change
accordingly? I don't think so.
So I'd rather leave pattern matching as it is. The question is whether 'same'
should follow suit, i.e., simply ignore the closure that a symbol may be bound
to. However, it's clearly a good thing to have some way to distinguish a named
closure from its quoted symbol, or two different local closures which happen to
have the same name.
So I tend to just leave things the way they are now, but clearly document these
discrepancies in the "Caveats and Notes" section of the manual. (The case of
local closures might call for some more scrutiny, though.) Do you agree?
Original comment by aggraef@gmail.com
on 19 Jun 2012 at 7:52
Re comment 7: It may be awkward, but succ1 and succ2 are really different
things in Pure. Even if you take the weakest reasonable syntactic equality
notion, "print out the same in the interpreter," succ2 and 'succ2 are clearly
not the same in Pure. This continues to hold even if you define succ2 like this:
succ2 = succ1;
So the discrepancies you see there have nothing to do with the pattern matcher
or lambdas, they will hold under any reasonable notion of syntactical equality.
Original comment by aggraef@gmail.com
on 19 Jun 2012 at 8:19
I think some examples I was just playing with are like yours in 8:
> makef x y = f with f z = x + y + z end;
> let g,h=makef 1, makef 2;
> bar x = case x of f _ = 100; _ = 200 end;
> bar (g 3), bar (h 3); // notice these aren't matched
200,200
> bar (f 3) with f x y z = x + y + z end; // this is like your example in
comment 8
100
I certainly agree that in your foo/bar case that foo shouldn't change its
behavior just because bar has been subsequently bound. Given also my
observation (in the new wiki page) that programs like this:
> case 1,2 of a,b = a end;
seem to rely on the existing behavior, changing the pattern matcher to
eliminate this doesn't look so smart.
I do agree that === should be left as is, too. So I think I agree it looks best
to just leave things as they are but document the behavior.
However, I'd suggest a slightly more thoroughgoing re-documentation than just a
note in the Caveats section. Currently there are various places in the
documents that suggest that certain tokens in a pattern stand for literal
symbols. (Nonfixes are mentioned several times but really this is true for all
operators, also identifiers in head position or with namespace qualifiers.) I
think it would help make this behavior less surprising if you didn't say that.
Rather, say that those tokens match either the corresponding symbols, or named
functions bound to those symbols.
One just has to get the description right, in light of the cases in your
comment 8 and this comment.
Original comment by dubious...@gmail.com
on 19 Jun 2012 at 2:26
Perhaps it might also be nice to make this:
> foo x = x+1;
> typep fun foo, typep symbol foo, typep symbol ('foo);
return 1,0,1 instead of 1,1,1 as presently.
Original comment by dubious...@gmail.com
on 19 Jun 2012 at 4:39
Re comment 10:
> bar (g 3), bar (h 3); // notice these aren't matched
200,200
Did you expect something different? Note that g and h are bound to partial
applications of makef here, not f.
Considering the use of the notion "literal" in the documentation: Note that
"literal symbol" doesn't mean a quoted symbol, the documentation uses this
notion to describe a symbol in a pattern which stands for itself rather than
denoting a variable. I really think that "literal" is the most appropriate term
to use here. But maybe a more careful definition of all the involved terms
(literal, symbol, function, etc.) is in order.
Re comment 11: Apparently you only want to call a symbol a symbol if it's
quoted or unbound? I disagree. The way that the term "symbol" is used in Pure
comes from term rewriting theory. There are function symbols (this also
includes the operator and nonfix symbols in Pure) and variable symbols in term
rewriting. The symbol type encompasses them all. So foo *is* a symbol, that
doesn't change just because it has a closure bound to it.
I think that much of the confusion here actually comes from the wedding of
notions from term rewriting and functional programming. It's hard to avoid this
in a language which provides both. But I'll try my best. ;-)
Original comment by aggraef@gmail.com
on 20 Jun 2012 at 12:18
"Given also my observation (in the new wiki page) that programs like this:
> case 1,2 of a,b = a end;
seem to rely on the existing behavior, changing the pattern matcher to
eliminate this doesn't look so smart."
No, this would most likely break everything. Pondering this some more, I think
that having a literal function symbol foo matching both foo and 'foo is the
right thing in Pure, even though they're not really the same. The fact that one
has a closure attached to it and the other doesn't is really immaterial from
the point of view of term rewriting.
However, I'm still thinking about the examples involving local functions, such
as the one I posted in comment 8:
> foo x = case x of max a = 100; _ = 200 end;
> foo (max 2) with max x y = x,y end;
100
I can't help it, this just looks wrong. I must admit that I never though about
this case before. Maybe it would be better to maintain the illusion that there
are local symbols (not just local closures) so that the pattern matcher can
consider the 'max' in the second rule different from the 'max' in the first
rule and reject that match. That seems to be the right thing to me. But maybe
we should discuss this on the mailing list so that others can chime in.
Original comment by aggraef@gmail.com
on 20 Jun 2012 at 12:47
I'm inclined to agree with you about all this.
And you're right, I screwed up the makef test. If we make it `makef x = f with
f y z = x+y+z end`, as we should for the desired test, then `g 3` and `h 3` do
end up matching.
Re "symbol": so `quote succ` and `succ` are just two syntactically distinct
symbols? This is a new way of talking for me, but I can get used to it.
But let's document that more explicitly. My thinking was: the symbol type
matches those expression values named "symbols" in Pure's syntax. And these are
what `quote succ` and unbound symbols evaluate to. The former expression
evaluates to a named function, which I assumed was not a symbol. After all it
can't be bound to any values. But it might also be counted an expression since
it evaluates to something, namely itself.
You're right, there probably are subtle mismatches between different
theoretical toolboxes making themelves manifest here. And I'm bringing in yet a
different set of tools/background assumptions. It's not obviously impossible to
find a way of talking that navigates this all cleanly. But yeah, it is work.
Original comment by dubious...@gmail.com
on 20 Jun 2012 at 1:31
Ok, the people on the mailing list have voted for keeping things as they are.
I've also revised the Pattern section in the manual to include a note on the
issues raised here. So I'm closing this issue report now. Thanks for the report
and the discussion, it was really helpful to get it sorted out and clearly
understand the issues!
Original comment by aggraef@gmail.com
on 23 Jun 2012 at 10:01
Original issue reported on code.google.com by
dubious...@gmail.com
on 17 Jun 2012 at 6:41