msp-strath / Mary

Mary is the successor of Marx, a content delivery and assessment engine based on markdown and git
17 stars 1 forks source link

Explicit Environments #41

Closed pigworker closed 4 years ago

pigworker commented 4 years ago

I started talking about explicit environments on #37 and on Slack. Let me be both broader and deeper in an issue of its own.

The idea is to introduce a crude but alarmingly effective form of what some might call "object orientation" if they wanted to try to wind us up: as a tactic, it wouldn't work on me, anyway.

Yer regular "mathematical" functions are, at least in a cbv language, an extreme form of contextualization. In f(a), f contextualizes a by waiting patiently for its value, and then postprocessing it. Meanwhile, back in dear old Pascal, one could write

with r do c;

where r is an expression of record type and c is a command for which r's fields are in scope. c could, of course, be return e, but it was a tad annoying that there was no version of with in the expression language, only in the command language. One of my main irritations (for it is superficial) with oo style, is all that projection all over the place like bad acne. I went through a period (ha ha) of pronouncing . as "spot" rather than "dot". IIRC, there's a version of this rant in my thesis. Anyhow, r contextualises c by allowing lookup as c runs, and quietly going away when c stops. It's pretty much the other extreme from functions, when it comes to what activities "contextualization" amounts to.

So I'm proposing an notion of first class environment e which allows us to write

e(a)

to contextualize the evaluation of a by the value bindings in e.

What is such an e? It's an association tree.

  1. The empty environment is []
  2. A singleton environment is ['var | value]
  3. A binary environment is [env0 | env1] where env0's bindings may shadow env1's.

So, we're exploiting the fact that atoms apart from [] are the quotations of identifiers. (Indeed, perhaps it is now stretching a point to call [] an atom, given that it does not quote an identifier, you can't write it with a quote-mark, etc. But it is, at least, genuinely indivisible, which quoted identifiers wouldn't be if were to allow them to explode as strings.) We can tell the difference between a singleton and a binary by checking their heads. We can thus crunch such a structure down to a Haskell map or a JS object, when we need to use it as an actual environment.

Why is it a tree and not a list? It's a monoid, so why bother normalising it? Also, you get to preserve sharing when you build bigger environments from smaller ones. Bargain basement inheritance is gained by the expedient of shoving old environments at the back end of new ones.

In the it-looks-like-C-but-it-so-isn't brutality of this language, I am sorely sorely tempted to make the notation

x = s

sugar for

['x | s]

so that we can (never let that mean assignment and also) write things like

[x = 5  y = 7  [z = 3]]

and yer common or garden

let x = s in t

becomes

(x = s)(t)

Around the corner, there's the option to consider pattern matching as "compute an explicit environment or abort", where our beloved -> is also concealing yet another use of application-as-contextualization.

It's cheap; it's fun; it's encapsulation for hooligans. It's the polite version of dynamic binding, just as we're already doing the polite version of delimited control.

Shall we do this?

pigworker commented 4 years ago

In the expression language, rather than the value language, we could indeed have

p = e

meaning the matching environment, with the possibility of an exception.

I think this is way cooler and more economical than the practical-joke-on-JSON option.

pigworker commented 4 years ago

I'm now bikeshedding the operator with myself. So join in.

I'm wondering whether the symmetrical = is appropriate for the asymmetry of pattern versus expression. Patterns are subject to unification, but perhaps & is the good notation for that.

I.e., not just "as-patterns", but p0 & p1 being the pattern which matches if and only if both p0 and p1 match, yielding [env0|env1].

I'm wondering whether p <- e might be better than p = e. But I don't think p0 = p1 is good notation for the pattern which unifies its two components, so there's no pressure on that front.

pigworker commented 4 years ago

Open for business on this branch.

pigworker commented 4 years ago

I haven't put it in the JS interpreter yet, but I did three things:

  1. parentheses for disambiguation are now permitted (and are about to be needed)
  2. p = e attempts to compute a matching environment in the form of an association list; p = e(a) means p = (e(a)); to get (p = e)(a), all those parens are needed
  3. applying environments captures the variables bound therein; in fact, anything built from [] or cells is treated as an environment

So local definition is now as follows:

(x = foo)(bar)

Now, this plays merry hell with scope checking. For the moment, things scope-checked as local or out of scope can be captured by an explicit environment, but things scope-checked as global cannot.

That creates the obvious problem that making new definitions in an imported module can change the meaning of a program by preventing an explicit environment from capturing and shadowing a name, e.g., the x, above.

We have options.

  1. Even if something is scope-checked as global, check dynamically to see if it's been shadowed anyway.
  2. Completely separate the namespaces of global things and local things.
  3. Sort out the type system for this language, or at least make the scope-checker spot when it's being given an explicit environment directly.

Note that I avoided extending the value type by writing code to marshal and unmarshal values and finite maps. In JS, I'll probably just have to start representing environments as association trees instead of JS objects. Again, big design space.

pigworker commented 4 years ago

I've done 4, above. All shortnamed variables can be shadowed.

pigworker commented 4 years ago

There are still bikeshed options for the notation. Alternatives to p = e might replace = by <- or := or <=. It might be a kindness to steer people away from the idea that what's happening is assignment.

Moreover, we should consider the option to flip things around and write e -> p or e => p or e =: p. There are a variety of positives here:

  1. It doesn't look loke something it's not.
  2. It keeps the flow of time left to right: the vars in p don't come into scope until after the e computes.
  3. It eliminates the ambiguity which requires the first parens in (p = e1)(e2).
  4. It simplfies parsing: when you're looking for an expression p = e, you sometimes have to go a long way before giving up on p being an expression.

I'm currently warming to e => p, but very much still open to other possibilities.

pigworker commented 4 years ago

Tangentially, the often insightful twitter haskeller, @ luqui, observes that we could reduce the weight of the notation by making ; right-associative unary application. That is,

e1 ; e2

means the same as

e1(e2)

Note that if e1 has the decency to return [], as any effect-only program should, we get the same semantics as before: [] is the empty environment. But if e1 yields an environment, e2 can see its stuff. Of course, that means we can write more joke C:

x = 5; y = [x]; x = 7; y    // returns [5]

This would either make it very good or very bad to retain p = e as the matching notation.

It also combines nicely with the idea that guarding by Booleans (numbers 0 or not) is contextualization.

assertion ; computation

aborts if assertion is false and gives the result of computation, otherwise.

We currently have a special kind of stack frame dedicated to being left of a semicolon. Adopting this proposal would replace it by yer ordinary AppL.

fredrikNordvallForsberg commented 4 years ago

One worry is that we might want to use <= for less-than-or-equal-to? The symbol => would still be okay, but perhaps too typo-prone?

One thing that confuses me is that I could well have imagined that the role of e and p were reversed in e -> p and e => p, if I think of the environment as a lookup table. Hence I would also like to propose p |-> e. I fear I might be missing something eg with respect to the discussion about unification of patterns above though; could you give us some non-singleton examples of notation for environments?

pigworker commented 4 years ago

We certainly do want <= for less-or-equal and >= for greater-or-equal.

Non-singleton examples include things like

solutions(problem) => [x | _] ; goWith(x)

Unification of patterns is a generalization of as-patterns, and is reducible to as-patterns. OK, we need the "grumpy" pattern, $#!£, which refuses to match anything. We have

[x [y z]]@[[a b] c] = [x@[a b] c@[x y]]

and such like.

I worry that p |-> e looks more like this input maps to that output. How do we signal "e is computed, then p is defined"?

gallais commented 4 years ago

How do we signal "e is computed, then p is defined"?

Could we also reuse as-patterns but on the RHS? p@e would mean considering e as a p-like thing.

pigworker commented 4 years ago

Whilst p@e is unambiguous, it's pretty scary: p can itself be x@p. Reading left-to-right, you need to find the last @.

I've also just realised that p <something> e means we have to decide whether p <something> e1 ; e2 means (p <something> e1) ; e2 or p <something> (e1 ; e2). The former, I think, as it's the semantics people expect for x = 5; foo(x), and also one can rather more naturally write e1; p <something> e2 for the latter. It does mean the parser needs a little more subtlety.

e <something> p does not have the same problem.

I think the = sign should appear somewhere in the <something>. If the something is just an equal sign, then it's p = e, and as you were. I could be persuaded to p := e or e =: p. I am persuaded to avoid things which look like arrows.

pigworker commented 4 years ago

OK, here's where I think we are with this:

  1. Looks like p := e is winning the notation competition. I'll need to modify the parser...
  2. ...so that p := f(e) means p := (f(e)) but p := e1; e2 means (p := e1); e2.
  3. I've already implemented ; as right-nested unary application.
  4. Full disclosure. I also did 0(-) aborts, when n is a non-zero number n(e) is e. So b; e does guarding if b is Boolean.
  5. The JS interpreter needs to be brought into line with the above.
  6. There should probably be some tests and stuff.
pigworker commented 4 years ago

It's now a pull request. But I'm sure there are still things to fix.

pigworker commented 4 years ago

Alea iacta est.