masak / alma

ALgoloid with MAcros -- a language with Algol-family syntax where macros take center stage
Artistic License 2.0

Relationship of P6 and/or 007 macros to FEXPRs #302

Closed raiph closed 6 years ago

raiph commented 6 years ago

I'm curious to read an explanation of the fundamental technical differences or similarities between Perl 6 macros (and, separately if appropriate, 007 macros) and Lisp FEXPRs. I thought ven especially might be able to shed light on the comparison.

(If this post, and ones like it, ought not be an issue in this repo, but rather, say, an SO or reddit post, and would still get your attention in this other location, please just close it with a comment explaining your preferences about where to post it and I'll cut/paste to that other location. I'm especially thinking it might be appropriate to start an SO tag for 007, perhaps initially justified as a tag to be used in conjunction with [perl6] but which would then immediately be available for use on its own anyway.)


The "Early Lisp macros" section of the wikipedia Macros page starts (apparently incorrectly) with:

Before Lisp had macros, it had so-called FEXPRs, function-like operators whose inputs were not the values computed by the arguments but rather the syntactic forms of the arguments

I say "(apparently incorrectly)" because the footnote at the end of the paragraph that begins as above links to an email that says:

The arguments to a FEXPR were definitely not the syntactic forms, but were instead the literal list structure of the source code as returned by read.

This suggests FEXPRs were, perhaps, very loosely akin to source filters in P5.

But, putting that aside, even if the wikipedia text is fundamentally wrong, and/or even if P6 and/or 007 macros are nothing whatsoever like FEXPRs, the first sentence in full sounds exactly like P6 macros to me, if one substitutes "abstract syntax tree forms of the arguments" for "syntactic forms of the arguments", and "compilation" for "computation":

Before Lisp had macros, it had so-called FEXPRs, function-like operators whose inputs were not the values computed by the arguments but rather the syntactic forms of the arguments, and whose output were values to be used in the computation.

Would you agree? In other words, is this accurate:

P6 / 007 macros are function-like operators whose inputs are not the values computed by the arguments but rather the abstract syntax tree forms of the arguments, and whose outputs are values to be spliced into the compilation output.

The paragraph continues:

In other words, FEXPRs were implemented at the same level as EVAL, and provided a window into the meta-evaluation layer.

Does this map well to how P6 macros (and/or 007 macros) work?

The paragraph ends with:

This was generally found to be a difficult model to reason about effectively.

This difficulty, which clearly did occur historically and which led to Lisp macros, was perhaps due to the actual nature of FEXPRs (which, I'll speculate, made them akin to source filters) rather than to the apparently incorrect description above, which, as I have said, was rejected in the email that the Wikipedia page footnotes and which sounds to me reminiscent of Perl 6 macros.

To summarize, I'm interested to see what comes out of y'all reflecting on the nature of P6 macros, and/or 007 macros, as they might relate to FEXPRs, especially as described in the quoted email. If they're fundamentally different, what is the nature of that difference? If there are similarities, why do you think P6/007 macros will avoid the reasoning difficulty?

masak commented 6 years ago

I don't have the expertise to comment on FEXPRs, unfortunately. Especially since there seems to be some genuine debate about their true nature.

What I can say though is that from my vantage point there is one big difference between anything Lisp did in the early days and what we're doing in 007 today (and ultimately Perl 6): whether you're in a quasiquote or not, a variable that's used has to be declared. (And since this is a static relationship, the program won't compile if it doesn't hold.) This is so fundamental to the language that it ends up also permeating how macros work, leading eventually to macro hygiene.

If I understand correctly, Scheme's view on macros is quite close to this. Scheme has a flavor of lexical hygiene. I haven't studied it in depth, though I'm planning to.

Yesternight I wrote this gist, which is something I've been planning to write for a while. I meant to write it long ago, but found it hard going until inspiration struck yesterday. It's really related to #212 and figuring that one out in order to make #207 work. But I think it's also relevant to this issue, or at least to this comment. :smile:

vendethiel commented 6 years ago

Note: I got a lot about FEXPRs from this dissertation

This suggests FEXPRs were, perhaps, very loosely akin to source filters in P5.

I've seen the term "runtime macros" used, and I think I like that term for FEXPRs. They are more similar to runtime macros, and I'll come back to that. The difference the email makes between "syntactic forms" and "literal list structure" is only interesting for a meta-circular interpreter, from what I'm reading in the email. Some other, related email linked to a paper called "Du déchiffrage des formules logico-mathématiques par la machine même dans la conception du programme", which is also an interesting read (...if you speak French. The abstract is available in 3 or 4 languages. It contains the first mentions of a "library" as we understand it today in programming).

A macro is a "source->source" transform. That's where they're similar to source filters. FEXPRs are "source->?": they are very much allowed to do whatever they want to do, at runtime. The email mentions that they were really confusing, and that the only cases that were legible enough are also possible via macros.

The arguments to a FEXPR were definitely not the syntactic forms, but were instead the literal list structure of the source code as returned by read. The difference is subtle but critical. It hinges on the implementation details of LISP I.

The only meta-circular interpreter I have looked at deeply enough would be Rubinius, which... isn't homoiconic, so obviously has huge differences between "source code read" and "syntactic forms".

If I were to draw a comparison: FEXPRs are as if you kept the "unevaluated arguments" part of macros, but instead of being replaced by a quasi at compile time, the call sits there until runtime, when the FEXPR runs (and often calls eval itself).

# FEXPR style
fexpr $if(cond, t, f) {
  my @fn = f, t; # reverse arg order: 0 = false, 1 = true
  eval(@fn[!!eval cond]);
}
$if(1, print("true"), print("false"));

Here, no transformation is done at compile-time. It's the FEXPR's job to eval its argument. Compare to a macro:

macro if(cond, f, t) {
  quasi {
    my @fn = {; {{{f}}} }, {; {{{t}}} };
    @fn[!!{{{cond}}}]();
  };
}
if(1, print("true"), print("false"));

which is transformed, at compile-time, to:

[{; print("false") }, {; print("true") }][!!1]();

(as a side-note: 007 used to have eval, called melt, but it doesn't anymore)


About @masak's gist:

The only viable solution I've come up with is this: the Qnode corresponding to the quasi block needs to carry information about its outer pointer. It then gets returned from the macro to the thing in the parser that handles injection into the mainline, and this thing can then do the outer pointer dislodging as it needs to.

What is the "outer pointer" here? The macro context itself?

masak commented 6 years ago

Here's the Calculatrices digitales du déchiffrage. It's from 1954, and describes the first metacircular compiler, a few years before Lisp.

masak commented 6 years ago

melt was not meant to be eval, it was meant to turn ASTs encoding "fixed" values into those values. See #61.

I think we might get something like melt back. We just need to be more careful about whitelisting really constant/literal things. (Update: see #432.)

masak commented 6 years ago

@vendethiel:

What is the "outer pointer" here? The macro context itself?

Yes. In the quasi in the format macro, for example, the replaceAll function is referenced. The quasi's outer pointer is (the frame of this invocation of) the macro itself.

But the quasi is not what eventually runs. The injected code in the mainline (the "injectile") is what runs. It looks like the quasi, except unquotes have been filled in with Qtree fragments. The injectile's outer pointer is still the macro, though, just like with the quasi. Since the injectile is somewhere else, its outer pointer is "displaced".

masak commented 6 years ago

...huh.

As I look at the format macro so soon after mentioning melt, I notice there is something quite dissatisfying about it. But it's also a bit unrelated to this issue, so I'm going to open up a new one about it. :smile:

vendethiel commented 6 years ago

melt was not meant to be eval

I know it wasn’t, but it really is quite similar in how it works.

masak commented 6 years ago

@raiph I created a stack-overflow label like you suggested.

Close this one? Or are we still waiting for some resolution wrt FEXPRs?

masak commented 6 years ago

This thread seems to have stalled, and there is no Definition of Done. Closing.

masak commented 5 years ago

Sneaking this link in here: http://axisofeval.blogspot.com/2012/03/should-i-really-take-time-to-learn.html

masak commented 5 years ago

@raiph Commenting in a closed thread, but (as I'm learning more about fexprs) I just found this quote:

The idea of first-class operative combiners, i.e., first-class combiners whose operands are never evaluated, has been around a long time. Such creatures were supported by mainstream Lisps through the 1970s, under the traditional name fexprs, but they made a mess out of the language semantics because they were non-orthogonal to the ordinary variety of procedures constructed via lambda — and, more insidiously, because at that time the mainstream Lisps were dynamically scoped (a language feature that causes more problems for fexprs than it does for the less powerful [macros](https://en.wikipedia.org/wiki/Macro_(computer_science))).

(And so it's kinda implied that we can do better nowadays, with modern tools and lexical scope.)

I found Kernel because allegedly its specification is very good. I haven't read it yet.

masak commented 5 years ago

@vendethiel This is my current take on fexprs (still not having read the dissertation):

There are three "phases of matter", three forms of computation that interest us:

  1. Code-as-structure (AST)
  2. Code-as-callable-value (first-class functions/lambdas but also code blocks)
  3. Code that's already running

A compiler basically takes us 1->2, and a runtime does 2->3. What Common Lisp, Scheme, and 007 do is keep us in 1 for as long as we need. What fexprs do is allow us to take fragments of 1 all the way into 3 and only then "evaluate" them (1->2->3) at compile time.
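
To make those three phases concrete, here's a Python-flavored illustration (just an analogy; Python's ast/compile/exec trio stands in for 1, 2, and 3):

# Python analogy, not 007
import ast

# 1. Code-as-structure: parse source text into an AST, and keep it in this
#    form for as long as we need (this is where macro expansion would live).
tree = ast.parse('print("hi")')

# 2. Code-as-callable-value: compile the AST into a code object.
code = compile(tree, filename="<example>", mode="exec")

# 3. Code that's already running: hand the code object to the runtime.
exec(code)  # prints: hi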

(Oh! The dissertation is by the Kernel author. D'oh.)

vendethiel commented 5 years ago

only then "evaluate" them (1->2->3) at compile time.

Should it still be called "compile-time" then? :) I think this brings us back to meta-circularity (at least for a bit). C++ (for instance) has heavy limitations on what can be done at constexpr time, and the mechanisms to implement this in the compilers often mimic an actual interpreter...


I now understand what it means to have someone post interesting links faster than I can read them; serves me right... I'll try to read what you've sent me.

masak commented 5 years ago

Hey, no need to feel "behind" on reading. 😉 Did you see the new papers.md? That's an (incomplete) list of things I haven't read but probably should. Remember, the end goal of all of this is to bring decent macros to Perl 6.

raiph commented 4 years ago

Hi @masak,

My summary of my latest (but very tentative) understanding of fexprs and their relationship to Raku is that the macro declarator declares a compile-time fexpr, not a lisp macro:

https://www.reddit.com/r/rakulang/comments/g7xluv/question_about_macros/fow63tp/

I've drawn this (again, very tentative) conclusion based on reading Shutt's comments on fexprs, mostly at LtU.

I note his comments on how macros are more about filling in a template (which reminds me of stuff you've said about 007/alma macros), plus some sweeping simplifications that become possible related to hygiene and quasiquoting (which may at least trigger insights for you). Perhaps you have also already come to the same conclusions without realizing it's because you're implementing fexprs, not macros (in lisp parlance), presuming I'm at least somewhat right about that.

masak commented 4 years ago

Hi @raiph,

I'm sorry, no, I don't believe Perl 6 macros (or Alma macros) are fexprs at all. At the risk of revealing my still-very-shaky understanding of fexprs — the big distinction is that in Lisp macros (and Perl 6 macros, and Alma macros), after the macro has expanded at compile time, it's "spent". Only its expansion remains in the target code. A fexpr, on the other hand, inserts itself as the thing to be called, and so itself remains in the target code as the "expansion". That's very different, and according to that difference, Perl 6 does not have fexprs.

I'm sorry to knee-jerk reply like this without properly reading your reddit post. But I wanted to communicate my understanding and point out a possible confusion, even though my understanding is partial and the confusion is possibly also mine.

I promise to go back and read the reddit post (and the source material) more carefully.

raiph commented 4 years ago

after the macro has expanded at compile time, it's "spent".

I meant the same when I wrote "compile-time fexprs".

Aiui, an alma macro returns an AST. It's simply a value, one that happens to be an AST tree, that's then just directly spliced into the overall AST. Perhaps I'm wrong about that; perhaps what's returned from an alma macro is an expression that's then immediately evaluated, but I would be fairly shocked to hear that.

According to John Shutt, who oughtta know how this stuff works given how central it is to his doctoral dissertation:

when an ordinary function produces its output, that output becomes the result of the call to the function. The same for a fexpr. But the output of the macro is an expression that is then evaluated

when you're trying to make sure your code will do what you really want it to do, it's easier to do so by actually specifying what you want done (the fexpr strategy), than by specifying how to build an expression that will specify what you want done (the macro strategy).

masak commented 4 years ago

@raiph I still believe what you are saying does not match what we are seeing in Alma and Perl 6. I say this as the implementor of macro systems in both languages, although I also don't doubt the credentials of John Shutt.

Aiui, an alma macro returns an AST. It's simply a value, one that happens to be an AST tree, that's then just directly spliced into the overall AST.

You are correct, as far as that goes.

But — since the format of comments allows me to interrupt you to ask a question — are you aware that this is exactly what Common Lisp and Scheme also do? And many many other Lisp dialects. And other languages, too. Including Perl 6.

To spell it out, just to make super-sure that we are on the same page:

# Perl 6 or Alma
macro sayHi() {
    quasi {
        say("hi");
    };
}
;; Bel
(mac prhi ()
  `(pr "hi"))

These two examples are as identical as the different syntax allows them to be. The little backquote in the Bel code corresponds to the quasi keyword in Perl 6.

Both of these are macro mechanisms. Neither of these is a fexpr mechanism. Talking about "compile-time fexprs" further confuses the issue when "macro" already correctly describes the thing. It's a bit like saying: "no no, I don't mean food, I'm talking about more like a kind of solid drink".

it's easier to do so by actually specifying what you want done (the fexpr strategy), than by specifying how to build an expression that will specify what you want done (the macro strategy).

Instead of trying to pinpoint and explain exactly what Mr Shutt means when he's writing this, let me show you what a Perl 6 or Alma macro would look like if it were a fexpr:

# imagined Perl 6 or Alma with fexprs
macro sayHi() {
    say("hi");
}

As you can see, the big difference is the lack of quasi. The macro is kind of "auto-quasi'd", and the expression being returned from the fexpr is the code inside it.

Perl 6 and Alma do not work this way. Kernel and picolisp do, as far as I understand.

masak commented 4 years ago

Oh!

As I started reading the reddit post, it pretty quickly became clear to me where this confusion comes from.

Quoting the relevant part:

  • Macros look similar but they take (a list of) unevaluated things ("operands") and return a replacement in the form of (a list of) unevaluated things that are then immediately evaluated as if they were at the original macro call site.

  • Fexprs take (a list of) unevaluated things ("operands") and return a replacement in the form of (a list of) unevaluated things that are left as is, i.e. not immediately evaluated.

You are right that what the macros in Perl 6 and Alma return is not immediately evaluated. But...

...this is an artifact of a deeper language difference, more like a philosophical difference between "compile-then-run languages", and Lisp. (I've been returning to this point in various ways over the years when talking with @vendethiel. Only recently, like last year or so, do I grok this difference.)

Perl 5, Perl 6, Alma, C, C++, Java, C# and many others belong to the first group. Various Lisps belong to the second group. Python has some aspects of the second group — in that both class and function definitions "happen at runtime", but in the end it also belongs to the first group — in that compilation completes for a whole unit before the first statement ever runs.
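
A small, admittedly contrived illustration of that last point about Python:

# Python
# The whole module is compiled to bytecode up front (a syntax error anywhere
# stops everything before a single line runs)...
def greet():              # ...but 'def' itself is a runtime statement:
    return "hi"           # the name 'greet' is bound only when execution gets here.

if False:
    def never_bound():    # compiled as part of the unit, but the binding never happens
        pass

print("greet" in globals())        # True
print("never_bound" in globals())  # False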

In Alma and Perl 6, things are not immediately evaluated out of macros, because Alma and Perl 6 are not languages of this second type.

In languages of the second type, the macro tends to be "expanded at runtime", which is OK because compile time and runtime are so close together. That's why these languages talk about the macro returning something which is then immediately evaluated. Alma and Perl 6 not being of this second type, their macros don't — can't — work like this. More generally, in any language of the first type, macros don't work like this.

As far as I can tell, what the Raku design is calling macros are actually a limited form of compile-time fexprs such that their result is spliced back into the AST. So they have neither the complexity of lisp macros (no tongs) nor the complexity/downsides of lisp fexprs (no static analysis / optimization hit).

But because fexpr is a horrible name, the declarator is macro.

I respectfully disagree with this conclusion. They are macros; nothing to do with fexprs.

masak commented 3 years ago

Something, I don't quite remember what, prompted me to start reading John Shutt's dissertation about fexprs and vau calculus. And... wow.

Don't get me wrong, it's not that I'm now suddenly a raving fexpr convert. I'm not. (For example, I don't want to uproot everything we've done with Alma and start over, just to do it "right" with fexprs. In fact, I think the end result would maybe look more like Kernel and less like macros for Raku, which wouldn't be in line with 007's or Alma's mission statement.)

No, the reason I say "wow" is that it's almost ridiculous how much falls, for lack of a better metaphor, into place when I read this. There's so much to process and appreciate from the dissertation, and I'm not even through the entire text yet, but I still want to try and summarize it.

This will be a multi-comment kind of thing, by necessity. At some point when reading through the dissertation I needed to take a detour and read the Kernel Report. I heartily recommend doing so; more below.

Ping @raiph @vendethiel @jnthn.

masak commented 3 years ago

In medias res

Instead of building up to an exact description of my current understanding of Kernel's central abstraction — the operative — let's just define one, and talk about what falls out:

($define! $and?
   ($vau x e
      ($cond ((null? x)        #t)
             ((null? (cdr x))  (eval (car x) e)) ; tail context
             ((eval (car x) e) (apply (wrap $and?) (cdr x) e))
             (#t               #f))))

In textual order:

vendethiel commented 3 years ago
* I was going to say that no other languages offer both like this

Is it cheating to say & and &&? :)

* The `wrap` built-in wraps our `$and?` operative into an applicative so that we can call `apply` on it.

Is it special? Shouldn't it also have a dollar sign otherwise?

wraps our $and? operative into an applicative

So we could've used and? instead of (wrap $and?), right?

* But even wrapped, it doesn't lose its fexpr-ness

I'm interested in how.

masak commented 3 years ago

I was going to say that no other languages offer both like this

Is it cheating to say & and &&? :)

Yes and no, I think. :wink: It kind of depends on what you think qualifies as a boolean. By Kernel's/Shutt's standards, & doesn't qualify because it operates on unsigned integers/bit vectors. By C's standards, it's a perfectly cromulent "eager and" operation. I mean to talk a bit more about Kernel's strict treatment of booleans, which was a surprise to me.

The wrap built-in wraps our $and? operative into an applicative so that we can call apply on it.

Is it special? Shouldn't it also have a dollar sign otherwise?

Ah, trust @vendethiel to sniff out the one part of what I wrote where I felt I didn't have enough to stand on to be sure. But this is important, so let's try.

I think the answer to your question is in the metacircular evaluator (here using the one from SINK, Shutt's old prototype):

(define combine
  (lambda (combiner operand-tree env context)
    (cond ((operative? combiner)
             (operate combiner operand-tree env context))
          ((applicative? combiner)
             (combine (unwrap combiner)
                      (map-eval operand-tree env context combiner)
                      env
                      context))
          (else
             (error-pass (make-error-descriptor
                           (list "Tried to call a non-combiner: "
                                 (list combiner)))
                         context)))))

Note crucially that map-eval is called on the operand tree of an applicative, but not that of an operative. That is, during "normal" evaluation, the operands of an applicative (but not an operative) are subjected to evaluation before calling the underlying combiner. This should be non-surprising, as it parallels the difference between macros and procedures, or between special forms and procedures.

But when we call apply in the $and? code, we circumvent this part, and there is no evaluation of the operands:

($define! apply
   ($lambda (appv arg . opt)
      (eval (cons (unwrap appv) arg)
            ($if (null? opt)
                 (make-environment)
                 (car opt)))))

Maybe a different way to state this is that $and? is certainly special, and nothing of that is lost. If we bound the result of wrapping $and? to some name, it wouldn't deserve a dollar sign, because evaluating it in the normal way makes sure to also evaluate its arguments first.

wraps our $and? operative into an applicative

So we could've used and? instead of (wrap $and?), right?

But even wrapped, it doesn't lose its fexpr-ness

I'm interested in how.

These two questions are answered above, I believe, but I'm not sure how clearly.

In summary: when we call apply, we circumvent the normal evaluation of arguments that would have made a normal applicative call unsuitable for a (wrapped) operative.

and? and (wrap $and?) are not the same; there are two orthogonal changes going on here at the same time. Think of wrap (and unwrap) as the constructor (and destructor, respectively) of applicatives. We're used to thinking of applicatives ("normal functions") as being fundamental in some sense, but in Kernel they are not: they have been factored into "evaluation of the arguments" + "operative", the latter being the fundamental concept. The "evaluation of the arguments" is the second change that's going on, but with apply we can circumvent it.

I realize this is bewildering; it's still a bit messy to describe. It gets even weirder when one realizes that $lambda in Kernel is defined as basically wrap of $vau — so where's that evaluation of the arguments, then? Well, the answer is that it happens when we evaluate the resulting lambda value as the head of a combination (form), in the normal course of a program.
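
Here's my best attempt at a sketch of that factorization, in Python rather than Kernel; everything in it (the Operative/Applicative classes, expressions modeled as env -> value functions) is invented purely for illustration and is not Kernel's actual machinery:

# Python sketch, not Kernel: an "expression" is modeled as a function env -> value.

def evaluate(expr, env):
    return expr(env)

class Operative:
    # The fundamental combiner: receives its operands *unevaluated*, plus the
    # dynamic environment, and decides for itself what (if anything) to evaluate.
    def __init__(self, body):
        self.body = body                    # body(operands, env) -> value
    def combine(self, operands, env):
        return self.body(operands, env)

class Applicative:
    # wrap: "evaluate the operands first" layered on top of an operative.
    def __init__(self, underlying):
        self.underlying = underlying
    def combine(self, operands, env):
        args = [evaluate(o, env) for o in operands]
        return self.underlying.combine(args, env)

def wrap(operative):
    return Applicative(operative)

def unwrap(applicative):
    return applicative.underlying

def apply_(applicative, args, env):
    # Kernel-style apply: go straight to the underlying operative,
    # circumventing the usual evaluation of the operands.
    return unwrap(applicative).combine(args, env)

# "$lambda is basically wrap of $vau": an ordinary function is an operative that
# treats its operands as already-computed values, wrapped so that a normal call
# evaluates them on its behalf.
add = wrap(Operative(lambda args, env: args[0] + args[1]))

env = {"x": 40}
print(add.combine([lambda e: e["x"], lambda e: 2], env))  # 42: operands evaluated first
print(apply_(add, [40, 2], env))                          # 42: values passed through as-is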

I hope to get back to this, and be able to state it in a more straightforward way. In the meantime, hope this helps.

masak commented 3 years ago

Self-replying because I'm thinking about what I wrote and I'm kind of understanding it a bit deeper.

Combine this:

The e parameter [to $vau] for the environment, and the explicit environment passing in general, are central here.

With this:

[...] environments are conjugate with evaluation.

(And with the rather abstract things I attempted to clarify to @vendethiel.)

What you get is this: a $vau is like a lambda, but it has been provided with "the means of evaluation", to use at its discretion. The (dynamic) environment passed to it represents more of an opportunity than a mandate — interestingly, different library operatives have different policies about the environment they choose for evaluation.

Contrast this with a normal macro in the style of Alma (or Raku), where the focus is not on the evaluation more than perhaps indirectly; by passing things through from macro parameters to unquotes, you allow an invisible third party (the compiler + runtime) to evaluate things later. The focus is on the (quasiquoted) program text/structure/syntax, not on its semantics.

The fexpr paper calls the former style the explicit-evaluation paradigm, and the latter the implicit-evaluation paradigm. Macros have largely been heavily associated with the latter, in Lisp/Scheme and elsewhere. Apparently some experimental languages called reflective Lisps (of which I know next to nothing) from the 1980s share with Kernel their focus on explicit evaluation.

Quoting the fexpr dissertation:

Fexprs have an inherent clarity advantage over macros.

In part, this is because a fexpr specifies its entire computation explicitly, whereas a macro specifies only the front (and usually lesser) half of its computation, arranging the back half indirectly. Naturally, an algorithm is easier to understand when it is actually stated.

masak commented 3 years ago

Kernel and the lack of a static phase

I was going to establish the terms "1-phase languages" and "2-phase languages" (analogously to Lisp-1 and Lisp-2), but now I fear if I say "two-face" too much, I'll only think of Harvey Dent.

Anyway. There are languages that just go ahead and run your program — but then there are also languages that first spend some time, well... type-checking, optimizing, and code-generating. You know, compiler-y stuff. Technically, this could be a difference between implementations, but usually it goes deep enough to be a feature of the language itself.

We could refer to these as "dynamic languages" and "static languages". We could also talk about "interpreted languages" and "compiled languages". I think what we have here are three axes, but they're almost aligned.

Kernel has no compile phase. It's pretty into lexical lookup, which is usually compilation-friendly, but there's also something about operatives/fexprs that ruins all that static knowledge. (Shutt is aware of this. But it might not be an "all hope is lost" situation. The Kernel specification talks hopefully about a sufficiently smart "optimizing interpreter".)

With all this in mind, it's kind of interesting that a $vau ends up dealing with two environments, referred to as "the static environment" and "the dynamic environment". The former is the active environment in which the $vau combination is evaluated; the latter is that e parameter that comes in, later to be bound to the environment where the resulting operative is invoked.

Alma and Raku are two-phase languages. But the distinction between "static environment" and "dynamic environment" here is pretty much the same; the former belongs to the macro body, and the latter to the mainline where the macro is expanded. Biggest difference is that the "static environment" also happens in the "static phase" (BEGIN time), of which there is one.

A bit earlier I wrote

(For example, I don't want to uproot everything we've done with Alma and start over, just to do it "right" with fexprs. In fact, I think the end result would maybe look more like Kernel and less like macros for Raku, which wouldn't be in line with 007's or Alma's mission statement.)

This distinction is mostly what I meant. Neither Raku nor Alma would be inclined at all to lose its compilation stage, not even for the pleasing theoretical properties of operatives.

masak commented 3 years ago

I want to link to an LtU comment by John Shutt that explains the origins of Kernel.

Worth reading in its entirety, but here's the central paragraph:

Notice that the key innovation here is not reduction of the usual processing of an ordinary Scheme procedure call to special-procedure calls that don't evaluate their operands; not, somehow, "elimination" of operand-evaluations; but simply factorization of an ordinary Scheme procedure call into two stages: first, evaluate the operands to produce arguments, and second, apply the underlying special-procedure to the argument list. All the things done before are still done, we're just keeping track of them separately.

The factorization is the one where $lambda consists of a $vau and a wrap, as mentioned earlier.

masak commented 3 years ago

With the knowledge of Kernel as an intrinsically 1-phase language, I can now amend my reply to @raiph a little:

I'm sorry, no, I don't believe Perl 6 macros (or Alma macros) are fexprs at all. At the risk of revealing my still-very-shaky understanding of fexprs — the big distinction is that in Lisp macros (and Perl 6 macros, and Alma macros), after the macro has expanded at compile time, it's "spent". Only its expansion remains in the target code. A fexpr, on the other hand, inserts itself as the thing to be called, and so itself remains in the target code as the "expansion". That's very different, and according to that difference, Perl 6 does not have fexprs.

A fexpr/operative in Kernel doesn't really insert itself anywhere. Very similar to a function, it just gets invoked. There's no expansion going on. But the interpreter knows the difference between operatives and applicatives, and only evaluates the operands for the latter type.

There's no way for Kernel operatives to be "spent" at compile time, because there's no compile time.

masak commented 3 years ago

Flying first class

Part 3 of N in a series.

Kernel has theoretical axes to grind, and it grinds them with style. The most obvious one is Christopher Strachey's notion of "first-class citizen"... the idea that if something is a manipulable entity in a language, then that entity should appear as a "normal value" in the language, with no artificial restrictions on its use.

Strachey was talking about functions, for which there has been an interesting journey to first-class-hood, culminating in, well, Scheme.

Kernel starts out by first-classing "special forms" (by turning them into operatives), which already totally hoses static analysis, but it can't stop there. The slogan is that "all manipulative entities should be first-class", and so it goes on to first-class environments, hosing static analysis again for emphasis.

But I want to write about the third thing Kernel makes first-class: cyclic lists. Yes, you read that right: lists in which following cdr references eventually lands you in a loop are first-class in Kernel.

What does that even mean? It means the definition of the length applicative can't be this simple "add 1 until there are no more cdrs" implementation:

($define! length
   ($let ()
      ($define! aux
         ($lambda (object acc)
            ($if (pair? object)
                 (aux (cdr object) (+ 1 acc))  ; tail call
                 acc)))
      ($lambda (object)
         (aux object 0))))

Doing it this way would be perfectly fine in most Lisps and Schemes. In Kernel, it counts as some kind of dereliction of duty, because the code above diverges on a cyclic list, which breaks the promise of making cyclic lists first-class.

For reference, here's Kernel's implementation of length:

($define! length
   ($lambda (object)
      ($let (((#ignore #ignore a c) (get-list-metrics object)))
         ($if (>? c 0)
              #e+infinity
              a))))

From this, you can easily see that length basically delegates to get-list-metrics, another library function. I'm not going to reproduce its definition here; suffice it to say that it can handle cyclic lists.
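
For the curious, here's the general idea behind such a function, sketched in Python rather than Kernel; the Pair class and the exact return values are my invention, not get-list-metrics' actual contract:

# Python sketch; a "pair" is an object with car/cdr, and nil is None.
class Pair:
    def __init__(self, car, cdr):
        self.car, self.cdr = car, cdr

def list_metrics(obj):
    # Returns (acyclic_prefix_length, cycle_length); cycle_length is 0 for a
    # finite list. Pairs are tracked by identity, so a cycle is detected
    # instead of being walked forever.
    seen = {}
    index, node = 0, obj
    while isinstance(node, Pair):
        if node in seen:
            start = seen[node]
            return start, index - start
        seen[node] = index
        index, node = index + 1, node.cdr
    return index, 0

def length(obj):
    prefix, cycle = list_metrics(obj)
    return float("inf") if cycle > 0 else prefix   # mirrors Kernel's #e+infinity

# A cyclic list: 1, then 2 and 3 repeating forever.
a, b, c = Pair(1, None), Pair(2, None), Pair(3, None)
a.cdr, b.cdr, c.cdr = b, c, b
print(list_metrics(a), length(a))   # (1, 2) inf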

A first reading of the Kernel-minus-1 report might give one the impression — it gave me that impression — that Shutt is obsessed with cyclic lists. But I kind of came around to the notion that cyclic lists are not an unfixable problem, and that the only thing you need to support them is to give up the youthful notion that you can always recurse yourself to happiness because lists always end in a nil. That was never true, and Kernel just makes sure to disabuse you of that notion from the start.

Here are the three ways Kernel handles cyclic lists:

  1. Treat it as an error (append, list, list*)
  2. Step through it left-to-right and deliberately get stuck in the cycle ($and?, $cond, $or?, $sequence)
  3. Handle it in arbitrary order and in finite time (and?, append, assoc, assq, filter, for-each, get-list-metrics, length, list-neighbors, map, member?, memq?, or?, reduce)

Category 3 is where Kernel does better than your average Scheme, I guess. But to me, category 2 is the fascinating one, for this simple reason: all of those combinators (coincidentally all operatives) can optionally act as potentially unbounded loops, and that's like... that's like exactly the payoff one might hope to get from making something first-class.

Which leads on to one last point. Programs go beyond syntax. Cyclic lists have no syntax representation, but they can easily be created at runtime. Which, super-ironically, brings us full circle: by being a 1-phase language, Kernel seems to cause cyclic lists to be eligible for first-classness. Because syntax isn't, and never was, where it's at; syntax is just a flat representation of what the program looks like before you start running it.

masak commented 3 years ago

Which leads on to one last point. Programs go beyond syntax.

Or, put it this way, to bring it closer to home: macros transform code to code, but it's a mistake to believe that the code that macros get as input is always based on source text. That's likely true about the first macro that expands in the program, but... after that, all bets are off. Macros act on generalized ASTs, which, like it or not, are less constrained than the syntax that we usually use to encode them.

masak commented 3 years ago

[...] the youthful notion that you can always recurse yourself to happiness because lists always end in a nil.

For clarity: Lisp's original idea with induction and cons lists at the foundation of everything has been wildly successful. It's an idea whose time has come and gone, yes, but nevertheless a very central and important one.

The EOPL book arrived in the mail today. (Yay!) On page 12 we find this:

The Smaller-Subproblem Principle

If we can reduce a problem to a smaller subproblem, we can call the procedure that solves the problem to solve the subproblem.

If cons lists have two parts, one about the data structure, and one about the behavior/destructuring/induction, then this principle is all about the latter. And it also says something about how and why structural induction works so well. Cons lists are fundamental in the sense that they are the simplest imaginable inductive data structure. (Natural numbers don't count; they are an inductive type, but they are not a data structure; holding copies of () doesn't count.)

Kernel holds up cyclic lists as one of the ways induction isn't always the answer to all your problems: what if your subproblem isn't smaller? What if your data structure isn't actually inductive and doesn't have a nil leaf at the end? Somehow saying "well, don't do that, then!" sounds more like a meek excuse to cling to an oversimplified model of computation, and less like a cutting retort to someone venturing outside its intended boundaries.

The three cases above arise for these three (perfectly valid) reasons:

  1. Trying to recurse in the normal way would diverge and never lead to a result, so we decided to tell you that up-front.
  2. Congratulations! Your chosen model, induction, actually was correct! Please step right into this infinite loop, guaranteed not to blow your stack.
  3. Trying to recurse in the normal way would diverge, but we know a better way than the Smaller-Subproblem Principle to get the answer you wanted. Sorry!

Steele's talk from 2009 about conc lists (sic) is another objection to normal cons lists — namely, that they induce a "head-heavy" view of the world where getting something from the front of the list is immediate, but getting something from further down the list is increasingly costly. His alternative data structure is almost as fundamental, better suited for parallelization, and (sadly, ironically) a fair bit harder to reason about. Ideas very similar to that made it into Java's parallel streams.

masak commented 3 years ago

induction isn't always the answer to all your problems: what if your subproblem isn't smaller? What if your data structure isn't actually inductive and doesn't have a nil leaf at the end?

Suddenly I remembered the following AI koan, which I must have read long ago in my impressionable youth:

One day a student came to Moon and said: "I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons."

Moon patiently told the student the following story:

One day a student came to Moon and said: "I understand how to make a better garbage collector...

That is, in order:

  1. This joke is not so funny. Maybe koans aren't always primarily funny.
  2. Just to be clear, it has an infinitely deep nesting structure. In other words, it literally has no punchline.
  3. All this is to say that the joke cyclically repeats itself to prove a point about the bad interaction between refcounting and reference cycles.

The reference implementation of one of today's three major languages (CPython, implementing Python) does actually do refcounting, supplemented by a cycle detector for exactly this situation. Depending on your disposition, that either makes the joke a bit funnier, or moot.
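
For the record, here's what that looks like in practice; gc.collect() is CPython's supplemental cycle collector:

# Python
import gc

class Node:
    def __init__(self):
        self.other = None

# Build a two-element cycle: each node keeps the other alive.
a, b = Node(), Node()
a.other, b.other = b, a

# With pure reference counting, dropping our names would never free these two,
# because their counts never reach zero.
del a, b

# The cycle collector finds and frees them anyway.
print(gc.collect() >= 2)   # True: at least the two Nodes were collected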

masak commented 3 years ago

Making things first-class hurts

Part 4 out of N in a series.

Just as a teaser, here's what I hope to write about in later parts, phrased as catchy slogans:

(Later edit: The following list has subjects that I added later.)

But for today, the relatively simple topic of "making things first-class turns your language into a nightmare".

I might not do the topic justice. But I will try, just by saying words, and hoping they land well.

Making things second-class plays well into the "static world view" that seems to be our instinctual, natural way to think about languages and source code. Take Strachey's original case, functions. Functions started out as decidedly second-class:

Let's skip past the (fun) details of how functions reached first-class-hood — upwards funargs, downwards funargs, Knuth storing the return address in the machine code of the function itself in MIX but not in the successor MMIX — and just say that functions are first-class today, in modern languages... but that citizenship also makes them much more complex and intricate. Sometimes but not always, functions can be compiled down to their simple, second-class forms. Often enough, that's not an option. Crucially, when it's not an option, there's both a loss of optimization opportunities, and a loss of static knowledge in general. Static types may mitigate but not completely alleviate the problem.

Functions are a good example, because their road to citizenship (cheered on by the "lambda the ultimate" crowd) is quite well-documented and basically a story with a happy ending. They are a bad example, because functions as a thing are unusually well-encapsulated; basically functions are their own abstractions, and they don't leak an awful lot. (When closures do leak, closed-over captured variables might have to be promoted from the stack to the heap, or, as is the case in Java, severely mistreated.) It just struck me recently that the whole (necessary) distinction between (static) function "declaration" and (runtime) function "value"/closure is due to functions reaching first-class-hood. Without that, the distinction is not necessary.

Making other things first-class is usually worse, because they are less well-encapsulated, and the corresponding damage on the rest of the language is all the bigger.

Let's turn to some examples:

There's also something slightly interesting going on with making types and classes first-class, which Raku actually tackles fairly head-on. Things like a MOP and a compile-run phase distinction get thrown into that mix. I can't even begin to claim to have the vocabulary to address all that fairly and meaningfully. Just to say that it's interesting, non-trivial, full of dragons and dread, and maybe not as big of a mistake as some of the others listed above.

masak commented 3 years ago

Just randomly firing off bullets of Kernel summary in the hope of hitting some understanding:

"Special forms" (like if or define) are special because they require access to the interpreter which is afforded neither to traditional functions nor to traditional macros. (With if, the issue is more of a "conditional metacircularity"; with define, it's direct write access to environments.) The traditional solution in Scheme is to hard-code these special forms in the interpreter. They therefore form a closed set.

Kernel provides $if as a primitive, because the conditional metacircularity is pretty real. But it defines most other operatives in terms of more primitive ones, and it makes no special cases in the interpreter for things like special forms. The way it does this is to provide operatives with (a) the unevaluated operands of the form, (b) the dynamic environment the form was evaluated in, and (c) the means to evaluate operands (or synthesized code) in any environment. (Operatives end up being able to stand in not just for special forms, but also for macros, uniting the two under a single feature.)
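
To make the (a)/(b)/(c) triple concrete, here's a sketch of what a library-defined operative looks like, in Python rather than Kernel, with operands modeled as env -> value functions as in the earlier sketch (all names here are mine, not Kernel's):

# Python sketch, not Kernel.
def evaluate(expr, env):
    return expr(env)            # an "expression" in this toy is a function env -> value

def and_operative(operands, dynamic_env):
    # Short-circuit "and" as plain library code, not a hard-coded special form:
    # (a) the operands arrive unevaluated,
    # (b) the caller's dynamic environment comes along for the ride,
    # (c) we call evaluate ourselves, as often or as seldom as we like.
    result = True
    for operand in operands:
        result = evaluate(operand, dynamic_env)
        if result is False:
            return False        # remaining operands are never evaluated
    return result

env = {"x": 0}
print(and_operative([lambda e: True,
                     lambda e: e["x"] > 1,
                     lambda e: 1 // 0], env))   # False; the divide-by-zero never runs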

Putting no special cases in the interpreter for special forms is clearly cleaner in some undeniable sense. Making $if a primitive means the "special form"-ness still lives on somewhere, somewhat like a shell game. But even so, the total surface area of special forms feels smaller. And the new opportunities offered by Kernel's operatives seem... open-ended. The jury is still out on whether they do net good or net damage.

masak commented 3 years ago

Steele's talk from 2009 about conc lists (sic) is another objection to normal cons lists — namely, that they induce a "head-heavy" view of the world where getting something from the front of the list is immediate, but getting something from further down the list is increasingly costly. His alternative data structure is almost as fundamental, better suited for parallelization, and (sadly, ironically) a fair bit harder to reason about. Ideas very similar to that made it into Java's parallel streams.

Interestingly, I think Shutt and Steele have quite a similar approach to the problem (of cons lists being limited in some way); they just end up identifying two different, largely orthogonal sub-problems to solve, and therefore arrive at different improvements. Shutt keeps the general structure of cons lists, but develops a kind of theory of how to relate to "the cycle at the end"; Steele ditches the entire original algebra on the grounds that it asymmetrically favors the head of the list, and replaces it by balanced trees — the regular two-case nil-or-cons reduction/recursion instead turns into a three-case empty-singleton-or-conc destructure.
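
Just to make the three-case destructure tangible, here's a Python sketch of conc lists; the names Empty/Singleton/Conc and the reduce_conc helper are mine, loosely after Steele's talk:

# Python sketch of conc lists.
from dataclasses import dataclass
from typing import Any

class Empty:
    pass

@dataclass
class Singleton:
    item: Any

@dataclass
class Conc:
    left: Any
    right: Any

def reduce_conc(tree, unit, lift, combine):
    # Three cases instead of nil-or-cons. The Conc case splits the work into two
    # independent halves, which is what makes the shape parallelization-friendly.
    if isinstance(tree, Empty):
        return unit
    if isinstance(tree, Singleton):
        return lift(tree.item)
    return combine(reduce_conc(tree.left, unit, lift, combine),
                   reduce_conc(tree.right, unit, lift, combine))

xs = Conc(Conc(Singleton(1), Singleton(2)), Conc(Singleton(3), Empty()))
print(reduce_conc(xs, 0, lambda x: 1, lambda a, b: a + b))   # length: 3
print(reduce_conc(xs, 0, lambda x: x, lambda a, b: a + b))   # sum: 6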

Both Steele and Shutt get great mileage out of "an algebraic approach to code": namely, to express nice properties that you want to preserve as equivalences whenever possible. Here is a talk where Steele says in the beginning that he enjoys doing it that way, because he has a background in algebra.

masak commented 3 years ago

The loss of the compile time phase

Part 5 of N in a series.

In Notes from the Mystery Machine Bus, blogger and language omnivore Steve Yegge outlines how software engineering is filled with two kinds of people. Not borrowing his terminology (which is needlessly tied to politics), the two types are static people and dynamic people.

Or maybe you might call them compiler people and interpreter people. Or maybe "languages" people and "systems" people. There are many names. Early-binders and later-binders might work as well.

Hacker News never seemed to like that particular blog post of Yegge's, but... it's stayed with me. I believe the distinction exists, and is important. I also believe that the reason the decades-long "debate" between these two extremes has been so enduring and sometimes frustrating is that both sides are right, or at least worth paying attention to. You want all the flexibility and all the static safety? Great! What do you mean those are fundamentally incompatible?

Now, take types. One reason it's hard to talk about types is that it's not even a completely well-defined term. Raku taught me the useful intellectual habit of "tabooing" a word that's too overloaded for its own good, like "parent" of an active function. Are you talking about the "outer" or about the "caller"?

Similarly, I think "type" can be tabooed for the scope of the subsequent discussion, in favor of three concepts:

Kernel eschews static types, with the explanation that while these sometimes help catch errors, they also forbid situations which are completely fine, just not provably so. I am sympathetic to this view, although I also feel that there is something about that stance which feels... different from many other related judgements in Kernel.

I feel there is something here that I can't quite express well, where static types don't really work in a language like Kernel, where there is no compile-time, no guarantee of stable bindings, and no general guarantee that code can be optimized/compiled early. In light of that, the above justification for avoiding static types feels distinct from the forces that already apply on behalf of the language itself and its design choices.

I want to end this comment by quoting a blog post from Shutt's blog where he talks about the decades-long static-dynamic confusion:

I'm a believer in trying more rather than less. I don't begrudge anyone their opportunity to follow the design path that speaks to them; but not all those paths speak to me. Second-class syntax doesn't speak to me, nor recasting Lisp into a compiled language. I'm interested in compiling Lisp, but want the language design to direct those efforts rather than the other way around. To me, the potential of interpretation beckons; the exciting things we've already found on that path suggest to me there's more to find further along, and the only way to know is to follow the path and see. To do that, it seems to me we have to recognize that it is a distinct path, the distinct philosophy of interpretation; and, in company with that, we need to hone our mathematical technology for interpreted languages.

masak commented 3 years ago

John Shutt has since gone on to enrich and evolve his skepticism of static types into something that looks almost like a philosophy. He likes to talk about "sapience vs technology":

[...] for my part, I'm type-skeptical. I don't recall (perhaps it's slipped my mind?) attempting to explain on LtU my beef with types.

(1) These sorts of type systems have to support all correct formal reasoning that can be done. Gödel's theorems aren't friendly to that. I've wondered for a couple of decades or so, if the root of this is some sort of mis-step in setting up the problem. Viciously circular logic is an existential crisis, whereas the equivalent computation is just a program that's hung up in an infinite loop (it happens); is there a way, by embedding logical reasoning within the computational system (rather than vice versa), to bleed off the antinomies into mere non-halting computation? It's apparently impossible to prove (formally) this can't be done, because it involves stepping outside the rules; yet there's also no knowing that it can be done except to figure out how.

(2) Typing and computation put conflicting demands on programming-language design. Typing is formal proof about static expressions. Computation is what those static expressions actually do when evaluated. When you design a language to support typing, you're trying to make clear how the formal proof works. When you design a language to support computation, though, you're trying —as your first line of defense against getting the program wrong, before you even begin to reason formally about it— to make clear what the program does. If you can't tell, by looking at the program, what it does, that's a non-starter. And I suggest you cannot successfully design for both at once. Designing to clarify the formal reasoning obfuscates the algorithm; designing to clarify the algorithm obfuscates the formal reasoning.

I think the strongest I feel in terms of sympathy for Shutt's views here is when (as often happens) something starts out as a "best practice" or useful rule of some sort, then the rule gets encoded in a software tool, and then somewhere along the line the original purpose of the rule is mislaid, and the rule is applied in ways it was never meant to, causing harm instead of good. Static types have a little bit of that failure mode: they start out as good intentions, and then somewhere along the way they become a hurdle to overcome, or a hungry genie to placate, instead of something that brings net productivity to development.

Also, to point out the maybe obvious, the "static" in "static types" is simply not very compatible with the inherently dynamic interpreted mindset. If by definition "everything flows", then either your static types are going to seriously get in the way of that or they are simply not going to work as advertised.

Same sort of sentiment expressed in another LtU thread:

[...] the mismatch between sapient human minds and non-sapient formal structures. Mathematics is (as I've remarked in another thread) a conversation between human minds, and its abstractions are sapience-shaped; programming abstractions break down because they are, in essence, attempting to "explain" abstractions to the non-sapient computer. Natural language is —not going out on a limb, here— also a conversation between human minds; hence formal grammars break down because they are trying to contain sapient dialog within a rigid structure. Sapience doesn't fit into the confines of a formal system (I've blogged relating this to Gödel's Theorems), so both programming abstractions and formal grammars "leak".

masak commented 3 years ago

Booleans

Part 6 of N in a series.

For a language bent on thrusting power into the hands of the developer, Kernel sure locks down booleans in an unexpected way. Under the heading of accident-avoidance, the language has absolutely no type-punning to booleans whatsoever.

Once the initial jolt of surprise over that design decision wears off, I do feel it's the right decision. Yes, basically everyone does it, from C to Python to Bel, with Java being a notable exception; if Kernel had had static types (let alone the compile-time in which to have them), it would have been identical to Java in this respect.

As someone who has had to fix two honest bugs in my Perl 5 code in the past half year rooted in the fact that the string "0" boolifies to false, I do buy the argument from accident-avoidance. Add to this the fact that all boolean-producing combiners in Kernel have a ? suffix by convention, and the restriction feels a little less like a burden and more like... a pattern, or a rhyme.
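
Here's the same class of accident, transplanted into Python for illustration (fetch_discount is made up):

# Python
def fetch_discount(code):
    # Returns a discount percentage, or None if the code is unknown.
    return {"WELCOME": 10, "LOYAL": 0}.get(code)

discount = fetch_discount("LOYAL")

# Type-punning to boolean conflates "unknown code" with "a 0% discount":
if discount:
    print("apply discount")        # never reached for LOYAL, even though it's a valid code

# Kernel-style discipline: state the predicate explicitly, get an actual boolean.
if discount is not None:
    print("apply discount")        # reached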

Still, it's odd to find this kind of caveat in a language with principles like "values should be first-class" and "you should be able to use basically anything, without restriction".

I'll let the Kernel Report explain that one:

The canonical illustration of the single-role principle is type boolean, whose essential purpose in the language is to serve as the result of a conditional test (as for operative $if, §4.5.2). Most Lisps, including R5RS Scheme, provide a boolean type for this purpose, but actually allow any object whatever to occur as a conditional test result, the evident purpose of this convention being that the programmer will omit the explicit predicate in conditional tests where the value to be predicated is #f exactly when the predicate would return false. This deliberate muddling of object roles in the base language leaves it vulnerable to the accidental use of an expression as a conditional test that was never intended to fill such a role, and cannot in fact ever produce a false result; in such an accidental use, an illegal act actually has been performed, but is not detected. Requiring a boolean result —which Kernel does— would presumably cause the error to be pinpointed the first time the code is run, rather than lying dormant until it must be laboriously hunted down when the code is discovered to misbehave; and (not forgetting the permissive half of the equation), the programmer is neither prevented nor even significantly inconvenienced from any activity since all it takes to guarantee a boolean result is that the programmer explicitly state his intention in the form of a predicate.

I think I agree with everything, it's just... for a language which then goes on to eschew static types completely because of the correct programs they might preclude, Kernel sure does seem to veer hard towards pinpointing errors early in this particular case.

Not even sure it's inconsistent or hypocritical, it's just... refreshing to see a language take booleans seriously.

Mentioned in passing: a bunch of other combiners could have returned values besides the side effect they carry out, but Kernel is very consistent in returning the special value #inert in those cases. This too, I respect.

masak commented 3 years ago

At a tangent, but related to a lot of what I've been getting at above: I just found the concept static binding defined in section 1.2 of this paper, "Static and Dynamic Type-Checking".

I believe this is a useful concept that helps shed light on the whole situation, including why it's so slippery to pin down. Kernel does not have static binding. Java definitely does; so do Raku and Alma. Some others (Python, Scheme, JavaScript) I'm not sure about, because these all seem to expose their environments in some way or another at runtime, confounding the issue.

(Edit: Oh, I should define "static binding" too, shouldn't I? Not just refer to the paper. Very well: if you can look at a variable use and say with certainty "it refers to the variable declared in this declaration" (for some specific previous declaration whose lexical scope the use is in), then the language has static binding.)
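
For what it's worth, a tiny example of why I hesitate about Python: the environment is a first-class, mutable object, so the binding a use refers to can be conjured up at runtime.

# Python
def f():
    return x                 # compiles fine even though no 'x' is declared anywhere

exec("x = 42", globals())    # a global binding materializes at runtime...
print(f())                   # ...and the use inside f() now refers to it: 42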

This ends up being important for macro expansion, because static binding is a property that needs to be preserved during macro expansion. This is the hidden driving force behind #410 (but also, as far as I understand, behind Scheme's alpha-renaming strategy during expansion of syntax-case macros; I say this while still not sure whether Scheme has static binding). If Java had a hygienic macro system, it would run into exactly the "fun" kind of issues Alma has; but for Python, things are much less clear-cut, and for Kernel, the very prerequisites for static binding are absent.

masak commented 3 years ago

The presence of a compile-time phase allows us to view the program as primarily tokens, like C when it expands its preprocessor #defines into code. This style of expansion ("polynomial substitution") is maybe not eternally cursed, but it does have its fairly well-understood pitfalls. The "Resugaring" paper talks about a reader+syntax subdivision as a bicameral syntax, and C's flavor of macro-expansion does not have that.

Scheme does have the bicameral thing, and its macros have to get through the reader before they can even be defined. But we're still on the syntactic side of things, which is why great care still has to be taken to achieve hygiene. Macros deal in tree fragments, and tree fragments can definitely be handled in such a way as to ignore/dishonor static bindings. Scheme's massive alpha-renaming steps as part of its macro expansion could be seen as a desperate preventative measure, to make it impossible to do anything unhygienic. (I was going to say something about it being costly to do this way, too, but I'm on thin ice there. Maybe there are cheap ways.)

Kernel... well, you see, Kernel doesn't get into the mug's game of messing with syntax in the first place. The fexpr dissertation manages to discuss hygiene in terms of variable capture, but then quickly points out that due to the way fexprs work, variable capture is unlikely in practice to happen by accident. It spends some time explaining how not only $vau and fexprs help with that, but the entire design of Kernel is geared to help avoid such accidental capture. Specifically, neither (quasi-)quotation nor any other forms of "unevaluated symbols" (the recurring example given being else in Lisp's cond statements) are supplied in the language proper — on general principle, mind; there's nothing that would prevent them from being implemented.

The biggest difference I see between macros and fexprs at this point is this: macros manipulate code/ASTs/sexprs in some form or other, whereas fexprs are much more akin to functions in that they just run stuff (taking code/ASTs/exprs as parameters, yes, but seldom doing more with them than deciding when to evaluate them). Hygiene problems, while possible in theory for the latter kind, are much more of an issue for the former, due to its handling of "unevaluated symbols". Kernel gives up static binding, but in doing so it cuts much closer to a kind of dynamic binding that is not an option with purely syntactic methods.
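To make the contrast concrete, here is a short-circuiting "and" written in the fexpr style, as an operative (an untested sketch built from the Report's primitives; Kernel's own library $and? is, as far as I recall, derived along similar lines):

($define! $my-and?
  ($vau operands env
    ($cond ((null? operands)        #t)
           ((null? (cdr operands))  (eval (car operands) env))
           ((eval (car operands) env)
                                    (eval (cons $my-and? (cdr operands)) env))
           (#t                      #f))))

($my-and? (>? 2 1) (>? 3 2))             ; => #t
($my-and? #f (this-is-never-evaluated))  ; => #f; the second operand is never touched

Note that no code is constructed, transformed, or returned: the operands arrive unevaluated, and the operative merely decides whether, and in which environment, to evaluate them.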

masak commented 3 years ago

I just learned that John Shutt passed away a week ago, age 56. I didn't know him personally, although I felt I got to know some parts of him by reading what he has written. I'm sad to hear he's gone.

masak commented 3 years ago

I just feel I need to quote the first two paragraphs of this blog post by John Shutt, which I feel summarize his approach to Kernel:

It's finally registered on me that much of the past half century of misunderstandings and confusions about the semantics of Lisp, of quotation, and, yes, of fexprs, can be accounted for by failure to recognize there is a theoretical difference between an interpreted programming language and a compiled programming language. If we fail to take this difference into account, our mathematical technology for studying compiled programming languages will fail when applied to interpreted languages, leading to loss of coherence in language designs and a tendency to blame the language rather than the theory.

Technically, a compiler translates the program into an executable form to be run thereafter, while an interpreter figures out what the program says to do and just does it immediately. Compilation allows higher-performance execution, because the compiler takes care of reasoning about source-code before execution, usually including how to optimize the translation for whatever resources are prioritized (time, space, peripherals). It's easy to suppose this is all there is to it; what the computer does is an obvious focus for attention. One might then suppose that interpretation is a sort of cut-rate alternative to compilation, a quick-and-dirty way to implement a language if you don't care about performance. I think this misses some crucial point about interpretation, some insight to be found not in the computer, but in the mind of the programmer. I don't understand that crucial insight clearly enough — yet — to focus a whole blog post on it; but meanwhile, there's this theoretical distinction between the two strategies which also isn't to be found in the computer's-eye view, and which I do understand enough about to focus this blog post on.

I've come to think of Kernel as standing up for this largely invisible minority view of interpreted/dynamic languages as something distinct from, rather than just a lesser version of, compiled/static languages.

jnthn commented 3 years ago

I've come to think of Kernel as standing up for this largely invisible minority view of interpreted/dynamic languages as something distinct from, rather than just a lesser version of, compiled/static languages.

It's also somewhat telling that a good amount of recent success in optimizing dynamic languages (Truffle) has centered on figuring out not how to compile the languages themselves, but rather how to write self-optimizing interpreters that can be partially evaluated into compiled code. Effectively, compilation moves later, to a time when we know more, and can account for context.

raiph commented 3 years ago

Hi @masak,

I've been reading your comments for months, resisting jumping in, reading, mulling it for a few days, then returning and rereading. Rinse repeat. I may ask questions later when I'm ready. For now I wanted to let you know that I am very glad you've dug in to his work, are absorbing it, and have been recording your thoughts here so I can read them and be notified when there's more.


This is so sad:

I just learned that John Shutt passed away a week ago, age 56.

A week before his birthday too. :(

Even though it was John and his work that prompted me to post this issue in the first place, I deliberately didn't mention him because I didn't want to steer things like that. Then ven and you picked up on him anyway, and things have since progressed nicely.

Now that you've gotten into his stuff, I hope to one day read and share a strangelyconsistent post by you, perhaps many years from now, but hopefully one day before I too have passed, putting fexprs, and his fexpr work, into the broader context of lisp-like macros.

In the meantime, I'm struck by how Wikipedia's fexpr page includes a link to a Wikipedia page for his Kernel language -- but it's blank -- and ends with:

In 2007, John N. Shutt proposed an extension of lambda calculus that would model fexprs without suppressing rewriting of operands, ostensibly avoiding Wand's result.[12]

Perhaps there will be more to this fexpr story; it seems somehow far too fundamental for it to be something that comes to such an abrupt halt.

raiph commented 3 years ago

Hi @jnthn,

It's also somewhat telling that a good amount of recent success in optimizing dynamic languages (Truffle) has centered on figuring out not how to compile the languages themselves, but rather how to write self-optimizing interpreters that can be partially evaluated into compiled code. Effectively, compilation moves later, to a time when we know more, and can account for context.

Presumably the general framework for that approach is Futamura's Three Projections, right?

masak commented 3 years ago

Hi @raiph,

I've been reading your comments for months, resisting jumping in, reading, mulling it for a few days, then returning and rereading. Rinse repeat. I may ask questions later when I'm ready. For now I wanted to let you know that I am very glad you've dug in to his work, are absorbing it, and have been recording your thoughts here so I can read them and be notified when there's more.

Glad to know you're one of the readers of this stealth thread about Kernel and fexprs. I think it's building up to something; here's hoping I'll be able to bring it to a satisfactory conclusion or, failing that, to a conclusion.

Now that you've gotten into his stuff, I hope to one day read and share a strangelyconsistent post by you, perhaps many years from now, but hopefully one day before I too have passed, putting fexprs, and his fexpr work, into the broader context of lisp-like macros.

I don't want to promise anything, but it sounds nice when you describe it like that. If that day comes, I hope I'm able to do fexprs justice — something that I know is tricky.

Let me say a little bit about why it's tricky to explain fexprs (a point I feel should definitely be part of explaining/understanding fexprs, and relating them to macros): Kernel's operatives are quite fundamental, and can be seen either as equivalent to lambda calculus or as generalizing, supplanting, and corrupting both Lisp/Scheme "special forms" and syntactic macros... but their main obstacle (in getting people to understand them) is that thinking about dynamic behaviors is harder than thinking about static structures. Or maybe not harder, but more prone to confusion and mixing things up. I feel I grok operatives now, but I still need to regularly remind myself that the language simply doesn't have a compile phase. How much of this resistance to the concept is innate, and how much is cultural? Hard to say. In the end, language features succeed or fail to the extent people are able to build mental models around how they will behave — compiling and running code in their heads as they read it. At best, Kernel's operatives are simply a "minority voice", in a culture that seems to favor static features and a compile phase — at worst, they're genuinely a bit more slippery as a concept, due to their late-binding and stark dynamism.

This is quite clearly related to Dijkstra's argument in A Case against the GO TO Statement:

My second remark is that our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost best to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.

Again, I still want to leave it open that the "conceptual gap" exhibited by fexprs might be cultural rather than innate.

In the meantime, I'm struck how Wikipedia's fexpr page includes a link to a Wikipedia page for his Kernel language -- but it's blank -- and ends with:

In 2007, John N. Shutt proposed an extension of lambda calculus that would model fexprs without suppressing rewriting of operands, ostensibly avoiding Wand's result.[12]

About that Wand result. I haven't understood it in depth yet, but according to my current understanding, the "fexprs are trivial" conclusion — something that Shutt seemed careful not to refute outright — is grounded in a decidedly static/compiled/implicit-evaluation mindset. In a way, it's only true/relevant in a world where macros and quasiquoting already occupy the niche that fexprs try to homestead. The result is perhaps better stated "fexprs and quoting interact badly with each other, trivializing the theory".

(Edit: If the way I describe it above is true, then of course it's an excellent example of how fexprs/operatives are fighting an uphill battle against the consensus reality; even the most famous result about fexprs, calling them trivial, presupposes a world that they're incompatible with.)

Perhaps there will be more to this fexpr story; it seems somehow far too fundamental for it to be something that comes to such an abrupt halt.

Aye. For people reading this thread who haven't dipped into the Kernel Report yet, I recommend reading it just to experience the clarity and (in some sense) inevitable reasoning given in the rationales for each feature introduced. They are really a joy to read. It's sad that the Report will now (most likely) never emerge out of draft form.

masak commented 3 years ago

It's also somewhat telling that a good amount of recent success in optimizing dynamic languages (Truffle) has centered on figuring out not how to compile the languages themselves, but rather how to write self-optimizing interpreters that can be partially evaluated into compiled code. Effectively, compilation moves later, to a time when we know more, and can account for context.

Presumably the general framework for that approach is Futamura's Three Projections, right?

Yes. Although that topic is huge, and would probably take us too far afield from fexprs... :wink:

But @raiph, if you do go down that path, you could do worse than perusing Collapsing Towers of Interpreters, and especially internalizing the concept of "stage polymorphism" — which seems to dovetail nicely into the discussion we've been having in #555 about staged compilation.

Also, even further afield — and clearly fascinating, although I haven't more than skimmed these — A Dialogue on Metasystem Transitions and The Language REFAL - The Theory of Compilation and Metasystem Analysis by Turchin.

vendethiel commented 3 years ago

thinking about dynamic behaviors is harder than thinking about static structures. Or maybe not harder, but more prone to confusion and mixing things up.

I think that's even more true for our tooling

masak commented 3 years ago

thinking about dynamic behaviors is harder than thinking about static structures. Or maybe not harder, but more prone to confusion and mixing things up.

I think that's even more true for our tooling

Well, of course. But that's just because (wanting to say two things at once here) of the paradise that Turing and Gödel expelled us from. It's just because we're building the tools in our image, and since we're not up to the task, nor are they.

If there were a simple answer here, we would have a mainstream language with practically no deleterious dynamism. Instead what we have are (on the one hand) mainstream Java and Ada/SPARK, with deleterious dynamism and (for those who seek it) static analysis at the high price of logical formalism, paid in proportion to the desire for formal verification. And (on the other hand) non-mainstream Agda, Coq, and Idris, where everything is tractable, but also everything is expensive.

Is there a pleasant middle ground? Heck if I know. Maybe guaranteeing that your software works as intended is simply a costly endeavor, period.

masak commented 3 years ago

map and filter

Part 7 of N.

I want to trace my understanding (as it is) of a difference between map and filter in Kernel.

First, let me quote their specified behavior from the Kernel specification. There won't be many surprises there — but pay attention to which environments the callbacks are evaluated in.

(map applicative . lists)

lists must be a nonempty list of lists; if there are two or more, they must all have the same length. If lists is empty, or if all of its elements are not lists of the same length, an error is signaled.

The map applicative applies applicative element-wise to the elements of the lists in lists (i.e., applies it to a list of the first elements of the lists, to a list of the second elements of the lists, etc.), using the dynamic environment from which map was called, and returns a list of the results, in order. The applications may be performed in any order, as long as their results occur in the resultant list in the order of their arguments in the original lists.

(filter applicative list)

Applicative filter passes each of the elements of list as an argument to applicative, one at a time in no particular order, using a fresh empty environment for each call. The result of each call to applicative must be boolean, otherwise an error is signaled. filter constructs and returns a list of all elements of list on which applicative returned true, in the same order as in list.

To sum up the quoted spec: map transforms each element using a callback; grep, sorry, filter keeps or discards elements based on the result of the callback. They are fairly similar; the former is a transformer, the latter a selector.

Except they're not similar — notice how map provides the dynamic environment to its callback, but filter doesn't. What's up with that?

No, in fact, forget what's up with that — what does it even mean? I'm serious, I'm not even 100% clear on what it means that map's callback gets to see the dynamic environment. Somewhat clear, but not 100% clear.

We'll get back to what it means. First, though, the reason (also according to the spec):

map calls its applicative argument using the dynamic environment of the call to map, because that behavior is followed by the code equivalence that map seeks to preserve; but filter has no comparable equivalence to preserve. So instead, filter follows the simple and hygienic precedent of the default behavior of apply, by using a fresh empty environment for each argument call.

The code equivalence for map invoked in that rationale looks like this:

(map c (list a1,1 ... a1,m)
       ⋮
       (list an,1 ... an,m))

is equivalent to

(list (c a1,1 ... an,1)
      ⋮
      (c a1,m ... an,m))
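For instance (a made-up concrete case):

(map + (list 1 2 3)
       (list 10 20 30))

behaves as if the code had read

(list (+ 1 10) (+ 2 20) (+ 3 30))    ; => (11 22 33)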

The code equivalence map abides by forces map to act as if a number of c applications had been run, in place, in the code. But no similar code equivalence exists for filter.

As for what this difference means, I'm not quite sure. On the one hand, klisp has a test for map giving access to the dynamic environment:

($let ((p (cons () ())))
  ($check eq?
          ($sequence (map (wrap ($vau #ignore env
                                  (set-car! p env)))
                          (list 1))
                     (car p))
          (get-current-environment)))

I get that test, with total clarity. No ambiguity there.

On the other hand — what does it mean exactly that filter does not enjoy the benefits of this dynamic environment?

My first (sinking) suspicion was that filter is horribly crippled; that a lambda passed to it cannot make use of any kind of mutable state (say, something as simple as keeping track of odd-indexed and even-indexed elements). But that is demonstrably false; a simple counterexample is the derivation of the assoc applicative, where the filter callback does indeed close over an outer variable...

...So what does it mean? From my current understanding, it means that a callback which consults its dynamic environment will, under map, see the caller's environment (including any last-minute, runtime mutations to it), whereas under filter it only ever sees a fresh empty environment. In that sense, map is somewhat more sensitive and filter is somewhat more shielded. I wish I had a real-world example to demonstrate this, but I don't.
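The closest thing I can offer is a synthetic probe, mirroring the klisp test above but with filter swapped in (untested; I'm assuming eq? on environments and set-car! behave as in that test):

($define! p (cons () ()))

(filter (wrap ($vau #ignore env
                (set-car! p env)         ; stash whatever environment we were called in
                #t))                     ; filter insists on a boolean result
        (list 1))

(eq? (car p) (get-current-environment))  ; I'd expect #f: a fresh empty environment,
                                         ; not the environment filter was called from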

This is all very subtle. I think one reason for that is that the terms static and dynamic have been co-opted to mean something almost completely different from what I'm used to. I'm used to them meaning "compile-time" and "runtime", respectively. But Kernel doesn't have a compile time — instead it has some kind of "rolling now", wherein static means what we have defined before and dynamic means either what we are running right now or, occasionally, what we are evaluating inside of an operative right now.
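A tiny (untested) illustration of that vocabulary, using a trivially derivable $quote (which, as noted earlier, the Report deliberately leaves out, though nothing prevents us from defining it):

($define! $quote ($vau (x) #ignore x))

($define! x "defined before")                ; probe's "static" world

($define! probe
  (wrap ($vau #ignore env
          (list x                            ; resolved in probe's static environment
                (eval ($quote x) env)))))    ; resolved in the caller's dynamic environment

($let ((x "running right now"))
  (probe))                                   ; => ("defined before" "running right now")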

raiph commented 3 years ago

@masak

Lisp/Scheme

Did you know that Scheme was started in an attempt to understand Carl Hewitt's Actor model? And that Steele and Sussman supposedly concluded it was equivalent to lambda calculus? And that Carl says he has proved they were wrong?

thinking about dynamic behaviors is harder than thinking about static structures ... Or maybe not harder, but more prone to confusion and mixing things up.

Which of your perspectives says so? cf. The Divided Brain.

Do you count open systems process calculi as static structures?

Do you consider (mathematically grounded models of) message passing to be prone to confusion and mixing things up?

I still need to regularly remind myself that [Kernel] simply doesn't have a compile phase. How much of this resistance to the concept is innate, and how much is cultural? Hard to say.

What if somewhat literally half of it was innate, innate in us all, with different cultures either reinforcing the dominance of that half, or encouraging a more harmonious world, grounded in figuring that one half is the ground to the other half's figure?

This is quite clearly related to Dijkstra's argument ...

(I've always thought there's a message in our use of "argument" to describe a perspective that is by definition one-sided!)

Edsger's original CSP formulation failed as a model providing a general description of concurrency, and it took him about a decade to get with the program and incorporate unbounded indeterminacy. Why? Perhaps because he saw things too statically?

If your argument is that our left hemispheres are overly enamoured of static structures, and that there's plenty of evidence that Edsger routinely encouraged their dominance, then I say you're right. :)

our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed.

Quick thought experiment, Edsger. Rewind 10,000 years. Still think what you say is true?

What if Edsger's way of looking at things was a consequence of Edsger's way of looking at things? What if his glacially slow acceptance of what is instantly obviously intuitively true to some folk reflects a lack of vision?

Again, I still want to leave it open that the "conceptual gap" exhibited by fexprs might be cultural rather than innate.

Well that's like saying you still want to leave it open that Einstein was right when he said (translated to match context):

As far as a mathematical system is deterministic, it does not describe the practical concurrent open systems computational reality we have in the 2020s and beyond, and as far as a non-deterministic mathematical system does describe the practical concurrent open systems computational reality we have in the 2020s and beyond, that mathematical system models unbounded indeterminacy or, equivalently, multiple recursively messaging autonomous unknown state stateful unboundedly non-deterministic automata.

(Translated by translate.garble.com, but I think it's correct.)

you could do worse than perusing Collapsing Towers of Interpreters

I think I first checked that out in 2017. And again a half dozen times since. My problem is that I focus on understanding, which, given the truism "The more I know, the less I understand", means I know ever less...

discussion we've been having in #555 about staged compilation

Thanks for the reference. Have read. Am mulling. :)

A Dialogue on Metasystem Transitions and The Language REFAL - The Theory of Compilation and Metasystem Analysis by Turchin.

Oh my. Cybernetics, and things like Turchin's MSTs, changed my life in 1995.

(In 1985 I joined a software outfit that was developing what was then called electronic publishing systems. Two years later, at age 27, I became Director of Research and dug deep into Ted Nelson's Xanadu, Vannevar Bush's "As We May Think" (and Smalltalk, and Lisp). Then I got immersed in the whole business side of things and didn't really come up for techie air until December 1994. By the end of that month I had discovered the web and, a week later, this web site, which was already extraordinary back in 1994.) (masak: Edited, fixing link syntax.)