Implementing pipeline operator as a syntax transformer

I recently had a brainstorm for a TC39 proposal, and if it's not too much trouble I'd like to know if the Pipeline champions think, as I do, that it could be a vehicle for eventually getting the pipeline operator accepted into ECMAScript. I've written a partial draft of the proposal on the TC39 Discourse:

https://es.discourse.group/t/proposal-parser-augmentation-mechanism/2008

As a quick summary, though, the Parser Augmentation mechanism is a syntax that allows standards-conforming ECMAScript code to describe syntax-level changes to source text prior to interpretation by the ECMAScript parser, along with a way to declare usage of such transformations in a source file or when importing one. It is, effectively, a preprocessor for JS.

Given that this represents an unprecedented level of control over the module import process, I think it's highly likely that browser vendors will be unwilling to ship an enabled Parser Augmentation implementation at first, or even ever, but browser acceptance actually isn't a requirement for this proposal. Since the Parser Augmentation mechanism only affects parser input, it can serve to inform bundlers or transpilers what transformations are needed on any given source file, along with precisely defining how those transformations must function. In the context of the build process, runtime security and performance are non-issues.

A functional Parser Augmentation implementation, even when limited to build tooling, would unlock the ability to start using novel syntaxes like the pipeline operator during development. By specifying the precise transformation or transformations that should be applied to a given file in or attached to the file being transformed, a non-standards-conforming source file can be definitively converted to a compliant ECMAScript source text. In this mode of operation, the Parser Augmentation syntax can be treated by the developer much like Python's from __future__ statement, in that it enables new core language functionality. Unlike from __future__, Parser Augmentation can't introduce new semantics for existing functionality, but also unlike from __future__, it doesn't require that the modifications be hard-coded into the engine.

If Parser Augmentation were to gain support in TC39, it would be the perfect way to test and gather usage data for pipeline syntax. Babel has support for the pipeline operator, but it forces a dependency on Babel (obviously) and requires extra configuration outside the source file, which may or may not be available to the developer. In contrast, Parser Augmentation syntax only requires a single declaration at the top of a source file along with adding a compile-time dependency on whatever NPM module contains the transformation definition for the pipeline operator, which is a much lower barrier to entry.

If the pipeline syntax then gains traction in the developer community, that would be a strong argument in favor of reopening the discussion on this proposal, either to add the syntax to base ECMAScript if demand is high enough and the syntax cost is deemed acceptable, or to add the "pipeline" transformer to a list of standard transformers that compliant engines are required to support. In the latter case, the Parser Augmentation syntax would still be required in JS files that use the pipeline syntax, but in that case it would simply be a way to inform the browser what syntax variant should be used in the native parser, much like "use strict" does. And, like "use strict", it could conceivably be placed in a number of locations, as it would only serve as a flag to the parser.

So basically the question becomes: what if we could implement the Pipeline operator without having to add any syntax burden to ECMAScript?

I don't see how that would work at all - pipeline needs to work in any JS context, including Scripts. Additionally, "no more modes" is a pretty common mantra on the committee, meaning, it wouldn't be viable to have a pragma that changes how a program evaluates.

I don't see how that would work at all - pipeline needs to work in any JS context, including Scripts.

The PA mechanism will define multiple ways to specify transformers - they can appear in the "use strict" position (if the committee allows it), they can appear as import attributes, they can appear out-of-band. type="json" is currently in stage 3 and it, too, could be seen as a type of input transformation (though its spec is not currently written that way; it instead introduces a new semantic concept of "synthetic modules" to the ECMAScript lexicon). That suggests an obvious mechanism for out-of-band PA specifications - I could see borrowing MIME's usage of + to represent "this format modified by this syntax", like <script type="script+pipeline">. The equivalent MIME type could conceivably be text/javascript+pipeline.
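
As a rough sketch of how a host or bundler might read such a type string: the function name splitScriptType is made up for illustration, and treating everything after a + as a transformer name is only an assumption based on the examples above, not anything specified.

```javascript
// Hypothetical helper: split an out-of-band type string into its base
// format and the list of transformer names appended with "+".
function splitScriptType(type) {
    const [base, ...transformers] = type.trim().toLowerCase().split("+");
    return { base, transformers };
}

const parsed = splitScriptType("text/javascript+pipeline");
console.log(parsed.base);         // text/javascript
console.log(parsed.transformers); // [ 'pipeline' ]
```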

Additionally, "no more modes" is a pretty common mantra on the committee

That'd be unfortunate (imo) but it wouldn't kill the PA mechanism. There would even be one upside to removing the in-source declaration form of PA: it would turn the proposal into a purely semantic proposal, with no syntax burden at all, but which would expand the available syntax real-estate by making (potential) new syntaxes opt-in.

There's an important distinction between PA and "use strict", though: strict mode has semantic meaning. PA, by definition, resolves to well-defined ECMAScript with exactly the same semantics as it would have without PA (which is why it can be used in and consumed by bundlers with 100% functionality, presuming browsers decide not to adopt PA into live engines).

The following is from my work on the in-progress proposal and shows one example of a Transformation Description (TD) function that could be used to implement the pipeline operator:

function pipeline(parser) {
    // Defined operators and keywords can have handlers associated with them, as a third argument.
    // The handlers are called upon emit.
    parser.defineOperator("|>", {associativity: "left", precedence: "=", context: "block"}, pipeOperator);
    parser.defineKeyword("%", {treatAs: "identifier", context: {operator: "|>", operand: 2}}, topicReference);
    // having no parse/emit calls in a TD is logically equivalent to a final line reading:
    // for await (const node of parser.parseNode()) parser.emit(node);
}
function pipeOperator(parser, expr, context) {
    if (expr.rhs.isKeyword("%")) {
        parser.emit(expr.lhs);
    } else {
        // expr.state is a container to store TD-local metadata. It is not visible to other TDs or to
        // the emitted AST.
        expr.state.topicVariable = context.newSyntheticLocal();
        // this emit will trigger the keyword handler for topic references in the rhs
        parser.emit(Parser.syntheticExpression`${expr.state.topicVariable}=${expr.lhs},${expr.rhs}`);
    }
}
function topicReference(parser, expr, context) {
    parser.emit(context.state.topicVariable);
}
Parser.registerImplementation("pipeline", pipeline);

syntax "pipeline"; // a syntax declaration takes effect after the newline following the statement, so just after here →
console.log("foo" |> one(%) |> two(%) |> three(%));
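
For reference, here is a hand-written approximation of the plain ECMAScript that the console.log line above could resolve to under that TD. The _t0, _t1 and _t2 names stand in for whatever context.newSyntheticLocal() would generate, and one, two and three are placeholder functions added so the snippet runs; treat it as a sketch of the idea rather than actual transformer output.

```javascript
// Placeholder functions so the example runs; any unary functions would do.
const one = (s) => s + "1";
const two = (s) => s + "2";
const three = (s) => s + "3";

// Hand-desugared form of:  "foo" |> one(%) |> two(%) |> three(%)
// Each |> becomes "temp = lhs, rhs", with % replaced by that temp.
// The operator is left-associative, so the expansions nest on the left.
let _t0, _t1, _t2;
const piped = (_t2 = (_t1 = (_t0 = "foo", one(_t0)), two(_t1)), three(_t2));

// The same computation written as ordinary nested calls.
const nested = three(two(one("foo")));
console.log(piped === nested); // true
```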

And as a reminder, browsers are not required to support this (but of course, they would be welcome to). In particular, I would expect that on a browser, the registerImplementation call (which registers an unconditional transformation that would override any built-in "pipeline" support) would throw an Error, and even if TC39 approves the syntax declaration, parsers would not be required to support it except at the top of file. The above snippet, defining and then immediately using a new transformer, would likely only be useful in a build environment. Even then, the expected practice is for TDs to be defined in their own modules, and for the utilizing module to have something like the following at the top of the file:

syntax import "pipeline" from "./pipeline.js";

(Or it could just use syntax "pipeline", if it knows the transformer has been defined elsewhere.)

[ETA:

It occurs to me that, while browsers wouldn't (and wouldn't be expected to) support syntax changes midstream, a much easier sell would be to support, only at start-of-file, something like:

syntax "pipeline" with {operator: "|>", topicReference: "%"};

Especially since they would be permitted to generate a SyntaxError for any reason, like "doesn't actually support any operators besides |>", or "the first six bytes of the file must be s y n t a x". It'd be the kind of thing you look up on caniuse, which browsers have support for which pipe syntaxes. And if the browsers support that, then it's a short hop over to

syntax "pipeline" with {operator: "|>", pipeStyle: "F#"};

which wouldn't even be that hard, since F#-style pipes will parse without error in a Hack-style parser. And since the two pipe styles are parse-compatible, it wouldn't actually be such a long shot at all to see a browser start experimenting with syntax push and syntax pop support:

syntax "pipeline" with {operator: "|>", topicReference: "%"};
console.log("foo" |> one(%) |> two(%) |> three(%));
syntax push "pipeline" with {pipeStyle: "F#"};
console.log("foo" |> one |> two |> three);
syntax pop;

Now that I think about it, browsers are already compliant with this proposal. The proper behavior if a host doesn't support a transformation named in a syntax directive is to throw a SyntaxError, and that's exactly what browsers do today if you type syntax "pipeline";!
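
A quick way to check that claim in any current engine is sketched below. rejectsAtParseTime is a helper name invented for this example, and new Function parses its argument as a function body rather than as a Script or Module, so it only approximates what happens at the top of a file.

```javascript
// Returns true when the host rejects the source text at parse time.
// new Function only parses and compiles the body; it is never called.
function rejectsAtParseTime(sourceText) {
    try {
        new Function(sourceText);
        return false;
    } catch (err) {
        return err instanceof SyntaxError;
    }
}

console.log(rejectsAtParseTime('syntax "pipeline";')); // true
console.log(rejectsAtParseTime('"use strict";'));      // false
```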

A little bit far-fetched, perhaps, but all it takes is for the first transpiler to support the syntax directive to get started 😄]

I don't want to yoke this proposal to an even larger proposal, but I'll note that pipeline is indeed accomplishable via a simple syntax transform, just using a unique-named temp variable and abusing the comma operator. So if this proposal does somehow stall long enough that a syntax transform proposal gets ahead of it, using pipeline as an argument for the syntax transformer would be reasonable.

@tabatkins That's exactly my thought! None of the syntax-transformation-equivalent proposals lying around - this one, JSON Modules, BinAST, Type Annotations, a handful of others - require a generalized syntax transformer. Personally, I'm cheering for the pipeline proposal to zoom through the process as loud as I can, from out here in the cheap seats!

Of course, if this proposal does make it into 262 on its own merits, with its syntax intact (let's say, for the purposes of this argument, that it's using |> and %) that's not the end of the story as far as Parser Augmentation is concerned. PA would provide the syntax to allow an implementation to switch between F#-style and Hack-style pipes on a source-line-by-source-line basis, without having to change its parser implementation at all. A TD like the following:

function pipeline(parser, withOptions) {
    if (withOptions?.pipeStyle?.toLowerCase() === "hack") return;
    if (withOptions?.pipeStyle?.toLowerCase() !== "f#") throw new SyntaxError("pipeline style unsupported");
    parser.defineOperator("|>", null, fSharpPipeOperator); // no options required if |> is a known operator
}
function fSharpPipeOperator(parser, expr, context) {
    parser.emit(Parser.syntheticExpression`${expr.lhs} |> (${expr.rhs})(%)`);
    // the synthetic |> generated here isn't subject to the redefinition, so there's no recursion hazard
}
Parser.registerImplementation("pipeline", pipeline);

would define the appropriate behavior for dealing with F# syntax, but as I recently mentioned over at the BinAST repo, implementations wouldn't be required to use that or any other JS-level implementation. The PA mechanism has a logical functionality similar to the C-family #define directive, but the implementation is expected to be much more like the #pragma directive: just a signal to the host to change some internal variable. But unlike #pragma, hosts can't do just "whatever they want" with it - it has to be logically equivalent to a TD.
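
To make the rewrite that fSharpPipeOperator describes a bit more concrete, here is a hand expansion of a two-stage F#-style pipe. The _t0 and _t1 temporaries and the one and two functions are placeholders for illustration, and the final comma-expression form is only one possible way a Hack-style pipe could resolve.

```javascript
// Placeholder functions so the example runs.
const one = (s) => s + "1";
const two = (s) => s + "2";

// F#-style source:        "foo" |> one |> two
// After the TD's rewrite: "foo" |> (one)(%) |> (two)(%)
// Hand-desugared Hack form, using hypothetical temporaries _t0 and _t1:
let _t0, _t1;
const result = (_t1 = (_t0 = "foo", (one)(_t0)), (two)(_t1));

console.log(result === two(one("foo"))); // true
```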

[ETA:

To make this more explicit, one valid implementation of a PA-compliant parser would be a browser that:

1. Parses a file with a static parser that recognizes the syntax of a syntax directive the same way it recognizes the syntax of an import directive
2. (if the parse succeeds) Checks all syntax nodes in the parsed AST against the list of syntax representations the static parser supports
3. Throws a SyntaxError if any of the syntax directives specify an unsupported representation (assuming the parse didn't just outright fail in step 1)
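
The steps above can be sketched in a few lines. This is a toy under stated assumptions: the SUPPORTED set and both function names are hypothetical, and a regex stands in for the static parser's AST, so it only sees the plain syntax "name" form and ignores variants like syntax import or syntax push.

```javascript
// Hypothetical list of representations the host's static parser understands.
const SUPPORTED = new Set(["pipeline"]);

// Toy stand-in for the parse step: a real host would read directives from
// its AST. Here a regex pulls out lines of the form:  syntax "name"
function findSyntaxDirectives(sourceText) {
    const pattern = /^\s*syntax\s+"([^"]+)"/gm;
    return Array.from(sourceText.matchAll(pattern), (m) => m[1]);
}

// Reject any directive that names an unsupported representation.
function checkSyntaxDirectives(sourceText, supported = SUPPORTED) {
    for (const name of findSyntaxDirectives(sourceText)) {
        if (!supported.has(name)) {
            throw new SyntaxError(`unsupported syntax representation: ${name}`);
        }
    }
    return true;
}
```

Passing an empty set as the second argument models today's browsers, where every named transformation is unsupported.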

That's why I say that browsers already support the syntax side of Parser Augmentation. All syntax transformations are presently unsupported because they don't exist yet, and browsers are correctly throwing a SyntaxError if you try to specify one.]

IMO, parser hints (<script type="script+pipeline">) that change how a certain browser/runtime interprets things will be interpreted as a horror story by the larger community. Effectively, this was rejected several times.

Eventually many other proposals would want to do this and that would lead to chaos: just imagine having to deal with <script type="script+pipeline+temporal+record">. You tested your script with script+pipeline, but did you test it with script+pipeline+temporal+record? Or what if runtime X does support script+pipeline and script+temporal+record, but not script+pipeline+temporal+record?

Can you imagine how many CI tests would need to be run? The number of combinations would rise exponentially.
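
To put a rough number on that, assuming each flag can be switched on or off independently, n flags give 2^n distinct type strings. The typeStrings helper below is made up for illustration and just enumerates them for the three flags mentioned above.

```javascript
// Enumerate every type string that can be built from a set of optional flags.
function typeStrings(base, flags) {
    const results = [];
    for (let mask = 0; mask < 2 ** flags.length; mask++) {
        // Bit i of the mask decides whether flags[i] is included.
        const chosen = flags.filter((_, i) => mask & (1 << i));
        results.push([base, ...chosen].join("+"));
    }
    return results;
}

const combos = typeStrings("script", ["pipeline", "temporal", "record"]);
console.log(combos.length); // 8
```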

I saw this in another comment in the bike shedding issue:

Originally posted by @yordis in https://github.com/tc39/proposal-pipeline-operator/issues/91#issuecomment-2105546490

What point exactly are you trying to make, @bogdanbiv? Your strawman isn't particularly clear. What in the world do temporal and record have to do with pipeline? From your tone it sounds like you're jumping on the "everyone ridicule Parser Augmentation" bandwagon, but for the life of me I can't figure out what argument you're trying to make.

tc39 / proposal-pipeline-operator

Implementing pipeline operator as a syntax transformer #303