tc39 / proposal-do-expressions

Proposal for `do` expressions
MIT License
1.11k stars 14 forks source link

Use-case: lazy expressions? #58

Closed getify closed 3 years ago

getify commented 3 years ago

I can think of several reasons why this may be impossible/impractical for JS, but I have a need for lazy expressions and "do expressions" popped into my head. Just wanted to quickly ask: has it been considered if some form of do-expression could be lazy?

Even that question is a bit ambiguous, I know, because we have to define more carefully what we mean by lazy, and in particular, how does the JS engine move from treating an expression as deferred/lazy to needing to fully compute it?

A few quick observations before I suggest an answer to that question:

  1. Function expressions (aka thunks) wrapped around expressions are a form of "lazy expression", but the sort that the consumer of the expression has to know about, meaning the consumer has to know that a function means "call me to compute the result of the lazy expression". These work, but they are more intrusive and thus less flexible.
  2. Blocks are a basically "lazy statements", if you consider structures like if-else branches. The JS engine doesn't need to interpret/evaluate any branch that it's going to skip over. I don't know if JS engines commonly do that sort of optimization -- basically JIT'ing branches -- but I imagine that some might.

So, in those two forms, we see a substantive split: In (1), the "lazy expression" is deferred until the code itself decides to compute it. In (2), the "lazy statement(s)" are (potentially) deferred in evaluation until the JS engine decides it's definitely going to execute that "part" of the program.

So my earlier questions is basically, "would do-expressions be (1) or (2) style of laziness"?

At the moment, I'm pondering more the (2) style. IOW, a "lazy do expression" would be some way that a program tells the JS engine: "hey JS engine, this code isn't relying on any eager side-effects, so feel free to defer evaluating this expression until you really need to... it won't break my program either way, for you to decide to run it early or late".

Such a form wouldn't be controllable by the program in any way, it would be marking an expression as eligible for the same sort of "JITing" as an else-branch being skipped entirely if the if-branch is evaluated.

I imagine "lazy expressions" have been discussed a fair bit in JS's history, so there's probably already a lot "here's why that wouldn't work" prior art. If so, can anyone point that out? If not, I figured this thread could at least document the pros/cons of such an idea for posterity sake.

getify commented 3 years ago

I have a use-case in a program I'm currently writing that could, at least in theory, significantly benefit from such a feature, in the sense of helping the JS engine know it could defer the vast majority of expressions in my program. I have lots of IO-monad expressions in a large conditional tree of evaluation... hundreds of expressions, when only a handful are actually needed in any given pass of execution.

Since all these expressions are IO monads, they're "guaranteed" safe/lazy in their side-effects. But the construction of the IO monad instances themselves is, I'm sure, chewing up quite a bit of CPU and memory (and GC churn)

I could wrap all these sets of IO-constructing expression(s) in thunks (arrow functions) -- aka, (1) style -- because the control structure utility I use is aware of functions and can easily handle them. But function calls have a bit more overhead than what I imagine computing a lazy-do-expression might, so I was dreaming about the idea that lazy-do-expressions would be a much more efficient way to let JS optimize this pattern.

ljharb commented 3 years ago

I would expect such laziness to in fact require the same overhead functions do (to keep closed-over references to variables in scope, particularly)

getify commented 3 years ago

to keep closed-over references to variables in scope, particularly

I don't see that being an applicable problem, because the lazy expression is NOT a function that you pass to another location and expect it to be computed there... that's what you would use a function wrapper for.

I think in call cases, the JS engine couldn't defer the computation of such a contemplated lazy expression "into" another function's execution context. IOW, by the point the JS engine needs to actually perform the function call, it's got to fully know all the arguments it's passing in, so it's got to reduce any lazy expressions to their evaluated results.

What's lazy though is that if the function call exists in some part of your program, and one or more of its arguments are lazy, and the rest of the surrounding expressions don't actually require the execution of that function... then the function's arguments never need to be computed.

bakkot commented 3 years ago

Are you imagining something along the lines of this:

let x = lazy do {
  foo()
};

if (condition()) {
  bar(x);
}

where foo() gets evaluated if and only if condition() is true? (Or, perhaps, in the true case it would depend on whether bar itself actually accesses its argument?)

If so, one of the major downsides here is that you've made local variable access be side-effecting, something which many people (myself included) think is a major impediment to reasoning about programs in JS. (That's part of why with is banned in strict mode.)

For example, consider the following:

let y = 0;

let x = lazy do {
  y
};

console.log(x);

y = 1;

if (condition()) {
  bar(x); // presumably evaluates to `bar(0)`? but if you remove the `console.log`, it's `bar(1)`?
}

I think this would be an interesting project as a compile-to-js language (read: babel plugin), but as to actually adding it to the language, I think it would be much too radical of a change to how JS works.

getify commented 3 years ago

Are you imagining something along the lines of this:

I wasn't imagining its use in this way, but your suggestion is intriguing to contemplate.

As I said in the OP, an if control structure (as well as things like switch, etc) already provides a sort of "lazy statement" context to nest things, so typically you would do this if you wanted to "defer" the evaluation of x:

if (condition()) {
  let x = foo();
  bar(x);
}

Your example suggests it could be nice to be able to non-localize that conditional evaluation. Specifically, by being able to define such a lazy-expression outside the if..else structure, you could reuse it several places in the tree, as opposed to just copy-pasting the contents of the expression.

Of course, I don't think that sort of capability needs to imply a full-closure style of lazy-non-localization.

one of the major downsides here is that you've made local variable access be side-effecting

Yeah, my intended usage is with lazy IO's that are specifically avoiding those sorts of "bad" patterns. I agree your second example is a "bad" usage. It could re-compute on each access (instead of memoizing) to avoid the hazard you described. JS engines could in theory track changes to local variables and only re-compute if they've changed. But, yeah, it's a rabbit hole.

I should mention that this type of "weird behavior" is already entirely possible (and sadly common) in standard JS, because of variable re-assignment and function calls. So it's not really introducing a new hazard per se, but merely an additional way of doing things that, if not done responsibly, can create unexpected outcomes.

The way I was thinking, the wrapping of code in a lazy-do-expression would be disclaiming any ill effects on your programs behavior depending on when (and how often) the expression is computed. If you put pure expressions in there, you're safe. If not, thar be dragons. I sort of lump this behavior in a similar bucket as the recently landed GC-sensitive weakref finalizers.

As with almost any feature you could imagine putting into JS, there's pros (like those contemplated above) and cons (like this).

bakkot commented 3 years ago

I wasn't imagining its use in this way

Hm, I'm probably not understanding what you're suggesting, then. (Or is it that you were imagining a faculty like the one I illustrated, just not this particular application of it?)

I should mention that this type of "weird behavior" is already entirely possible (and sadly common) in standard JS, because of variable re-assignment and function calls. So it's not really introducing a new hazard per se, but merely an additional way of doing things that, if not done responsibly, can create unexpected outcomes.

It's true that it's not introducing a new hazard, but it would be making that hazard possible in many more cases, and thus much harder to watch out for: now you'd need to know that side effects are possible when you look at any local variable access, not just function calls (which are much more explicit).

(Of course it is already possible to have side-effecting variable access through with and through accessors on the global object, but those things are very strongly discouraged specifically so that you don't have to think about them when reading normal code.)

The way I was thinking, the wrapping of code in a lazy-do-expression would be disclaiming any ill effects on your programs behavior depending on when (and how often) the expression is computed. If you put pure expressions in there, you're safe. If not, thar be dragons.

It sounds like the thing you actually want might be purity annotations, an idea which is close to my heart but which engines have generally pushed back on (for good reason). It's just not practical for a human to verify that a snippet of JavaScript is pure, and it's too expensive (and extremely difficult) for an engine to check it at runtime. And debugging the issues which arise when someone inevitably gets it wrong is... painful.

Though, again, might be a good idea for a compile-to-JS language.

I sort of lump this behavior in a similar bucket as the recently landed GC-sensitive weakref finalizers.

WeakRefs and finalizers are definitely scary. But they're something we can (and should) discourage except for very specific circumstances where they're absolutely necessary, like other power tools. Here the thing you're proposing sounds like a much more general mechanism for structuring your code, which would be at least potentially applicable across many different kinds of programs, which means (I would argue) its potential for unleashing dragons is much more severe of a problem.

getify commented 3 years ago

It sounds like the thing you actually want might be purity annotations

Similar yes, but I don't want (and wouldn't expect) the JS engine to enforce purity. Rather, I'd be saying to the engine, by my usage of it, that the expression is pure (or, that if it's not, I'll deal with the negative consequences), so if it wants to profile and defer the computation for some reduction in CPU/memory/GC, that's great.

It's just not practical for a human to verify that a snippet of JavaScript is pure,

I disagree with this as a broad statement. If you choose to follow FP style practices in a JS program, you are in effect (pun intended) saying exactly that: that you're going to be extremely careful about where your side effects are and are not. That goes even stronger when you get to using the IO monad heavily, as I am of late.

Here the thing you're proposing sounds like a much more general mechanism for structuring your code

Your proposed usage could be construed more generally. Maybe that's a good thing, maybe not. But I somewhat doubt that most would see the point in doing it.

My originally intended usage was a bit more narrow. That is, it's only "general" insofar as you're choosing to do FP style programming in JS, and thus you're already likely very familiar with, and being very careful of, side-effect expressions and non-pure functions.

If you're writing that style of FP, lazy-do-expressions could end up being quite useful (and more performant, maybe) while not really introducing any more hazard above the care you're already taking.

So I suppose I'd say it's contemplating the addition of a feature that is targeted at/intended for a specific paradigm of programming rather than all paradigms. Of course, TC39 has famously done quite a bit of paradigm-specific expansion of the language, such as the class keyword and all its descendants.

getify commented 3 years ago

For what it's worth, I've done more thinking and exploring with this, and while I think the idea has some merit, the main usage I was contemplating can't actually make use of it. For posterity, I'll leave a sketch of that (and explanation why) here, but for now close this issue since it doesn't seem to have much interest and since my needs for it have fallen through.

Here's basically what I was imagining:

doXorY(
   someVal,
   lazy do { computeExpensiveArgument(1) },
   lazy do { computeOtherExpensiveArgument(2) }
);

The idea here was that only one of these expensive arguments is needed, and the doXorY(..) function decides which one it needs based on the value of someVal.

But I also asserted above that lazy do { .. } would not be like a function (with closure); it could be deferred but would in all cases still need to be evaluated in its current execution context -- at the latest possible moment. The thing I was mistaken about is how that could apply to my usage as illustrated above. JS has no idea what doXorY(..) is going to do until after it calls it... so JS would still have to compute both arguments to pass them in. IOW, the lazy-do wouldn't net us any benefit at all here.

IOW, for a lazy-do to effectively be deferred, JS has to be able to decide that lifecycle statically in the current execution context before another execution context (function call) needs to be invoked. From what I can tell, I think this basically boils down to the existing JS flow-control mechanisms (if blocks, etc).

In my case, it seems the only appropriate option is to indeed wrap the expensive arguments in lambdas, and have doXorY(..) only invoke the one it needs, thereby skipping the expense of computing the other unneeded argument:

doXorY(
   someVal,
   $=>computeExpensiveArgument(1),
   $=>computeOtherExpensiveArgument(2)
);

This still pays the closure (for both!) and function invocation cost (for one!), but it avoids the additional allocation costs that these expensive computations imply. I guess that's about as good as it's going to get in JS.

Thanks for humoring my experiment/exploration here.