Uncurrying - Githubissues

chrisdone commented 3 years ago

One thing that functional languages like PureScript do is generate curried functions. This appears to be a blocker for Closure. For example,

(function(){

  var Show_string = { show: function(s){ return s; } };

  function print(d,v){
    console.log(d.show(v));
  }

  function greet(d,v){
    return print(d,v + "!");
  }

  return greet(Show_string, "Mary");
})();

Closure outputs,

console.log("Mary!");

Excellent result. It did pretty straight-forward transformations here. Including removing the overhead of generic functions, which are achieved via dictionary (argument) passing in Haskell-like languages. However, in languages like Haskell/PureScript, functions are all single-argument chains of functions, like this:

(function(){

  var Show_string = { show: function(s){ return s; } };

  function print(d) {
    return function(v){
      console.log(d.show(v));
    }
  }

  function greet(d){
    return function(v){
      return print(d)(v + "!");
    }
  }

  return greet(Show_string)("Mary");
})();

Unfortunately, Closure outputs the following:

(function() {
  function c(a) {
    return function(b) {
      console.log(a.show(b));
    };
  }
  return function(a) {
    return function(b) {
      return c(a)(b + "!");
    };
  }({show:function(a) {
    return a;
  }})("Mary");
})();

This means that PureScript's output is both large and slow -- this example is small, but imagine large codebases with nested levels of this.

So it seems that "uncurrying" is not a process that Closure currently does. I'm not sure whether this was decided against, simply because it's not normal in JS to have functions like this, or whether it's an oversight, or whether there's a good reason that makes reducing these in the general case difficult.

Could someone fill me in?

If it's simply not something considered, how difficult would it be to add? Is it something I could implement myself in Closure? I'm discussing with the PureScript community and considering whether to add it to the compiler output, but on the other hand, this kind of thing feels like Closure's bread and butter, so I'm attempting to do it the Right Way before going off and patching something further up the chain.

mprobst commented 3 years ago

So it seems that "uncurrying" is not a process that Closure currently does. I'm not sure whether this was decided against, simply because it's not normal in JS to have functions like this, or whether it's an oversight, or whether there's a good reason that makes reducing these in the general case difficult.

I suspect this simply never came up as an important thing to look at, indeed because normal JavaScript doesn't produce code like this.

Can you share what the optimization algorithm you propose would look like? How would you identify the function expressions, and when to inline their contents?

chrisdone commented 3 years ago

My thinking so far is that whenever there is an uninterrupted sequence of successive function expressions, that is a match. This is a curried function of arity 2.

function f(x){
  return function(y){
    return ...;
    }
}

If the call site is f(1) then nothing special needs to happen. But if the function is fully saturated, i.e. f(1)(2), then we want to produce f(1,2) and go back to the definition and rewrite it to function f(x,y){ ... }. Obviously, this can only happen if all calls to the function are saturated to arity N.

Also consider that when uncurrying a function g e.g. of arity 5, that is called mostly fully saturated g(1,2,3,4,5), apart from sometimes with one final arg partially applied, we could generate a definition currying the last argument, e.g. h = function(arg5){ return g(1,2,3,4,arg5) }, then we call h(1,2,3,4).

I've implemented this in a Haskell-like-to-JS compiler in the past (Fay) and found that increased code size pre-Closure, but decreased code size post-Closure pass. It's also a bit faster and less liable to stack overflow. In particular, type-class generic functions implicitly by the compiler are provided an argument for each class instance (this avoids doing vlookups). Naturally, the larger the call chain, the more argument passing we're doing. With curried functions, that's a lot of nested functions and lots of f(d1)(d2)(arg1)(arg1). Here's an example of compiler output.

PureScript like Haskell generates impure code only within a module boundary, so there's a lot of mileage out of such rewrites.

Please let me know if you think this is something that could be practically added to Closure or whether you think it'd be unlikely to make it in.

mprobst commented 3 years ago

We'll discuss this and get back to you.

lauraharker commented 3 years ago

Some notes:

Here's a webservice link%253B%250A%2520%2520%2520%2520%257D%250A%2520%2520%257D%250A%250A%2520%2520function%2520greet(d)%257B%250A%2520%2520%2520%2520return%2520function(v)%257B%250A%2520%2520%2520%2520%2520%2520return%2520print(d)(v%2520%252B%2520%2522!%2522)%253B%250A%2520%2520%2520%2520%257D%250A%2520%2520%257D%250A%250A%2520%2520return%2520greet(Show_string)(%2522Mary%2522)%253B%250A%257D)()%253B) showing the lack of optimization even with ADVANCED optimizations. [edited - forgot to add the link]
The existing function inliner never inlines functions with both inner functions and local variables/parameters: https://github.com/google/closure-compiler/blob/d6c3d464b822c7e86f65e03fb97108d724c9a344/src/com/google/javascript/jscomp/InlineFunctions.java#L319-L330. Unfortunately that would include all curried functions. @concavelenz - do you recall the TODO in there?

chrisdone commented 3 years ago

Thanks for the update @lauraharker. :smile: :tada:

I added some thoughts below as concerns that PS programmers have, and why I think they'll be covered already if we follow straight-forward rules.

Nullary functions

I know this may be obvious, especially to the Closure Compiler maintainers, but I'll note it anyway: PureScript uses function (){ .. } as a way to delay evaluation. It may be tempting to reduce that down, too, but it'd cause different semantics, e.g.

function print(s){
  return function(){
    console.log(s);
  }
}

then these are strung together via

when(x>5,then(print("hi"),then(print("world"),pure(123)))

and the then will call each action as action() to force evaluation. The when is a function that'll only run the composed action of then(.., then(.., ..)) if the bool matches.

Removing the intermediate function(){..} would change the semantics here and even break performance, as the actions would all run immediately.

Of course, following "only reduce it if fully saturated" would not remove such nullary functions anyway.

Verdict: no problem.

Intermediate results

Others in the PS community have pointed out e.g. func x = let thing = bigCalculation(x) in \y -> go y thing. would compile to something like

function func(x){
  let thing = bigCalculation(x);
  return function(y){
    return go(y)(thing);
  }
}

The motivation being that you might use func in map, e.g. map(func(blah), [1,2,3,...]) - you only want bigCalculation to happen once. Plus, it doesn't satisfy what I wrote above in

uninterrupted sequence of successive

As this sequence is interrupted by let thing ... Only successive return function(){ <repeat> } would be a valid chain. In this example, I personally believe this is an exception to the rule. We do it sometimes, but it's a conscious decision to interrupt the function arguments to basically cache a value.

In other words, it's a concern PS programmers have, but it wouldn't come up anyway based on this thread.

Verdict: no problem.

nreid260 commented 3 years ago

Unfortunately, given the body of code Closure is generally used on and the technical cost of this idea, we don't have any plans to implement it in the near future.

chrisdone commented 3 years ago

Thanks for the update @nreid260; I appreciate the frankness and quick turn around on this reply. Pleasure interacting with the team.

It was worth asking. I may take a shot at it sometime in Haskell as a basic rename-and-then-uncurry step. But don’t let that stop anyone else trying; my time budget is presently allocated elsewhere!

krk commented 2 years ago

Tried setting assumeClosuresOnlyCaptureReferences = true and it worked:

On commit 2809700cd70d76b6212cd45540151b90c8d003ba, diff:

diff --git a/src/com/google/javascript/jscomp/CompilationLevel.java b/src/com/google/javascript/jscomp/CompilationLevel.java
index 02ec99aee..0211a6cc1 100644
--- a/src/com/google/javascript/jscomp/CompilationLevel.java
+++ b/src/com/google/javascript/jscomp/CompilationLevel.java
@@ -181,7 +181,7 @@ public enum CompilationLevel {
     options.setSmartNameRemoval(true);
     options.setInlineConstantVars(true);
     options.setInlineFunctions(Reach.ALL);
-    options.setAssumeClosuresOnlyCaptureReferences(false);
+    options.setAssumeClosuresOnlyCaptureReferences(true);
     options.setInlineVariables(Reach.ALL);
     options.setComputeFunctionSideEffects(true);
     options.setAssumeStrictThis(true);

Command:

java -jar bazel-bin/compiler_unshaded_deploy.jar -O ADVANCED --js ps.js --js_output_file minps.js

Input:

(function(){

  var Show_string = { show: function(s){ return s; } };

  function print(d) {
    return function(v){
      console.log(d.show(v));
    }
  }

  function greet(d){
    return function(v){
      return print(d)(v + "!");
    }
  }

  return greet(Show_string)("Mary");
})();

Output:

console.log("Mary!");

chrisdone commented 2 years ago

Thanks a bunch @krk, that's a highly encouraging tip! I'll try it.

chrisdone commented 2 years ago

:disappointed: I'm not able to reproduce your result @krk with this Dockerfile https://gist.github.com/chrisdone/52124977ff5b6833558544f977423f0c and your patch on commit 2809700cd Run InferConsts as part of typechecking integration tests.

I made a docker image with that Dockerfile and ran

bazelisk build //:compiler_unshaded_deploy.jar

and then

java -jar bazel-bin/compiler_unshaded_deploy.jar -O ADVANCED --js demo.js --js_output_file /dev/stdout

The output is:

(function(){function c(a){return function(b){console.log(a.show(b))}}return function(a){return function(b){return c(a)(b+"!")}}({show:function(a){return a}})("Mary")})();

Screenshot from 2021-12-29 15-34-26

Any idea what's different between your and my setup?

krk commented 2 years ago

@chrisdone, I have created a Dockerfile that demonstrates the patch at https://gist.github.com/krk/01355fedc9df9dd639498e23b6fa0380

# Expected output:
# Vanilla:
# (function(){function c(a){return function(b){console.log(a.show(b))}}return function(a){return function(b){return c(a)(b+"!")}}({show:function(a){return a}})("Mary")})();
# 
# Patched:
# console.log("Mary!");

chrisdone commented 2 years ago

Thanks a bunch @krk! I’ve eyeballed the Dockerfile and can’t see where my attempt diverged from yours, but no matter, I’m sure it was something subtle and trivial.

Happy new year! 🎊

EDIT: I was able to reproduce the result. :heavy_check_mark:

sd-yip commented 2 years ago

Hi @chrisdone, I am also a PureScript user looking for better handling on minification. Glad to see this thread that enables me to reuse the same findings with @krk.

There is an NPM package that I just published for easy usage: https://www.npmjs.com/package/google-closure-compiler-acocr

jeslie0 commented 9 months ago

I have packaged up @sd-yip's NPM package into a Nix Flake, which can be found here: https://github.com/jeslie0/closure-compiler-acocr. Hopefully, this is useful to anyone that finds this thread.

google / closure-compiler

Uncurrying #3713

Nullary functions

Intermediate results