Closed chrisdone closed 3 years ago
So it seems that "uncurrying" is not a process that Closure currently does. I'm not sure whether this was decided against, simply because it's not normal in JS to have functions like this, or whether it's an oversight, or whether there's a good reason that makes reducing these in the general case difficult.
I suspect this simply never came up as an important thing to look at, indeed because normal JavaScript doesn't produce code like this.
Can you share what the optimization algorithm you propose would look like? How would you identify the function expressions, and when to inline their contents?
My thinking so far is that whenever there is an uninterrupted sequence of successive function
expressions, that is a match. This is a curried function of arity 2.
function f(x){
return function(y){
return ...;
}
}
If the call site is f(1)
then nothing special needs to happen. But if the function is fully saturated, i.e. f(1)(2)
, then we want to produce f(1,2)
and go back to the definition and rewrite it to function f(x,y){ ... }
. Obviously, this can only happen if all calls to the function are saturated to arity N.
Also consider that when uncurrying a function g
e.g. of arity 5, that is called mostly fully saturated g(1,2,3,4,5)
, apart from sometimes with one final arg partially applied, we could generate a definition currying the last argument, e.g. h = function(arg5){ return g(1,2,3,4,arg5) }
, then we call h(1,2,3,4)
.
I've implemented this in a Haskell-like-to-JS compiler in the past (Fay) and found that increased code size pre-Closure, but decreased code size post-Closure pass. It's also a bit faster and less liable to stack overflow. In particular, type-class generic functions implicitly by the compiler are provided an argument for each class instance (this avoids doing vlookups). Naturally, the larger the call chain, the more argument passing we're doing. With curried functions, that's a lot of nested functions
and lots of f(d1)(d2)(arg1)(arg1)
. Here's an example of compiler output.
PureScript like Haskell generates impure code only within a module boundary, so there's a lot of mileage out of such rewrites.
Please let me know if you think this is something that could be practically added to Closure or whether you think it'd be unlikely to make it in.
We'll discuss this and get back to you.
Some notes:
Thanks for the update @lauraharker. :smile: :tada:
I added some thoughts below as concerns that PS programmers have, and why I think they'll be covered already if we follow straight-forward rules.
I know this may be obvious, especially to the Closure Compiler maintainers, but I'll note it anyway: PureScript uses function (){ .. }
as a way to delay evaluation. It may be tempting to reduce that down, too, but it'd cause different semantics, e.g.
function print(s){
return function(){
console.log(s);
}
}
then these are strung together via
when(x>5,then(print("hi"),then(print("world"),pure(123)))
and the then
will call each action as action()
to force evaluation. The when
is a function that'll only run the composed action of then(.., then(.., ..))
if the bool matches.
Removing the intermediate function(){..}
would change the semantics here and even break performance, as the actions would all run immediately.
Of course, following "only reduce it if fully saturated" would not remove such nullary functions anyway.
Verdict: no problem.
Others in the PS community have pointed out e.g. func x = let thing = bigCalculation(x) in \y -> go y thing.
would compile to something like
function func(x){
let thing = bigCalculation(x);
return function(y){
return go(y)(thing);
}
}
The motivation being that you might use func
in map, e.g. map(func(blah), [1,2,3,...])
- you only want bigCalculation
to happen once. Plus, it doesn't satisfy what I wrote above in
uninterrupted sequence of successive
As this sequence is interrupted by let thing ..
. Only successive return function(){ <repeat> }
would be a valid chain. In this example, I personally believe this is an exception to the rule. We do it sometimes, but it's a conscious decision to interrupt the function arguments to basically cache a value.
In other words, it's a concern PS programmers have, but it wouldn't come up anyway based on this thread.
Verdict: no problem.
Unfortunately, given the body of code Closure is generally used on and the technical cost of this idea, we don't have any plans to implement it in the near future.
Thanks for the update @nreid260; I appreciate the frankness and quick turn around on this reply. Pleasure interacting with the team.
It was worth asking. I may take a shot at it sometime in Haskell as a basic rename-and-then-uncurry step. But don’t let that stop anyone else trying; my time budget is presently allocated elsewhere!
Tried setting assumeClosuresOnlyCaptureReferences = true
and it worked:
On commit 2809700cd70d76b6212cd45540151b90c8d003ba, diff:
diff --git a/src/com/google/javascript/jscomp/CompilationLevel.java b/src/com/google/javascript/jscomp/CompilationLevel.java
index 02ec99aee..0211a6cc1 100644
--- a/src/com/google/javascript/jscomp/CompilationLevel.java
+++ b/src/com/google/javascript/jscomp/CompilationLevel.java
@@ -181,7 +181,7 @@ public enum CompilationLevel {
options.setSmartNameRemoval(true);
options.setInlineConstantVars(true);
options.setInlineFunctions(Reach.ALL);
- options.setAssumeClosuresOnlyCaptureReferences(false);
+ options.setAssumeClosuresOnlyCaptureReferences(true);
options.setInlineVariables(Reach.ALL);
options.setComputeFunctionSideEffects(true);
options.setAssumeStrictThis(true);
Command:
java -jar bazel-bin/compiler_unshaded_deploy.jar -O ADVANCED --js ps.js --js_output_file minps.js
Input:
(function(){
var Show_string = { show: function(s){ return s; } };
function print(d) {
return function(v){
console.log(d.show(v));
}
}
function greet(d){
return function(v){
return print(d)(v + "!");
}
}
return greet(Show_string)("Mary");
})();
Output:
console.log("Mary!");
Thanks a bunch @krk, that's a highly encouraging tip! I'll try it.
:disappointed: I'm not able to reproduce your result @krk with this Dockerfile https://gist.github.com/chrisdone/52124977ff5b6833558544f977423f0c and your patch on commit 2809700cd Run InferConsts as part of typechecking integration tests
.
I made a docker image with that Dockerfile and ran
bazelisk build //:compiler_unshaded_deploy.jar
and then
java -jar bazel-bin/compiler_unshaded_deploy.jar -O ADVANCED --js demo.js --js_output_file /dev/stdout
The output is:
(function(){function c(a){return function(b){console.log(a.show(b))}}return function(a){return function(b){return c(a)(b+"!")}}({show:function(a){return a}})("Mary")})();
Any idea what's different between your and my setup?
@chrisdone, I have created a Dockerfile that demonstrates the patch at https://gist.github.com/krk/01355fedc9df9dd639498e23b6fa0380
# Expected output:
# Vanilla:
# (function(){function c(a){return function(b){console.log(a.show(b))}}return function(a){return function(b){return c(a)(b+"!")}}({show:function(a){return a}})("Mary")})();
#
# Patched:
# console.log("Mary!");
Thanks a bunch @krk! I’ve eyeballed the Dockerfile and can’t see where my attempt diverged from yours, but no matter, I’m sure it was something subtle and trivial.
Happy new year! 🎊
EDIT: I was able to reproduce the result. :heavy_check_mark:
Hi @chrisdone, I am also a PureScript user looking for better handling on minification. Glad to see this thread that enables me to reuse the same findings with @krk.
There is an NPM package that I just published for easy usage: https://www.npmjs.com/package/google-closure-compiler-acocr
I have packaged up @sd-yip's NPM package into a Nix Flake, which can be found here: https://github.com/jeslie0/closure-compiler-acocr. Hopefully, this is useful to anyone that finds this thread.
One thing that functional languages like PureScript do is generate curried functions. This appears to be a blocker for Closure. For example,
Closure outputs,
Excellent result. It did pretty straight-forward transformations here. Including removing the overhead of generic functions, which are achieved via dictionary (argument) passing in Haskell-like languages. However, in languages like Haskell/PureScript, functions are all single-argument chains of functions, like this:
Unfortunately, Closure outputs the following:
This means that PureScript's output is both large and slow -- this example is small, but imagine large codebases with nested levels of this.
So it seems that "uncurrying" is not a process that Closure currently does. I'm not sure whether this was decided against, simply because it's not normal in JS to have functions like this, or whether it's an oversight, or whether there's a good reason that makes reducing these in the general case difficult.
Could someone fill me in?
If it's simply not something considered, how difficult would it be to add? Is it something I could implement myself in Closure? I'm discussing with the PureScript community and considering whether to add it to the compiler output, but on the other hand, this kind of thing feels like Closure's bread and butter, so I'm attempting to do it the Right Way before going off and patching something further up the chain.