borkdude opened this issue 2 years ago
@thheller I haven't looked into the ESM build implementation that closely yet, but maybe this rings a bell?
Regular JS tree-shaking is very limited. It is not comparable to the Closure Compiler `:advanced` optimizations and recognizes very few patterns. It works when you write code specifically for it, but quickly gives up otherwise. The `:target :esm` is just a "hack" to make the `:advanced` output look like ESM when it technically isn't.
So, for exports it does this, and the final JS output it generates looks something like:

```js
// defined as a prepend, not going through :advanced
// each export gets a placeholder name
let export_js_keys;

// then the actual :advanced output of the module
<advanced output>

// defined as an append, also added after :advanced
// each placeholder name is then exported as the actual configured export name
export { export_js_keys as js_keys };
```
Here, part of the `:advanced` input will include an `export_js_keys = cljs.core.js_keys;`. `export_js_keys` is added to the externs so it doesn't get renamed. So what will remain is `export_js_keys = aA` (or the `js-keys` impl directly inlined). This is done because there is no other way to have the Closure Compiler generate `export` statements itself directly. You cannot feed it `export`, since `:advanced` will remove it.
The regular JS tree-shaking doesn't recognize this and as such doesn't eliminate anything.
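To make the failure mode concrete, here is a minimal sketch (with hypothetical names) of the shape the `:esm` wrapper produces: the public export is a mutable `let` binding assigned somewhere inside the bundle body, so a JS tree-shaker has to treat the assignment, and everything it references, as live:

```js
// A direct, static export: JS tree-shakers can drop this when unused.
export function js_keys_direct(obj) {
  return Object.keys(obj);
}

// The wrapper pattern: a placeholder binding, assigned deep inside the
// :advanced output and re-exported under the public name. The top-level
// assignment is a side effect, so `aA` is always kept.
let export_js_keys;
function aA(obj) {
  return Object.keys(obj);
}
export_js_keys = aA;
export { export_js_keys as js_keys };
```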
Makes sense.
This might be a hack, but it seemed to work. Invoking the Google Closure Compiler once again:
```shell
npx google-closure-compiler --module_resolution NODE --compilation_level ADVANCED --js=node_modules/cherry-cljs/lib/cljs_core.js --js=node_modules/cherry-cljs/cljs.core.js --js=index.mjs --js_output_file=dist/out.js --language_in ECMASCRIPT_2020
```
Instead of 300kb we now end up with 160kb.
Perhaps we can do some manual rewriting of the outputted cherry/cljs.core ES6 module, as a one-time hack on the Closure output. A nice research project of its own which is fairly isolated. If anyone wants to investigate this, help is welcome!
I did some research and found a pretty comprehensive comment on this topic from Evan Wallace, who wrote esbuild (I also found a related comment). Long story short: the optimizations that Closure Compiler performs by looking at object properties and rewriting/eliminating the ones that aren't used are considered unsafe and inherently problematic. Tools such as esbuild or Terser do more conservative static analysis, which works well for basically all JS code that isn't written in Closure Compiler's format.
Another point of consideration is that all of the current minification tools (including the two above) are basically on par with, or sometimes slightly better than, Closure Compiler at minifying JS code, based on this benchmark.
With these in mind, and since cherry is targeting a more standard ES6 output, I would say that it's probably worthwhile to rewrite `cljs_core.js` and related files to work well with JS minifiers. I believe this should be possible either by making a bespoke script that analyzes the AST of that file and rewrites it (using, for example, babel), or by doing it by hand (probably the former, since it's automatically generated 🙂).
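A rough sketch of what such a babel-based rewrite script could look like; this is a hypothetical illustration, not an actual script from the repo, and it only handles the simple `var f = function () {...}` case:

```js
// Hypothetical sketch: turn `var f = function () {...}` into
// `function f() {...}` so minifiers can statically follow the binding.
// Assumes @babel/parser, @babel/traverse, @babel/generator, @babel/types.
const fs = require("fs");
const parser = require("@babel/parser");
const traverse = require("@babel/traverse").default;
const generate = require("@babel/generator").default;
const t = require("@babel/types");

const src = fs.readFileSync("node_modules/cherry-cljs/lib/cljs_core.js", "utf8");
const ast = parser.parse(src, { sourceType: "module" });

traverse(ast, {
  VariableDeclaration(path) {
    const [decl] = path.node.declarations;
    // Only rewrite the single-declarator, plain function expression case.
    // Arrow functions are left alone; a real script would also have to
    // verify the variable is never reassigned before doing this.
    if (
      path.node.declarations.length === 1 &&
      t.isIdentifier(decl.id) &&
      t.isFunctionExpression(decl.init)
    ) {
      path.replaceWith(
        t.functionDeclaration(decl.id, decl.init.params, decl.init.body)
      );
    }
  },
});

fs.writeFileSync("cljs_core.rewritten.js", generate(ast).code);
```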
@jo-sm Thanks. Yes, rewriting the Closure output is probably the best idea. That, or bootstrapping core itself with cherry (which will be more work, but maybe attainable in the future).
(Or hacking Closure)
I would guess that for the short term rewriting makes sense, but it depends on how simple/complex/feasible bootstrapping in cherry is. Do you think it makes sense to automate it, and if so, is it fine to hardcode some of the things based on the output generated by shadow (e.g. I noticed that there is `$APP`)? If that's acceptable then I think it should be somewhat straightforward to rewrite the AST, and I could look into it (I've done something similar, experimenting with rewriting the Google Closure libraries to be entirely ES6 compatible with import/export).
@jo-sm Yes, it will take a long time before we're able to fully bootstrap, if we ever get there: it also depends on whether it's worth pursuing. So for now, the rewriting strategy seems good. You can probably get some insights using `--debug` as well. Thanks for looking into this.
> Another point of consideration is that all of the current minification tools (including the two above) are basically on par with, or sometimes slightly better than, Closure Compiler at minifying JS code, based on this benchmark.
I have to strongly disagree here. The JS tools are not even remotely close.
One big caveat, of course, is that you only get the full benefit if you have code that was written with `:advanced` in mind. This nowadays includes proper ES6+ code and is no longer limited to ClosureJS. Testing it against random npm libraries is pointless; it also misses the "whole program optimization" part that `:advanced` is built around. If you disregard this, then the Closure Compiler is indeed average. But that's like saying a plane is as fast as a car while never actually letting the plane fly.

Another huge caveat is of course the flexibility you lose. Sometimes it's not worth losing that, especially when publishing libraries that then run through another build tool again. Of course the unsafe part is true; that's why externs are sometimes still an issue.
> (e.g. I noticed that there is `$APP`)?
shadow-cljs uses the `$APP` object as basically the root of everything. So `cljs.core/assoc` becomes `$APP.cljs.core.assoc`, but due to `:advanced` collapsing namespaces and renaming everything, it's something like `$APP.ab`. Each module will import this `$APP` via `import` and also re-export it. It'll also add its own properties to it. Basically, this is the "shared global namespace" that we otherwise no longer have because of module isolation.
This is actually just using the `:rename-prefix-namespace` feature from the Closure Compiler. Basically, it just prefixes all objects that would otherwise end up in the global namespace with `$APP.`. The modules importing and re-exporting it is just added by shadow-cljs. Multi-module `:browser` builds also use this.
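For illustration, a minimal sketch of the pattern with made-up property names (in a real build they are renamed keys like `$APP.ab`):

```js
// core.mjs: the shared root object that replaces the global namespace
export const $APP = {};
$APP.ab = function (m, k, v) {
  // stand-in for a collapsed-and-renamed cljs.core/assoc
  return { ...m, [k]: v };
};

// other.mjs: imports the shared root, adds its own properties to it,
// and re-exports it for downstream modules
import { $APP } from "./core.mjs";
$APP.cd = function (coll) {
  return $APP.ab(coll, "x", 1);
};
export { $APP };
```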
We can agree to disagree 🙂 At the risk of sending the thread down the rabbit hole of GCC discussion: yes, it's not running benchmarks against big projects, but it is running benchmarks against nontrivial JS code (these libraries are large, including ones like `echarts`, which is 3MB uncompressed). So it's a decent indication of how the minifier will work against normal JS code (as in, code that isn't necessarily written with GCC in mind, which the majority of large CLJS projects use via npm). I have not myself benchmarked GCC against other compilers in terms of "whole program optimization", but I am skeptical that the difference is large enough in practice to matter. I would be open to being proven wrong here, though, and I'll leave it at that.
With all this in mind, I'll look at rewriting `cljs_core.js` to `export` functions like `js_keys` directly and see how things go from there. @borkdude I just need to run `bb run build` to build the npm package files, right?
> I just need to run `bb run build` to build the npm package files, right?
If you want to discuss GCC vs other tools, you can open a discussion here.
> ..., but it's running benchmarks against nontrivial JS code
That is missing the point. The point of this thread and my comment is tree-shaking or dead code elimination, i.e. removing unused code. When minifying libraries on their own, this obviously cannot be done: the tool can't know what the consumer is going to use, so it has to keep everything. So you are really just minifying. We agree that the Closure Compiler is an average (and slow) minifier compared to the alternatives.
That is what the benchmark you linked is testing. It is not testing tree-shaking or DCE, which is sort of what `:advanced` is all about.

If you are fine with just minifying, just use `:simple`.
I think it's a good discussion to have, but let's continue here: https://github.com/borkdude/cherry/discussions/36
Hmm, I'm having trouble running `bb run build`. I started a thread on the #cherry slack channel to discuss further. I think I might just be missing something minor.
@jo-sm You probably just need to upgrade bb.
Update from https://github.com/borkdude/cherry/discussions/36:
@thheller did some experiments with ES modules, but unfortunately, `binding` (and in general `set!`) in cljs.core generates a direct re-assignment of vars that were imported, which isn't valid JS...
https://github.com/thheller/shadow-cljs/commit/e8e0cf9973bafcd472ffa1066bbd8ae8c83e5cab
This is also a challenge for cherry itself btw.
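A minimal illustration of the problem (module names are made up): ESM import bindings are read-only, so a compiled `set!` that assigns straight to an imported var throws at runtime:

```js
// flags.mjs
export let warn_on_infer = false;

// consumer.mjs
import { warn_on_infer } from "./flags.mjs";
// What a naively compiled `set!` would emit:
warn_on_infer = true; // TypeError: Assignment to constant variable.
```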
BTW, just to give a quick update: I worked on it a bit a while back but had to drop it to deal with other things going on (for now, at least). I would like to pick it back up and finish my prototype. Basically, what I did was perform a few operations:

- In `cljs_core.js` there are a lot of "indirect" assignments, basically something like `var y = z; var x = y`, which can be simplified to `var x = z`. I simplify these cases.
- For `var some = function () {...}`, I define it as `function some() {...}`.

I was also doing this in TypeScript, because working with the JS AST is otherwise a huge pain (it still kind of is, but TS makes it much easier to work with and makes me feel more confident given the complexity of the JS AST).
It was still a bit buggy when I worked on it a few weeks ago, but I was able to kind of get proper dead code elimination working.
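For reference, a before/after sketch of the indirect-assignment simplification described above (identifiers are hypothetical):

```js
// Before: the generated core aliases the implementation indirectly,
// which tree-shakers can't reliably follow:
var keys_impl = function (m) { return Object.keys(m); };
var js_keys = keys_impl;

// After: one direct, statically analyzable declaration
function js_keys(m) { return Object.keys(m); }
```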
@jo-sm Very interesting and hopeful! I'd love to see continued work on this.
When using only one function from cherry/cljs.core.js, e.g. `js-keys`, and then processing this with a treeshaker like esbuild or ncc, it seems that it's still pulling in the entire package.

Repro:

index.mjs:

shell:

Output: