stdlib-js / stdlib

✨ Standard library for JavaScript and Node.js. ✨
https://stdlib.io
Apache License 2.0
4.44k stars 482 forks source link

stdlib-js/esm package update #1059

Open walterra opened 1 year ago

walterra commented 1 year ago

Are there any plans to update the stdlib-js/esm repo/package? The repo (https://github.com/stdlib-js/esm) and npm package (https://www.npmjs.com/package/@stdlib/esm) are at v0.0.3 and use some older code in ndarray relying on eval() which blocks us from using it.

We're evaluating using stdlibjs and it looks really promising. Unfortunately the standard package sizes are quite large for client side code. Some experiments with the esm variant and rollup showed there's quite some potential to bring the bundle sizes down using that approach. If stdlib-js/esm was up to date and published to npm we could make direct use of it without the need of a fork and/or custom bundling.

Is that something we could help out/contribute potentially?

stdlib-bot commented 1 year ago

:wave: Hi there! :wave:

And thank you for opening your first issue! We will get back to you shortly. :runner: :dash:

kgryte commented 1 year ago

Hey, @walterra! Thanks for reaching out! To help us recommend the best approach, how much of stdlib are you wanting to use? Many packages or a few, and what is your intended use case?

walterra commented 1 year ago

Hey, thanks for getting back so quickly! For now we are looking into using ndarray and chi2test. If we can make it to bring down bundle sizes and make the library consumable via esm we might pick up more in the future.

I noticed that each individual package has an esm branch but unfortunately those are not published to npm but as the docs mention intended to be consumed via script tags and will fetch internal dependencies on demand, so it looks like they are not bundles containing all the code for a package.

We're evaluating if we're able to pick up the library to be used in parts of Kibana (https://github.com/elastic/kibana).

kgryte commented 1 year ago

Thanks for the info! Let me discuss internally with the team and circle back!

kgryte commented 1 year ago

@walterra To help provide us with a bit more context, how are you currently handling CommonJS packages in your production workflow? Are you wanting esm variants because that's what your tooling caters to, or are there specific issues you encountered with CommonJS stdlib standalone packages?

walterra commented 1 year ago

We can handle CommonJS packages and got it working with stdlibjs, the problem was that just adding @stdlib/ndarray and @stdlib/stats-chi2test as is to our package.json added optimized 500KB+ to our client side bundle sizes which unfortunately is not feasible for us.

We then had a go at your custom bundler which resulted in a 1.3MB package for those 2 libraries. Visualized here with source-map-explorer:

image

Next we tried with a webpack setup which was able to bring that down to ~173KB. I think the following is just for the stats-chi2test as an entry point but you can see in the treemap that it includes ndarray too:

image

Then we found your esm repo and using rollup with the esm version allowed us to bring down the chi2test package to ~125KB:

image

We saw that you seem to make use of rollup in the esm branch too.

If you'd be able to pull off releasing individual minified/optimized packages like this on npm that would certainly hit a sweet spot! But I'm not sure it's necessary. If the current esm repo was updated and published to npm we could use that and do deep imports from within our code with the same effect. If I'm not mistaken that's also lodash's approach: Individual npm packages for CommonJS and a single esm package. Not sure if hybrid package releases are an option at this stage, I'm a bit out of the loop regarding the current ecosystem 😅 .

Let me know if you need more information or if we can be of any help.

kgryte commented 1 year ago

Thanks for the detailed write-up!

kgryte commented 1 year ago

@walterra Did you try just using the ndarray/* packages that you needed (e.g., @stdlib/ndarray/array or @stdlib/ndarray/ctor)? Loading @stdlib/ndarray loads the entire namespace, which, without knowing your use case, seems likely not necessary.

kgryte commented 1 year ago

Ah, okay. I see from the source map explorer images that you don't seem to be loading the @stdlib/ndarray namespace.

walterra commented 1 year ago

I did some more testing with bundling. The initial bundle increases (1MB+) were part of a quite large PR that included a lot of other code additions. I created some dummy draft PRs that just add stats-chi2test and run the example code from the README to isolate the increases.

@stdlib/stats-chi2tests

# package.json
"@stdlib/stats-chi2test": "^0.0.8",

# import
import chi2test from '@stdlib/stats-chi2test';

This adds 596.1KB to our bundle size. (https://github.com/elastic/kibana/pull/162294)

@stdlib/esm

# package.json
"@stdlib/esm": "^0.0.3",

# import
import chi2test from '@stdlib/esm/stats/chi2test';

This adds 120.9KB to our bundle size. (https://github.com/elastic/kibana/pull/162295)

This is in line with a custom bundle I created using rollup which ends up with ~128KB. (https://github.com/walterra/stdlibjs-esm/pull/1/files)

So, for us it looks acceptable with the esm package regarding the bundle size. The only problem with that is that this repo/npm package is based off older code which uses eval() which we must not use. In more recent releases this has been refactored away.

Edit/Addendum: One thing we can't tell yet from the testing: Is there a chance that code was refactored in newer versions in a way that's causing this increase in bundle size?

kgryte commented 1 year ago

@walterra Yes, the code has been refactored, particularly among dependencies, such as ndarray.

Just as a heads up, we are working on this.

A few comments:

  1. The reason for the 1.3MB bundle using the stdlib bundler is largely due to no minification. I reproduced the bundle locally and inspected its contents. Without minification, all license headers and internal source code documentation is present. This causes the bundle to balloon to 1.3MB. We will update the README and other docs to demonstrate how to generate a "production" bundle for vendoring using stdlib tooling.
  2. By using the browserify plugin browser-pack-flat, you can achieve much the same things as rollup and co and using ES Modules. I was able to generate a bundle for @stdlib/stats/chi2test which is around 137 KB (including a Buffer browser polyfill).
  3. We have some thoughts on how to reduce the bundle size a fair amount further, but we need a couple days to make this happen.

To reproduce the build, you can do the following:

  1. Clone the stdlib repository.

  2. Navigate to the cloned directory.

  3. Install development dependencies.

    $ make install-node-modules
  4. Make a small modification to ./lib/node_modules/@stdlib/_tools/bundle/pkg-list/lib/main.js starting a L124

    bopts = {
      'externalRequireName': opts.requireName,
      'require': pkgs,
      'paths': ( opts.paths ) ? opts.paths : browserifyPaths(),
      'debug': true
    }

    add debug: true in order to generate an inline source map.

  5. Create an output directory.

    $ mkdir ./tmp
  6. Generate a "flat" bundle.

    $ NODE_PATH=./lib/node_modules \
        node ./lib/node_modules/@stdlib/_tools/bundle/pkg-list/bin/cli @stdlib/stats/chi2test \
        --namespace none \
        --plugin browser-pack-flat \
        | ./node_modules/.bin/exorcist ./tmp/bundle.js.map > ./tmp/bundle.js
  7. To support vendoring, add an export statement so that the package can be consumed by downstream packages.

    $ sed 's/\/\/# sourceMappingURL=/module.exports=require("@stdlib\/stats\/chi2test");\n\/\/# sourceMappingURL=/' ./tmp/bundle.js > ./tmp/bundle.js.tmp
    $ rm -f ./tmp/bundle.js
    $ mv ./tmp/bundle.js.tmp ./tmp/bundle.js
  8. Generate a minified bundle.

    $ ./node_modules/.bin/uglifyjs -c -m \
        --in-source-map ./tmp/bundle.js.map \
        --source-map "content='./tmp/bundle.js.map',url='bundle.min.js.map'" \
        -o ./tmp/bundle.min.js \
        ./tmp/bundle.js
  9. Inspect the bundle with source-map-explorer.

    $ ./node_modules/.bin/source-map-explorer ./tmp/bundle.min.js \
        --exclude-source-map \
        --no-border-checks

This should generate something like the following:

Screenshot 2023-07-19 at 7 46 08 PM

Note, as you may already be aware, there are some nuances when attempting to bundle an already bundled file, but at a minimum you can verify that the above bundle works in Node.js by adding the following script to the ./tmp folder containing generated bundles

import chi2test from './bundle.min.js';

var x = [
    [ 20, 30 ],
    [ 30, 20 ]
];

var out = chi2test( x );
console.log( out.print() );

and running via the command-line.

kgryte commented 1 year ago

In short, I think there is a path forward here which doesn't require updating the esm repository as a shorter term workaround.

"tree-shaking" is not really necessary when using individual stdlib packages, by design, as functionality is localized to individual packages and consumed on an as-needed basis. And the use of ES Modules doesn't really have much of an impact on bundle sizes because of how the codebase is organized (to create an ES Module from a source file is largely a matter of rewriting require statements to import statements).

What does matter is how bundling is done. In particular, making sure license headers are minified, etc. Hopefully, the above sequence demonstrates that.

Longer term, we do need to provide an ES Module offering in order to accommodate bundlers such as rollup which have poor support for CommonJS. We've thus far avoided moving in this direction, as dual packages are still a headache to maintain in Node.js-land, we maintain strong backward compatibility guarantees, and custom loaders are still experimental.

walterra commented 1 year ago

Thanks for being on top of this so quickly, this looks promising! Completely agree that the bundle size isn't really related to whether it's CommonJS or esm, it was just what's brought us further in our experiments quicker without knowing too much about stdlib's whole setup. Looking forward to see further improvements, thanks!

kgryte commented 1 year ago

Quick update: did some refactoring over the weekend, and we were able to reduce the bundle size a bit further (~102KB).

As a consequence of refactoring, there should also be a noticeable performance boost.

Below is an updated source map based on a bundle generated using the sequence documented above.

Screenshot 2023-07-23 at 11 16 06 PM
Planeshifter commented 1 year ago

@walterra We have released a new patch release (v0.0.9) for the standalone stats-chi2test package, which incorporates the refactoring.

walterra commented 1 year ago

Thank you so much for the update, we'll try to pick this up!

kgryte commented 1 year ago

@walterra Let us know if there are other stdlib packages (and/or feature requests) that we can help prioritize for your use case.

walterra commented 1 year ago

Thanks for the ping! So here's where we're at now with the different packages we evaluated on our builds:

package version size PR Notes
@stdlib/stats-chi2test 0.0.8 +596.1KB #162294
@stdlib/stats-chi2test 0.0.9 +391.6KB #162925
custom bundle @stdlib/esm 0.0.3 +135.8KB #162298 old version using eval()
@stdlib/esm 0.0.3 +120.9KB #162295 old version using eval()

Version 0.0.9 is a great improvement but unfortunately still quite large compared to the earlier experiments with the esm package.

Here's our yarn.lock file after we just add "@stdlib/stats-chi2test": "^0.0.9", to our package.json: https://github.com/elastic/kibana/pull/162925/files#diff-51e4f558fae534656963876761c95b83b6ef5da5103c4adef6768219ed76c2de

I know too little about stdlibjs to be able to tell if all these dependencies are really necessary to get chi2test to work or if its direct dependencies may need the same treatment you gave the package itself to reduce the bundle size.

We're a bit caught up with other stuff for the moment, but will continue to experiment with stdlibjs and it would be great if we can pick it up at some point!