phetsims / chipper

Tools for developing and building PhET interactive simulations.
MIT License
11 stars 14 forks source link

Add gzip compressed _all.html file to build artifacts #746

Closed mattpen closed 5 years ago

mattpen commented 5 years ago

Requested to improve performance for the iOS app in phetsims/phet-ios-app#440.

I'm trying to find a node library that will accomplish lzma compression so we can build this artifact in buildRunnable.html. I'm currently evaluating lzma-native. It works fine on linux machines but fails hard (no catchable exceptions) on Windows machines. I've opened https://github.com/addaleax/lzma-native/issues/78 with the developer.

mattpen commented 5 years ago

This seems to be working nicely now:

$ grunt build-for-server --brands=phet --allHTML
Running "build" task
Building runnable repository (acid-base-solutions, brands: phet)
Building brand: phet
>> require.js optimization for brand: phet complete (4328192 bytes)
(node:18676) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
>> Minification for phet complete
>> Require.js: 1432412 bytes
>> Preloads: 241309 bytes
>> Mipmaps: 132656 bytes

Done.

$ cd build/phet/
$ ls -la
total 10140
drwxr-xr-x 1 matt 197121       0 Mar 14 14:12 ./
drwxr-xr-x 1 matt 197121       0 Mar 14 14:11 ../
-rw-r--r-- 1 matt 197121    1111 Mar 14 14:12 acid-base-solutions_all_iframe_phet.html
-rw-r--r-- 1 matt 197121 2140359 Mar 14 14:12 acid-base-solutions_all_phet.html
-rw-r--r-- 1 matt 197121  245803 Mar 14 14:12 acid-base-solutions_all_phet.html.xz
-rw-r--r-- 1 matt 197121 5833512 Mar 14 14:12 acid-base-solutions_all_phet_debug.html
-rw-r--r-- 1 matt 197121    1110 Mar 14 14:12 acid-base-solutions_en_iframe_phet.html
-rw-r--r-- 1 matt 197121 1877995 Mar 14 14:12 acid-base-solutions_en_phet.html
-rw-r--r-- 1 matt 197121    9899 Mar 14 14:12 acid-base-solutions-128.png
-rw-r--r-- 1 matt 197121   95847 Mar 14 14:12 acid-base-solutions-600.png
-rw-r--r-- 1 matt 197121   40140 Mar 14 14:12 acid-base-solutions-ios.png
-rw-r--r-- 1 matt 197121  105678 Mar 14 14:12 acid-base-solutions-twitter-card.png
-rw-r--r-- 1 matt 197121    2358 Mar 14 14:12 dependencies.json
drwxr-xr-x 1 matt 197121       0 Mar 14 14:11 xhtml/
samreid commented 5 years ago

Have you considered using node's built-in zlib library? https://nodejs.org/docs/latest-v11.x/api/zlib.html

mattpen commented 5 years ago

@samreid - I don't believe that supports lzma, which was requested specifically in phetsims/phet-ios-app#440. The compression ratio is somewhat better.

$ xz acid-base-solutions_all_phet.html -c > acid-base-solutions_all_phet.html.xz
$ gzip acid-base-solutions_all_phet.html -c > acid-base-solutions_all_phet.html.gz
$ du -sh acid-base-solutions_all_phet.html*
2.1M    acid-base-solutions_all_phet.html
704K    acid-base-solutions_all_phet.html.gz
604K    acid-base-solutions_all_phet.html.xz

There are a couple of other libraries with xz bindings that we could try if this is creating problems for local development, or we could make native calls to xz. The latter option probably won't work for windows devs, so we would probably need to move that step to the build-server.

mattpen commented 5 years ago

I'm going to investigate https://www.npmjs.com/package/xz

mattpen commented 5 years ago

That doesn't seem to be stable either. I'm getting the same error @chrisklus reported in slack when tyring to install it without removing node_modules first.

> xz@2.0.0 install C:\Users\matt\phet\git\chipper\node_modules\xz
> node-gyp rebuild

C:\Users\matt\phet\git\chipper\node_modules\xz>if not defined npm_config_node_gyp (node "C:\Users\matt\AppData\Roaming\npm\node_modules\npm\bin\node-gyp-bin\\..\..\node_modules\node-gyp\bin\node-gyp.js" rebuild )  else (node "" rebuild )
gyp ERR! configure error
gyp ERR! stack Error: Can't find Python executable "C:\Program Files\Python 3.5\python.EXE", you can set the PYTHON env variable.
gyp ERR! stack     at PythonFinder.failNoPython (C:\Users\matt\AppData\Roaming\npm\node_modules\npm\node_modules\node-gyp\lib\configure.js:483:19)
gyp ERR! stack     at PythonFinder.<anonymous> (C:\Users\matt\AppData\Roaming\npm\node_modules\npm\node_modules\node-gyp\lib\configure.js:508:16)
gyp ERR! stack     at C:\Users\matt\AppData\Roaming\npm\node_modules\npm\node_modules\graceful-fs\polyfills.js:284:29
gyp ERR! stack     at FSReqWrap.oncomplete (fs.js:153:21)
gyp ERR! System Windows_NT 10.0.17134
gyp ERR! command "C:\\Program Files\\nodejs\\node.exe" "C:\\Users\\matt\\AppData\\Roaming\\npm\\node_modules\\npm\\node_modules\\node-gyp\\bin\\node-gyp.js" "rebuild"
gyp ERR! cwd C:\Users\matt\phet\git\chipper\node_modules\xz
gyp ERR! node -v v10.15.3
gyp ERR! node-gyp -v v3.6.2
gyp ERR! not ok
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@1.2.7 (node_modules\fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for fsevents@1.2.7: wanted {"os":"darwin","arch":"any"} (current: {"os":"win32","arch":"x64"})

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! xz@2.0.0 install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the xz@2.0.0 install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
mattpen commented 5 years ago

Since none of the available node library bindings appear to be compatible with npm prune && npm update, I'm going to remove this step from chipper and add it to the build-server. I'm going to implement it as a native process call.

samreid commented 5 years ago

The compression ratio is somewhat better. 704K acid-base-solutions_all_phet.html.gz 604K acid-base-solutions_all_phet.html.xz

Is the argument that it is important to save 100kb per simulation? If so, why is that important? Let's say we have 100 simulations, saving an average of 100kb / simulation. That is a total of 10MB. Does that cross a threshold?

I'm going to remove this step from chipper and add it to the build-server.

Is it not important that this be controlled as an output from the build step?

I want to understand the reasoning to do something "outside the box" when there is a supported cross-platform solution provided directly by the node.js SDK and doesn't require 3rd party libraries or limitation of running native process call which only works on some platforms.

mattpen commented 5 years ago

Is the argument that it is important to save 100kb per simulation? If so, why is that important? Let's say we have 100 simulations, saving an average of 100kb / simulation. That is a total of 10MB. Does that cross a threshold?

This request is from Chris Huxtable, who is working on our app. I'm not confident it crosses a threshold or that gzip will be unsuitable. gzip is already provided by apache so it would require 0 effort in terms of build tools development.

10MB savings isn't crossing a threshold, but it's a 15% reduction which could be significant at scale.

Is it not important that this be controlled as an output from the build step?

The thing that concerns me the most is that it creates a dependency on the file naming conventions, but there are many other instances of this type of dependency in the build-server already.

I'll follow up with Chris and see if he is ok with gzip and bring it up at dev meeting if not.

mattpen commented 5 years ago

Chris Huxtable's reasoning for requesting an xz/lzma asset is here.

Here are our options for proceeding:

  1. Use gzip and accept the 15% bloat in storage size.

    • This affects server bandwidth and storage on all 100k+ ios devices. This is already implemented server side, we would just need to change which compression algorithm is used in iOS (zlib is supported, which should be able to read gzip files).
  2. Use a node library to create an xz asset with the grunt tools.

    • There is some instability with the npm prune && npm update operation, which would need to be sorted out. We would likely need to revert to the rm -rf node_modules && npm install method, which is quite a bit slower and requires a lot of extra bandwidth during the build process. There may be other unforeseen problems in adding an additional third party npm library to the build tools.
  3. Use a system call to create an xz asset with the grunt tools.

    • All developers would need to install xz in order to use the grunt tools.
  4. Use a system call to create an xz asset in the build-server.

    • This somewhat breaks the design pattern of using grunt to create the build artifacts, and puts an additional dependency on grunts file structure and naming conventions to deploy the artifact. (Note: we already have a dependencies like this)
  5. Adding a path in the tomcat application that fetches the sims and compresses them on the fly

    • very suboptimal in terms of cpu usage on the server
  6. Implementing the xz compression algorithm in node ourselves

    • This is basically reinventing the wheel
  7. Implementing an httpd module to add support for xz compression.

    • This is reinventing the tools needed to reinvent the wheel, and then reinventing the wheel
  8. rewrite chipper in c

    • seems legit
samreid commented 5 years ago

Unlike gzip, lzma seems to benefit from compressing multiple simulations together. For instance, I compressed 10 simulations together with lzma and they averaged 300kb/sim in the compression. I don't recall seeing the same batch savings with gzip.

We could also investigate reducing the size of the simulations themselves. For instance, jquery takes up about 84kb (uncompressed) of each of our simulations. There is an issue to remove it https://github.com/phetsims/joist/issues/537 There may be other ways to attain savings.

mattpen commented 5 years ago

Unlike gzip, lzma seems to benefit from compressing multiple simulations together.

I think this is not going to be beneficial for our situation. One of the requirements is that we download a single compressed sim and write it directly to disk without processing. They are then decompressed on demand. If we bundle them, at download time the app would need to decompress the bundle, then compress the sims individually and then write them to disk. This requires a lot of cpu processing, which severely impacts performance of the app and excessively drains the battery.

mattpen commented 5 years ago

We could also investigate reducing the size of the simulations themselves. For instance, jquery takes up about 84kb (uncompressed) of each of our simulations. There is an issue to remove it phetsims/joist#537 There may be other ways to attain savings.

This would certainly be beneficial, but lzma would still get better compression ratios than gzip.

samreid commented 5 years ago

Notes from today's meeting:

Some comments from the team: SR - Does android support lzma? I’m not too worried about 100kb extra per sim. It also sounds like lzma takes more CPU and time, given that I'd prefer gzip JG - Don't know enough to cast an informed vote DB - Not enough know how on this matter. Defer to MP MB - GZip ok JO: Built-in gzip with Apache serverside and iOS/Android client-side sounds like enough MP - +1 for gzip MK - the build in gzip seems really nice. CK - Would like more info but gzip seems okay≈ CM ON JB - Uninformed gut instinct: Gzip is more common and easy to support, and the difference isn’t that much, so go for that.

mattpen commented 5 years ago

@mbarlow12 - Can you please investigate XZ for Java, either the original package or the Apache commons wrapper?

mattpen commented 5 years ago

I'll spend some more time investigating the npm issues for now.

@ariel-phet - Do you have any input on how we should proceed? To summarize, the benefit is a fairly significant savings in terms of bandwidth and storage on mobile devices. The drawback is that we already have a pretty good compression algorithm working (gzip) and this would require a significant amount of developer time.

mattpen commented 5 years ago

I tested this package again this morning. It requires sh and is not compatible with windows and it looks like the developer isn't planning to add it: https://github.com/robey/node-xz/issues/8

samreid commented 5 years ago

Have you reviewed https://github.com/LZMA-JS/LZMA-JS ?

mattpen commented 5 years ago

On windows, this package will work but requires all build machines to globally install node-pre-gyp and rimraf otherwise npm prune fails.

I tested this package on MacOS and RHEL as well and was able to npm install, prune, and update without error. However, it looks like npm prune is printing warnings to stderr, and this is causing the build server to think there has been an error. The build-server simply checks for any content in stderr after calling npm update and decides to abort the build if content is present. Maybe we should check stderr content for the (case-insensitive) presence of 'err'?

mattpen commented 5 years ago

This is a pure javascript implementation with zero dependencies so npm operations should work beautifully. Install, prune, and update all work fine on Windows/MinGW, MacOS, and RHEL. Bad news, as far as I can tell it is broken. I tried various compression levels and the resulting artifact ends up 200kB LARGER than the original artifact.

$ du -sh build/phet/*_all_phet.html*
2.0M    build/phet/acid-base-solutions_all_phet.html
2.2M    build/phet/acid-base-solutions_all_phet.html.xz
mattpen commented 5 years ago

No readme published on www.npmjs.com. Has not been updated in 3 years. Does not compile on MacOS.

samreid commented 5 years ago

I tried various compression levels and the resulting artifact ends up 200kB LARGER than the original artifact.

Maybe it was running decompression instead of compression somehow?

mattpen commented 5 years ago

Maybe it was running decompression instead of compression somehow?

I tried calling lzma.compress( contents ) and lzma.compress( contents, 6 ).

mattpen commented 5 years ago

This package seems to be compatible with npm operations and is building the asset at the expected size on windows, macos, and rhel. I have not tried an RC build yet, but this looks pretty promising. This package seems a bit odd, because it can't do manipulation on buffers directly. It's interaction is entirely on the filesystem using some precompiled 7zip binaries. This should be fine as long as grunt.file.write is synchronous, but I did see some odd behavior with lzma-native that indicates it may not be.

mattpen commented 5 years ago

@jbphet ran an rc from master and the xz artifact seems to have been created successfully. I confirmed that the file unpacks with unxz into a useable file. I think this is good to go, the next step would be a batch maintenance release.

[mape5853@bayes phet]$ du -sh isotopes-and-atomic-mass_all_phet.html*
2.1M    isotopes-and-atomic-mass_all_phet.html
484K    isotopes-and-atomic-mass_all_phet.html.xz
mbarlow12 commented 5 years ago

FWIW It appears that there's pretty solid support in Java for compression/decompression for both .xz and .7z, though I'm pretty sure the Android app will only use the latter.

mattpen commented 5 years ago

@mbarlow12 - We don't have any plans to make a .7z asset, do you mean the app will only use the former?

mattpen commented 5 years ago

@ariel-phet - I think I've got this sorted out now.

samreid commented 5 years ago

In https://github.com/phetsims/chipper/issues/749#issuecomment-476312688, @jbphet and I identified that this code:

    _7z.cmd( [ 'a', '-txz', allHTMLFilename + '.xz', allHTMLFilename ], err => {
      grunt.log.error( err );
      throw new Error( err );
    } );

is interfering with the jimp step. We do not understand why, but commenting out that _7z.cmd call allows jimp to proceed. Increasing priority and marking blocking for phet-io, since we cannot build phet-io sims with this interference.

UPDATE: We also determined this error occurs when running grunt --allHTML --brands=phet, so this is probably failing for the app builds.

mattpen commented 5 years ago

I added in a check to prevent the error null from being thrown.

samreid commented 5 years ago

Not intending to move us backwards or cause us to second guess ourselves, but I was somewhat surprised when I reviewed the comments in https://github.com/phetsims/chipper/issues/746#issuecomment-475370322 and then the subsequent work in the issue.

Summary from https://github.com/phetsims/chipper/issues/746#issuecomment-475370322 Prefer or recommend gzip: 7 Prefer or recommend lzma: 0 Abstained: 4

Maybe there was some subsequent discussion that was not captured in this issue? Also, I'm not convinced that "7zip-min" will have good long-term support. At the moment it only has 3 stars on GitHub (one from me).

mattpen commented 5 years ago

I was somewhat surprised when I reviewed the comments in #746 (comment) and then the subsequent work in the issue.

I decided to continue investigation because there were 4 comments of "I don't know enough". I remember that we decided I should check a few other npm packages and check Android before ruling it out because its a fairly significant improvement in terms of bandwidth and storage on small devices. It appears that Android should be able to support it without a problem. Node is still an open question and 7zip-min is the only package I found to be cross-platform compatible.

Also, I'm not convinced that "7zip-min" will have good long-term support. At the moment it only has 3 stars on GitHub (one from me).

I agree that it seems under supported. It also seems like a fairly simple package, its just a lightweight wrapper of the 7zip CLI.

mattpen commented 5 years ago

In https://github.com/phetsims/phet-ios-app/issues/440 we decided to switch to gzip.

mattpen commented 5 years ago

I'm going to add this to the build process so that we can provide a hash of the file as well, I can't find any reasonable way to create a checksum of a response in httpd. This should be much safer than using 7zip because node has a built in zlib package.

zepumph commented 5 years ago

@samreid does this still block phet-io publication?

samreid commented 5 years ago

The blocking part was resolved, thanks!

mattpen commented 5 years ago

@zepumph - @KatieWoe reports that QA is done. There were a few issues discovered but they are not related to this release. I think we are ready to deploy to production.

mattpen commented 5 years ago

This change was applied to all current release branches and deployed to production where necessary.