guigrpa / docx-templates

Template-based docx report creation
MIT License
873 stars 144 forks source link

Make docx-template work from browser #8

Closed vdechef closed 6 years ago

vdechef commented 6 years ago

I just discovered docx-templates, and it does exactly what I need. But now I would like to use it on frontend side. Would it be feasible to have it work from browser ? I guess that the 2 main differences would be to read/write the file (eg. providing a buffer instead of path for input/output), and the VM part. If you think this is technically possible, I am willing to work on it.

guigrpa commented 6 years ago

That would really be a good idea. What do you mean by VM?

vdechef commented 6 years ago

I mean the jsSandbox part. I am not sure that it can run as-is in browser (but I may be mistaken). I will give it a shot today.

vdechef commented 6 years ago

Seems like I will need some advice to setup the build environment. I have never worked with flow, and I am not sure of the way you compile/deploy/debug the lib.

guigrpa commented 6 years ago

Sorry about not having a CONTRIBUTING.md :)

Install all dependencies by running yarn or npm install. Then yarn compileWatch or npm run compileWatch to generate compiled files and keep them up to date.

You can also run node examples/swapi/swapi.js to get a feeling of running the application from the Node environment (although you'll be working in the browser env).

You probably don't need to worry much about flow. If you're not sure, just don't include any further types. We'll figure that out later on.

guigrpa commented 6 years ago

Oh, and thanks a lot for your interest!

guigrpa commented 6 years ago

Any other questions, feel free to ask!

vdechef commented 6 years ago

Happy new year, and back to work ;) I was able to build and run your code. I have modified it to load in browser, and to work on streams rather than files (I use browserify to bundle all dependencies, and JSZip to unzip/zip file content).

Now my problem is with template processing :

Error: Error executing command: END-FOR film
items is undefined

The code in processTemplate.js is a bit cryptic :p so I am struggling. Do you have any hint about the way to debug this ?

guigrpa commented 6 years ago

Happy new year!

First of all, when you run this example in Node, is it successful? You should update the server address to http://swapi.apis.guru, as per https://github.com/graphql/swapi-graphql/pull/128. This has been updated in master.

In order to obtain more logs in Node, you can run the example like this (Mac):

DEBUG_DOCX_TEMPLATES=1 node swapi.js swapi-simple.docx

(Windows): set DEBUG_DOCX_TEMPLATES=1 && node swapi.js swapi-simple.docx

guigrpa commented 6 years ago

In any case, my guess is that processTemplate should be pretty much platform-agnostic. I imagine that the external interfaces (receiving the zip file, composing all files for the output, then zipping everything up) should be the most complex part for what you're trying to do.

vdechef commented 6 years ago

Saddly it seems that processTemplate is not platform-agnostic : I added some logs at the begining of produceJsReport, to check the function's inputs, and they are exactly the same for both browser and Node versions.

Here is the way I checked the inputs :

console.log('data:', md5(JSON.stringify(data, null, 2)))
console.log('temp:', md5(util.inspect(template, { showHidden: true, depth: null })))
console.log('opts:', md5(JSON.stringify(options, null, 2)))

I will investigate a bit more to find the place in the code where data diverges.

vdechef commented 6 years ago

So the problem was in jsSandbox.js : in runUserJsAndGetRaw() you set result using const result = sandbox.__result__; but in browser sandbox is not updated (a reference problem maybe ?), so I needed to use current context const result = context.__result__;

Do you see any problem with this ?

vdechef commented 6 years ago

Ok, I have a first version working at https://github.com/vdechef/docx-templates , if you want to have a look at it. Next step is packaging, and reducing bundle.js size (500kB when minified ! Duh ! ). Oh, and also adding some tests.

guigrpa commented 6 years ago

Great news! Right now it's a holiday in Spain, but I'll have a look in a few days!

vdechef commented 6 years ago

I did some improvements in packaging:

I still have some worries with the bundle size. I can reduce it to 341kB by removing debug.js dependencies (need your advice on this), but I did not find yet another optimisation.

guigrpa commented 6 years ago

On disabling debug.js: you could modify this line so that it also considers process.env.NODE_ENV:

const log: any = process.env.NODE_ENV === 'production' && DEBUG 
  ? require('./debug').mainStory
  : null;

This way, running webpack in production mode should not include storyboard.

But in any case, I don't think storyboard is the main culprit for the large bundle size. Maybe you could investigate using webpack-bundle-analyzer?

vdechef commented 6 years ago

Regarding debug.js, I cannot rely on this kind of test, because browserify bundles all the dependencies in 1 big js file. So as long as the code requires a module, this module will be injected in the final bundle. So I just added -u lib/debug.js in browserify command, to forcefuly exclude it.

Regarding the final bundle size, I gave a look at Webpack Bundle Analyzer, but the setup was not matching mine, so I ended up using Disc: http://hughsk.io/disc (very easy to use with browserify). The 2 main culprits for bundle size are JSZip (because it drags pako/zlib with it), and Storyboard. Here are the bundle sizes I get:

So I think we will have to go with a 350kB js library. Once gziped, it is still ~90kB, but I have no idea what to remove to improve this.

guigrpa commented 6 years ago

I finally got to checking out your implementation; thanks for putting it together! Is this something you plan to wrap in a PR to docx-templates? If so, here are some comments:

  1. Consider different entry points in the package.json for Node (main) and for the browser version (browser).
  2. Some dependencies have been removed from the package.json (archiver, fstream, globby, unzip, uuid), and some implementations as well (zip.js, for instance). Why not keep both those for the browser version and for the node version?
  3. It really takes a long time to run even with simple templates. Have you had the chance to profile it with Chrome’s devtools to see where lies the main culprit?
  4. Regarding lib size, I’ve managed to reproduce your 150 kB reduction with browserify by removing debug.js. However, running discify without removing storyboard I see that it only takes 45 kB. On the other hand, the bundle apparently includes the whole lodash library (why?), which is probably overkill. Maybe we could open a new issue for that, now that we plan to support browser-side execution and size becomes an issue.
  5. I also did some tests with webpack. It uses a tree shaking technique, so I expected a better bundle size. It ended up with 310 kB uncompressed, which might not be so bad (~100kB). Furthermore, I would not use any bundler. Let the user bundle the library however he wants…
  6. Make sure you use prettier before creating a PR (I noticed some style discrepancies in zip.js)

Thanks in any case!

vdechef commented 6 years ago

Thanks for your review! Actually I have been hoping to make a PR from the begining. So here are my remarks:

  1. OK. I already have 2 entry points in package.json (index.js for node, and browser.js for index-browser.js). I can rename index-browser.js to browser.js

  2. I chose to have a common implementation for both browser and node (except for the filesystem part which is not compatible with browsers). So I use directly main-buff.js for browser, and I added a wrapper (main-fs.js) for node that calls main-buff.js. As main-buff.js works with streams, I chose to use JSZip for compression/decompression, and so many dependencies became useless (the zip is never decompressed on the disk).

  3. I did not investigate performances yet. It seems to me that node version is fast enought. The performances problem in browser occurs when the file is complex. I will keep this for later.

  4. I guess that loadash is injected by Browserify. If we remove bundler (as you suggested in 5) it may be skipped.

  5. I think you are right about letting the users bundle the lib as they see fit. I will modify the compile phase to remove browserify, and run some tests in my app with Webpack. I will also add a yarn target to run browser example (with browserify this time).

  6. OK, I will do it.

I just realized today that I left a bug in jsSandbox.js : I did not export context's variables to sandbox. So the variables created when using EXEC are not available in other blocks. Strangely it works in Node, but not in browser : there is definitly a difference in the way the runtimes handle references when copying objects ...

vdechef commented 6 years ago

I fixed the points from my previous comment, and created a PR.

I also profiled the performance problem, as you suggested. The culprit is the sandbox : vm.runInContext() is 100 times slower in browser compared to Node. I searched for alternatives, but could not find good ones (jsvm seemed promising, but it is also bugged).

vdechef commented 6 years ago

After additional tests, I could not find a good alternative to Node's vm module. This is really sad, because docx-template performances in browser are terrible as soon as you put a few loops in the template. And the memory consumption is hideous too!

I did some tests with the sandbox removed (using eval directly) and the code is running ~50x faster. But there is a security concern: the template has to be provided by a trusted source.

Do you have an idea about a way to handle this ? Should we search for another vm engine, or limit what can be done in template to avoid injection ? To be honest, I am a bit short of ideas ...

guigrpa commented 6 years ago

OMG, 50x faster. Well, maybe it should be a config option: by default, template JS should run on a sandbox, but if the library user wants (e.g. he knows users will provide their own templates, so they will run locally at the clients' browsers) he can disable this security feature. What do you think?

vdechef commented 6 years ago

This may be the best solution. I will add the switch, and update the documentation

guigrpa commented 6 years ago

Thanks a lot for the hard work! I've already integrated the PR. I will do some cleaning up before releasing, but otherwise it seems great. I'm glad to ironed out all the kinks (images, zip in/out, etc.) while putting all this together.

guigrpa commented 6 years ago

Shipped in v2.3.1. If anything goes wrong, please report!