guigrpa / docx-templates

Template-based docx report creation
MIT License
882 stars 145 forks source link

Added maximumWalkingDepth option for supporting large documents #360

Closed BrandonDR closed 5 months ago

BrandonDR commented 5 months ago

This PR implements a new option maximumWalkingDepth. This option replaces the hard-coded 1000000 (1 million) max loop iterations. This loop counting is intended to avoid an infinite loop. I have retained the default behaviour but updated the InternalError message to 'infinite loop or massive dataset detected'. This should make this issue easier to debug.

I have added TS doc comments and updated the README to document this option. I have also added a Tip: to suggest the use of the Infinity constant which will disable the infinite loop detection. I intend to use this in combination with a timeout on my node process instead of an arbitrary limit.

Motivation

When generating a document using a FOR loop with 2000 iterations and each loop inserting (INS) many values, I was able to exceed 1 million loops in the walkTemplate function. This caused a generic 'something went wrong with the document. Please review and try again' error

I appreciate your work on this library!

BrandonDR commented 5 months ago
Build, test and code coverage seems to pass (expand to see output) ```sh $ yarn install && yarn compile && yarn test yarn install v1.22.22 [1/5] Validating package.json... [2/5] Resolving packages... success Already up-to-date. $ yarn compile yarn run v1.22.22 $ rimraf ./lib && tsc && yarn rollup $ rollup -c ./src/browser.ts → ./lib/browser.js... (!) Circular dependencies node_modules/jszip/lib/utils.js -> node_modules/jszip/lib/base64.js -> node_modules/jszip/lib/utils.js node_modules/jszip/lib/utils.js -> node_modules/jszip/lib/base64.js -> /home/brandon/code/docx-templates/node_modules/jszip/lib/utils.js?commonjs-proxy -> node_modules/jszip/lib/utils.js (!) Use of eval is strongly discouraged https://rollupjs.org/guide/en/#avoiding-eval node_modules/vm-browserify/index.js 108: 109: Script.prototype.runInThisContext = function () { 110: return eval(this.code); // maybe... ^ 111: }; created ./lib/browser.js in 1s ./lib/index.d.ts → ./lib/bundled.d.ts... created ./lib/bundled.d.ts in 118ms Done in 5.38s. Done in 5.74s. yarn run v1.22.22 $ rimraf ./lib && tsc && yarn rollup $ rollup -c ./src/browser.ts → ./lib/browser.js... (!) Circular dependencies node_modules/jszip/lib/utils.js -> node_modules/jszip/lib/base64.js -> node_modules/jszip/lib/utils.js node_modules/jszip/lib/utils.js -> node_modules/jszip/lib/base64.js -> /home/brandon/code/docx-templates/node_modules/jszip/lib/utils.js?commonjs-proxy -> node_modules/jszip/lib/utils.js (!) Use of eval is strongly discouraged https://rollupjs.org/guide/en/#avoiding-eval node_modules/vm-browserify/index.js 108: 109: Script.prototype.runInThisContext = function () { 110: return eval(this.code); // maybe... ^ 111: }; created ./lib/browser.js in 1.1s ./lib/index.d.ts → ./lib/bundled.d.ts... created ./lib/bundled.d.ts in 101ms Done in 5.67s. yarn run v1.22.22 $ yarn lint && yarn testCovFull $ eslint "src/**/*.{js,jsx,ts,tsx}" $ yarn testCovPrepare && yarn testDev && yarn testCovReport $ rm -rf ./coverage .nyc_output .nyc_tmp && mkdir .nyc_tmp $ NODE_ENV=development jest --coverage && mv .nyc_output/coverage-final.json .nyc_tmp/coverage-dev.json && rm -rf .nyc_output PASS src/__tests__/unit.test.ts PASS src/__tests__/list_commands.test.ts PASS src/__tests__/error_handling.test.ts (7.893 s) PASS src/__tests__/images.test.ts (10.594 s) PASS src/__tests__/templating.test.ts (11.422 s) -----------------------|---------|----------|---------|---------|------------------------------------------------------------ File | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s -----------------------|---------|----------|---------|---------|------------------------------------------------------------ All files | 94.12 | 86.81 | 95.72 | 95.11 | browser.ts | 0 | 100 | 100 | 0 | 1 errors.ts | 93.54 | 52.38 | 89.47 | 90 | 41-43,71 index.ts | 100 | 100 | 0 | 100 | jsSandbox.ts | 90 | 88.88 | 100 | 91.17 | 45-47 main.ts | 93.93 | 90.27 | 100 | 94.11 | 57,241,387,430,433,459,508-526 preprocessTemplate.ts | 100 | 100 | 100 | 100 | processTemplate.ts | 93.06 | 86.66 | 100 | 95.06 | 88-99,255-259,431,488,500,616,642,717,744,961,974,976,1093 reportUtils.ts | 94.44 | 80.95 | 100 | 96.82 | 36,42 types.ts | 100 | 100 | 100 | 100 | xml.ts | 98.59 | 90 | 91.66 | 98.46 | 45 zip.ts | 100 | 100 | 100 | 100 | -----------------------|---------|----------|---------|---------|------------------------------------------------------------ Test Suites: 5 passed, 5 total Tests: 190 passed, 190 total Snapshots: 164 passed, 164 total Time: 13.355 s, estimated 15 s Ran all test suites. $ cp -r .nyc_tmp .nyc_output && nyc report --reporter=html --reporter=lcov --reporter=text -----------------------|---------|----------|---------|---------|------------------------------------------------------------ File | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s -----------------------|---------|----------|---------|---------|------------------------------------------------------------ All files | 94.12 | 86.81 | 95.72 | 95.11 | browser.ts | 0 | 100 | 100 | 0 | 1 errors.ts | 93.54 | 52.38 | 89.47 | 90 | 41-43,71 index.ts | 100 | 100 | 0 | 100 | jsSandbox.ts | 90 | 88.88 | 100 | 91.17 | 45-47 main.ts | 93.93 | 90.27 | 100 | 94.11 | 57,241,387,430,433,459,508-526 preprocessTemplate.ts | 100 | 100 | 100 | 100 | processTemplate.ts | 93.06 | 86.66 | 100 | 95.06 | 88-99,255-259,431,488,500,616,642,717,744,961,974,976,1093 reportUtils.ts | 94.44 | 80.95 | 100 | 96.82 | 36,42 types.ts | 100 | 100 | 100 | 100 | xml.ts | 98.59 | 90 | 91.66 | 98.46 | 45 zip.ts | 100 | 100 | 100 | 100 | -----------------------|---------|----------|---------|---------|------------------------------------------------------------ Done in 18.25s. ```
BrandonDR commented 5 months ago

For context, this seems to related to the fix for #340, specifically this commit https://github.com/dseiler/docx-templates/commit/8eeec1fa0227813b1787ef0e8d87edf3a94fcf29

jjhbw commented 5 months ago

Hi @BrandonDR , great idea to add this. The hardcoded limit seems a bit low, now I think of it. I can totally imagine legit use cases hitting that limit, so making it configurable makes sense to me. What is the limit value are you using in your code? Maybe we should increase the default value too, in addition to making it configurable.

BrandonDR commented 5 months ago

Hi @jjhbw, I am using the Infinity constant, which disables the exception but this would not be a good default if there could still be an infinite loop.

const buffer = await createReport({
    //...
    maximumWalkingDepth: Infinity,
    //...
});

I am opting for a 10 minute process timeout instead. As I am spawning a node process from my PHP queue worker and handling the exception case there.

Our use case has a bunch of varied templates that we are building all the time for a range of clients, so we need something to accommodate a wide range of document sizes and templates. For the case we encountered this error there was a FOR loop with a 20 column table with ~2400 rows which managed to hit this limit. This would be an extreme outlier.

Hard to say, maybe 10 million might be reasonable headroom. There is a trade off of how quickly the library should give up. Maybe a timeoutSeconds option with a setTimeout approach might more sense?

jjhbw commented 5 months ago

Excellent, thanks a lot for contributing!

dseiler commented 4 months ago

@BrandonDR thanks for this update. Yes, it makes sense to have this limit configurable. I set is initially to 1 Million because with 1 Million it takes roughly 10 seconds to hit the limit. In my documents I usually never get close to this limit (even with quite complex docs with many loops etc.). But of course there could be use cases where you need more (however a docx with more than 2000 rows might become unhandy ;-) ). My biggest concern was the memory consumption. If it exceeds the 1 Mio loops (which is usually an indication that something went out of control), the memory consumption on our cloud infrastructure might get critical.

dseiler commented 4 months ago

@BrandonDR quick addition: if the users can create the templates themselves they could always crash the server with a statement like this:

{{! counts = [0];}}
{{ FOR count IN counts }}
{{! counts.push(-1);}}
{{ $count }}
{{ END-FOR count }}

Besides that there could be still some very rare situations in a template where the engine goes into an infinite loop. So in production I would always set this option to a decent value to prevent crashes.

jjhbw commented 4 months ago

if the users can create the templates themselves they could always crash the server with a statement like this:

Just a pedantic side note: be aware of the security considerations when letting users upload templates that are executed on your servers. You are effectively executing user-provided javascript in that case. See https://github.com/guigrpa/docx-templates?tab=readme-ov-file#performance--security

BrandonDR commented 4 months ago

@jjhbw @dseiler Good notes, I am running this from separate 'queue workers' so memory usage is not a huge concern in regards to crashing the whole application. It is reasonably well isolated but I should isolate it further regarding security - it should be on a dedicated vm for untrusted user uploads with minimal permissions/config/env vars etc. I have thought of this prior. But of course there are deadlines to meet. Thanks for the reminder