guigrpa / docx-templates

Template-based docx report creation
MIT License

Performance issue: Freezing and long processing time when passing large amount of data #317

Closed: R-404 closed this issue 1 year ago

R-404 commented 1 year ago

I am encountering a performance issue when generating a report with the createReport function in our React application. More specifically, when I pass a large amount of data to the function, the React app freezes for a while and takes an excessively long time to generate the Word file.

The data I need to pass to the template is an object whose length ranges from 100 to 2000.

I've come across similar issues, such as https://github.com/guigrpa/docx-templates/issues/153 and https://github.com/guigrpa/docx-templates/issues/81, and tried the noSandbox fix. It reduces the processing time a bit, but I have some additional logic inside the template file (for translation, among other things) that stops working when noSandbox is set to true.

I have a barebones test example here. I have also added a similar template to the sandbox's public folder, so you can check that out as well.

Steps to test and reproduce:

Workaround:

Additional Details:

- React version: 17.0.2
- docx-templates version: 4.9.2
- Dataset size used for testing: object with length up to 2000
- OS: Ubuntu 22.04
- Hardware: Intel i7 (8th gen), 16 GB RAM, 256 GB SSD

Please let me know if you have a possible fix, or even a workaround I could try. If you need any additional information, just let me know. Thanks!

jjhbw commented 1 year ago

Hi, thanks for your extensively documented issue.

Note that each command is invoked using the code below. If you have noSandbox set to true, each command is executed by eval(), which is a lot faster than executing it with vm.Script (make sure you understand the security implications; see the README).

https://github.com/guigrpa/docx-templates/blob/a9a629853a52f7be3588d527e21a55617a12e7ad/src/jsSandbox.ts#L43-L61
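To make the difference between the two code paths concrete, here is a simplified Node.js sketch. It is not the library's code: runSandboxed and runUnsandboxed are hypothetical helpers that evaluate a command string either inside a fresh vm context or directly via new Function with a with block (standing in for eval). The typeof process check in the helpers' behaviour also hints at the security trade-off: the unsandboxed path can reach Node globals, the sandboxed one cannot.

```typescript
import * as vm from "node:vm";

type Sandbox = Record<string, unknown>;

// Sandboxed path (simplified assumption, not the library's code): compile the
// command as a vm.Script and run it in a fresh context holding only the data.
function runSandboxed(code: string, sandbox: Sandbox): unknown {
  const context = vm.createContext({ ...sandbox });
  const script = new vm.Script(`__result__ = (${code});`);
  script.runInContext(context);
  return context.__result__;
}

// Unsandboxed path (simplified assumption): evaluate the same code directly in
// the current realm; `with` exposes the data fields as bare identifiers.
function runUnsandboxed(code: string, sandbox: Sandbox): unknown {
  const wrapper = new Function("sandbox", `with (sandbox) { return (${code}); }`);
  return wrapper(sandbox);
}

// Hypothetical command that touches every item, similar to logic in a loop.
const data: Sandbox = { items: Array.from({ length: 2000 }, (_, i) => i) };
const code = "items.filter((x) => x % 2 === 0).length";

// Run `fn` repeatedly and report wall-clock time.
function timeIt(label: string, fn: () => unknown, iterations = 200): void {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  console.log(`${label}: ${(performance.now() - start).toFixed(1)} ms`);
}

timeIt("vm.Script (sandboxed)", () => runSandboxed(code, data));
timeIt("new Function (no sandbox)", () => runUnsandboxed(code, data));
```

Absolute timings depend on the machine. In this sketch, the sandboxed path creates a context and compiles a script on every call, and that per-call cost multiplies when a command runs once per item in a large loop.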

You mentioned that you already tried noSandbox: true and that it didn't help enough. That makes sense: if your dataset is very large, executing code from within the template is still significantly slower than running the same logic outside it, because of the various overheads involved.

One thing to keep in mind is that JS is single-threaded and works with an event loop (see https://www.digitalocean.com/community/tutorials/node-js-architecture-single-threaded-event-loop). In practice, this means that running an expensive computation on the main thread freezes all other interactivity in the application until the computation is complete. This makes for a horrible user experience in web applications.
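The freezing can be reproduced without docx-templates at all. This sketch, assuming Node.js timers (browser timers behave the same way here), schedules a zero-delay timer and then runs a long synchronous loop; expensiveSync is a hypothetical stand-in for a large report render. The timer callback, like a click or render handler in a React app, only runs after the loop returns.

```typescript
// Order in which things happen, so the blocking is observable.
const events: string[] = [];

// Hypothetical stand-in for a large synchronous report render.
function expensiveSync(iterations: number): number {
  let acc = 0;
  for (let i = 0; i < iterations; i++) acc += Math.sqrt(i);
  return acc;
}

// Ask for a callback "as soon as possible", as a UI event handler would.
setTimeout(() => events.push("timer fired"), 0);

// Block the single thread: the timer cannot run until this loop returns.
events.push("start computation");
expensiveSync(20_000_000);
events.push("computation done");

// Inspect the order once the event loop is free again.
// logs: start computation -> computation done -> timer fired
setTimeout(() => console.log(events.join(" -> ")), 0);
```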

One thing you should definitely try is running createReport in a web worker. You can also try to move as much logic as possible out of the template, since running code inside the template can be slow (especially in loops).
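Moving logic out of the template could look like the following sketch: translation and formatting happen once in plain TypeScript, and createReport then receives data the template only has to insert (for example with +++INS $row.label+++ inside a FOR loop, assuming the default +++ delimiters). RawRow, ReportRow, the translations map, and prepareReportData are hypothetical; the actual data and translation logic from the issue are not shown. Keeping translation outside the template may also sidestep the noSandbox compatibility problem mentioned above.

```typescript
// Hypothetical raw input; the real data shape from the issue is not shown.
interface RawRow {
  key: string;
  amount: number;
}

// What the template receives: ready-to-insert strings only.
interface ReportRow {
  label: string;
  amount: string;
}

// Translate and format every row once, in plain TypeScript, before calling
// createReport. The formatter is built once and reused for all rows.
function prepareReportData(
  rows: RawRow[],
  translations: Record<string, string>,
  locale = "en-US",
): { rows: ReportRow[] } {
  const fmt = new Intl.NumberFormat(locale, {
    minimumFractionDigits: 2,
    maximumFractionDigits: 2,
  });
  return {
    rows: rows.map((row) => ({
      // Fall back to the raw key when no translation exists.
      label: translations[row.key] ?? row.key,
      amount: fmt.format(row.amount),
    })),
  };
}

const data = prepareReportData(
  [
    { key: "revenue", amount: 1234.5 },
    { key: "costs", amount: 99 },
  ],
  { revenue: "Revenue" },
);
console.log(data.rows);
```

The result would then be passed as createReport({ template, data }), so each loop iteration in the template only inserts $row.label and $row.amount instead of running translation code.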

R-404 commented 1 year ago

Hi @jjhbw, thank you for your time and quick response. I have implemented your suggestion by running the createReport function in a web worker. This has significantly improved the perceived performance, since we can keep using the application while the report is generated in the background.
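One way to structure such a worker setup is a small request/response protocol with ids, so the UI thread gets one Promise per report. This is a sketch under assumptions: ReportRequest, ReportResponse, Channel, makeWorkerHandler, ReportClient, and fakeChannel are hypothetical names, and the in-process fakeChannel only exists so the protocol can be exercised without a browser. In a real setup, render would call createReport({ template, data }) inside the worker, and Channel would be a thin adapter over a Worker's postMessage and onmessage.

```typescript
// Hypothetical message protocol between the UI thread and a report worker.
interface ReportRequest {
  id: number;
  template: Uint8Array;
  data: unknown;
}

type ReportResponse =
  | { id: number; ok: true; report: Uint8Array }
  | { id: number; ok: false; error: string };

// Minimal transport abstraction (hypothetical).
interface Channel {
  send(msg: ReportRequest): void;
  onReply(handler: (msg: ReportResponse) => void): void;
}

type RenderFn = (template: Uint8Array, data: unknown) => Promise<Uint8Array>;

// Worker side: turns a request into a response, never throwing across the boundary.
function makeWorkerHandler(render: RenderFn) {
  return async (req: ReportRequest): Promise<ReportResponse> => {
    try {
      const report = await render(req.template, req.data);
      return { id: req.id, ok: true, report };
    } catch (err) {
      return { id: req.id, ok: false, error: String(err) };
    }
  };
}

// UI side: one Promise per request, matched to its response by id.
class ReportClient {
  private nextId = 0;
  private pending = new Map<
    number,
    { resolve: (report: Uint8Array) => void; reject: (err: Error) => void }
  >();
  private channel: Channel;

  constructor(channel: Channel) {
    this.channel = channel;
    channel.onReply((msg) => {
      const entry = this.pending.get(msg.id);
      if (!entry) return; // unknown or already-settled id
      this.pending.delete(msg.id);
      if (msg.ok) entry.resolve(msg.report);
      else entry.reject(new Error(msg.error));
    });
  }

  generate(template: Uint8Array, data: unknown): Promise<Uint8Array> {
    const id = this.nextId++;
    return new Promise<Uint8Array>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.channel.send({ id, template, data });
    });
  }
}

// In-process stand-in for a Worker; replies arrive asynchronously.
function fakeChannel(handle: (req: ReportRequest) => Promise<ReportResponse>): Channel {
  let reply: (msg: ReportResponse) => void = () => {};
  return {
    send: (msg) => {
      void handle(msg).then((res) => reply(res));
    },
    onReply: (handler) => {
      reply = handler;
    },
  };
}
```

In the browser, the worker script would send each ReportResponse back with self.postMessage; listing report.buffer in the transfer argument (postMessage(message, [report.buffer])) moves the generated file to the UI thread instead of copying it.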

I would still like to look into ways to speed up the report generation itself, which is why I opened this issue: to discuss and investigate potential fixes or solutions. I will look a bit more into noSandbox and its security implications.

Once again, thank you for your assistance.

jjhbw commented 1 year ago

Good to hear that! If you want to spar some more on performance considerations, just @ me in this issue.