marcbachmann / node-html-pdf

This repo isn't maintained anymore as phantomjs got dreprecated a long time ago. Please migrate to headless chrome/puppeteer.
MIT License
3.56k stars 545 forks source link

Under concurrent load, child process exit event fires before IO is complete, causing errors #644

Open kvasserman opened 3 years ago

kvasserman commented 3 years ago

Under heavy load (in the context of web application, for example using nestjs) some of the requests fail:

  1. According to child process documentation (https://nodejs.org/docs/latest-v14.x/api/child_process.html#child_process_event_exit), exit event can fire before stdio is closed. In such cases respond() is being called with code 0 (success), no error and no data. This results in an error in the calling code, because filename is expected but is not returned. I added additional error checking to respond() to handle this case.

  2. Given the above, it's probably not a good idea to subscribe to exit event at all. I commented out the on('exit'..) code. on('close') should be sufficient.

Tested it under some load (30 concurrent connections for 2 minutes) - no errors, 5000+ PDFs rendered.

kvasserman commented 3 years ago

I should add that without this change, under a concurrent load, somewhere in the range of 10-20% of requests for PDF are failing.

barakplasma commented 2 years ago

@kvasserman we have also been noticing failures to get the file supposedly made by phantomjs under concurrent load. I have a suspicion that the issue fixed by this PR might be the same bug causing #643 . @marcbachmann would it be possible to merge both #644 and #643 and to create a release (even if beta)?

P.S. Thanks for the package Marc 💟 ! And thanks kvasserman for the investigation / bug fix here too! 🚀

barakplasma commented 2 years ago

I used patch-package to patch html-pdf@3.0.1 for the project I'm working on.

This patch is meant to let others use the fix in #644 until it is merged and a new version released. This patch will probably solve #643 as well.

To install it, follow the installation instructions at https://github.com/ds300/patch-package and copy the diff below to your patches/ folder

Here is the diff that solved my problem (all credit to @kvasserman ) :

diff --git a/node_modules/html-pdf/lib/pdf.js b/node_modules/html-pdf/lib/pdf.js
index 46f2ce4..764bf80 100644
--- a/node_modules/html-pdf/lib/pdf.js
+++ b/node_modules/html-pdf/lib/pdf.js
@@ -123,12 +123,15 @@ PDF.prototype.exec = function PdfExec (callback) {
     // Ignore if code has a value of 0 since that means PhantomJS has executed and exited successfully.
     // Also, as per your script and standards, having a code value of 1 means one can always assume that
     // an error occurred.
-    if (((typeof code !== 'undefined' && code !== null) && code !== 0) || err) {
+    if (((typeof code !== 'undefined' && code !== null) && code !== 0) || err || (code === 0 && !data)) {
       var error = null

       if (err) {
         // Rudimentary checking if err is an instance of the Error class
         error = err instanceof Error ? err : new Error(err)
+      } else if (code === 0 && !data) {
+        // This is to catch the edge case of having a exit code value of 0 but having no data (exit can be called before io pipes are closed)
+        error = new Error('html-pdf: Process exited successfully, but no data received')
       } else {
         // This is to catch the edge case of having a exit code value of 1 but having no error
         error = new Error('html-pdf: Unknown Error')
@@ -150,7 +153,7 @@ PDF.prototype.exec = function PdfExec (callback) {

   // An exit event is most likely an error because we didn't get any data at this point
   child.on('close', respond)
-  child.on('exit', respond)
+  // child.on('exit', respond)

   var config = JSON.stringify({html: this.html, options: this.options})
   child.stdin.write(config + '\n', 'utf8')

This issue body was partially generated by patch-package.

barakplasma commented 2 years ago

@kvasserman I added a test for your PR branch change here https://github.com/kvasserman/node-html-pdf/pull/1 If that test is run on the master branch of this repo, you can recreate the failure

DorianW commented 1 year ago

@barakplasma awesome patch, helped me to reduce the error rate on Heroku + html-pdf to almost 0%!