silverstripe / cwp-pdfexport

Adds PDF export capability for CWP pages
BSD 3-Clause "New" or "Revised" License
4 stars 5 forks source link

Export fails on initial request with "ContentNotFoundError" from wkhtmltopdf #2

Open robbieaverill opened 6 years ago

robbieaverill commented 6 years ago

From #1:

Noted issue: clicking the "Export to PDF" link for a page initially fails with this message:

Fatal error: wkhtmltopdf failed: Exit with code 1 due to network error: ContentNotFoundError in /webroot/vendor/cwp/cwp-pdfexport/src/Extensions/PdfExportControllerExtension.php on line 164

Subsequent requests succeed and open the PDF.

I suspect this could be a macOS issue with piping the wkhtmltopdf output to stdout. We could write it to the PHP temp dir instead of piping to stdout.


To reproduce:

You can clear the PDF generated on initial load via vendor/bin/sake dev/tasks/CleanupGeneratedPdfBuildTask which will allow you to continue to reproduce the initial PDF export page load with the error, while running this task afterwards each time.

robbieaverill commented 6 years ago

Ok, it's not the piping to stdout that's the issue but looks like an error in wkhtmltopdf caused by having certain kinds of content in the HTML. We're not the only ones! See https://github.com/wkhtmltopdf/wkhtmltopdf/issues/2051. Updated title to reflect this change.

robbieaverill commented 6 years ago

The cause is one or more of the following:

  1. a page has a URL without a protocol in it, e.g. <script src="//jquery.cdn.com/whatever.js"> - very common
  2. a page has a URL with a query string in the URL, e.g. <script src="http://mysite.com/javascript.js?m=1u983243"> - SS framework adds this to resources by default
  3. a page has a URL which is 404

We can "fix" the first two points with string manipulation, but we can't do anything about broken links in the page. It looks like a combination of one or more of those things causes the PDF generation to fail with an error message on the first attempt. It still generates a PDF but there's no guarantee that it would look the same as you'd expect it to.

I don't think we can fix this problem. There's 70 others on this GitHub issue complaining about the same problem, and it hasn't been acknowledged or fixed in 3 years so I don't expect it will be. https://github.com/wkhtmltopdf/wkhtmltopdf/issues/2051

Our options:

  1. ignore the errors and use whatever it DOES generate. note this in the documentation (quick fix)
  2. use a different PDF generation library (more work)
  3. remove this functionality altogether and tell people to use their browsers to export PDFs (longer term this is the best solution)
robbieaverill commented 6 years ago

Further - regression tested against CWP 1.x:

Since it affects the CWP 1.x equivalent code as well, I'm going to reduce the impact slightly and treat it as non-blocking for release. This affects script and stylesheet references that don't exist, but we should be able to expect a lower number of these, since they're generally critical for a website to function. 404s in content would be more concerning, but that's not a problem.

robbieaverill commented 6 years ago

For clarity, the affects/v3 label refers to the same functionality in the cwp/cwp module