spajak / cef-pdf

cef-pdf HTML to PDF utility
MIT License

images with large base64 sources seem to cause a memory error #5

Closed: ghost closed this issue 7 years ago

ghost commented 7 years ago

When I upload an HTML string that contains an image with a large base64-encoded source, cef-pdf --server crashes with exit code 139. This branch https://github.com/efx/cef-pdf/blob/topic-memory-bug/Dockerfile has a Dockerfile based on the Debian stable release that shows the Linux environment I am testing with. I am compiling against http://opensource.spotify.com/cefbuilds/cef_binary_3.3112.1652.g8c8deea_linux64.tar.bz2.

reproduction

I wrote a Node.js script to do the upload (I tried getting curl to work but was unable to). Steps:

  1. Start cef-pdf in server mode: cef-pdf --server --port=8000
  2. Clone https://github.com/efx/bug-test.git. It assumes Node.js is installed; if it isn't, you can quickly install it into your home directory on a *nix-like system with https://github.com/creationix/nvm/blob/master/README.md#install-script
  3. cd bug-test && npm install && npm start
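The bug-test repository's Node.js script isn't reproduced here. As a hypothetical stand-in for generating a comparable input, the Python sketch below writes an HTML page with one large base64-encoded PNG in an img tag. The function names and the 600x600 default size are assumptions for illustration; the pixels are random noise so that zlib can't shrink them, which keeps the embedded source around 1.4 MB.

```python
import base64
import os
import struct
import zlib


def png_chunk(kind: bytes, data: bytes) -> bytes:
    # Each PNG chunk: 4-byte length, 4-byte type, data, CRC-32 over type + data.
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def noise_png(width: int, height: int) -> bytes:
    """8-bit RGB PNG of random pixels (roughly width * height * 3 bytes)."""
    # IHDR: width, height, bit depth 8, color type 2 (RGB), no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline starts with filter type 0 ("None").
    rows = b"".join(b"\x00" + os.urandom(width * 3) for _ in range(height))
    return (b"\x89PNG\r\n\x1a\n"
            + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", zlib.compress(rows))
            + png_chunk(b"IEND", b""))


def make_test_page(width: int = 600, height: int = 600) -> str:
    """HTML page with a single large data: URI image (hypothetical helper)."""
    encoded = base64.b64encode(noise_png(width, height)).decode("ascii")
    return ('<html><body><h1>base64 test</h1>'
            f'<img src="data:image/png;base64,{encoded}"></body></html>')
```

Writing the result out, e.g. open("alginit.html", "w").write(make_test_page()), gives a file that can be fed to the upload methods discussed in this thread. The name alginit.html is borrowed from later comments; the real file's contents are unknown.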

I tested this against a build I compiled from master on macOS and saw CPU usage for the cef-pdf process max out at 100% until I closed it.

ghost commented 7 years ago

I have confirmed that running the same binary with the --file option does not cause the problem: test.pdf

spajak commented 7 years ago

Thanks for the report. I think I know where the problem is; I will be rewriting the HTTP request parsing soon.

ghost commented 7 years ago

You're welcome. Grand, thanks for the update!

spajak commented 7 years ago

The changes are in the devel branch now. You can test them if you want.

ghost commented 7 years ago

Thanks for the update, @spajak. I tried it with the same steps as above, but the server version gives me a PDF that contains only the text 10000. The file version works. output-1502392450323.pdf

spajak commented 7 years ago

It's because of Transfer-Encoding: chunked. cef-pdf does not support chunked encoding yet.
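One plausible reading of the stray 10000 (an inference, not confirmed in the thread): in a chunked body, each chunk is prefixed by its size in hexadecimal on its own line, and 0x10000 is 65536 bytes, so a parser that ignores Transfer-Encoding: chunked would treat that size line as part of the HTML. A minimal Python decoder for the chunked framing described in RFC 7230, section 4.1, might look like this. decode_chunked is a hypothetical name, and the sketch skips chunk extensions and trailers instead of interpreting them.

```python
def decode_chunked(raw: bytes) -> bytes:
    """Reassemble a message body sent with Transfer-Encoding: chunked."""
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)                # end of chunk-size line
        size_field = raw[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.strip(), 16)                # size is hexadecimal
        pos = line_end + 2
        if size == 0:                                     # last-chunk; trailers ignored
            break
        chunk = raw[pos:pos + size]
        if len(chunk) != size or raw[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body += chunk
        pos += size + 2                                   # skip data and its CRLF
    return bytes(body)


# A 65536-byte (0x10000) HTML page sent as one chunk, followed by the last-chunk.
html = b"<p>" + b"A" * (0x10000 - 7) + b"</p>"
wire = b"10000\r\n" + html + b"\r\n0\r\n\r\n"
```

A server that reads wire as plain HTML would render "10000" before the real content; decode_chunked(wire) returns the original html.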

ghost commented 7 years ago

I see. Do you have an example of the POST request you test with? Or one using curl? I tried to get something to work with curl but was unable to.

spajak commented 7 years ago

I use Postman (a Chrome app), but maybe I should switch to curl.

ghost commented 7 years ago

Ah, thanks. Interestingly, when I use Postman and select the 'binary' option, the HTML file with the large base64-encoded image works. But I cannot find which HTTP headers and body format Postman uses to send the file. There is a 'show curl' option, but its output does not include the switches for the actual file part. I like curl because its examples are explicit and I can model them in whatever request client others are using. If I come across a working curl example, I can open a PR for the README.

[Screenshot 2017-08-11 at 9:49:36 AM]

spajak commented 7 years ago

Select raw and just paste the HTML.

ghost commented 7 years ago

Nice, that works well. I'm just testing the upload programmatically and wanted to know the underlying format for POSTing the file. A simple cURL example works too; I just haven't found a way to attach the file to the request that the cef-pdf server recognizes.

# returns a 400 (I tried various combinations of -F too)
curl -v --data @alginit.html http://127.0.0.1:8000/demo.pdf > tmp/test.pdf
# works
curl -v --data '<h1>hello 2</h1>' http://127.0.0.1:8000/demo.pdf > tmp/test.pdf

Either way, don't let me sidetrack you here! I don't need to rely on the base64 fix at the moment.

spajak commented 7 years ago

With a big file, curl first sends only the headers and waits for a 100 Continue response status before sending the body as part of the same request. cef-pdf does not support this yet. It will be the next thing I work on, along with Transfer-Encoding: chunked.
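A minimal Python sketch of the server side of that handshake, assuming a request whose body length is given by Content-Length (no chunked encoding): read the headers, reply HTTP/1.1 100 Continue if the client sent Expect: 100-continue, then read exactly Content-Length bytes. read_request and simulate_curl_upload are hypothetical helpers, not cef-pdf's actual C++ code, and the local socket pair only imitates curl's side of the exchange.

```python
from __future__ import annotations

import socket
import threading


def read_request(conn: socket.socket) -> tuple[str, dict[str, str], bytes]:
    """Read one request, answering 'Expect: 100-continue' before the body."""
    buf = b""
    while b"\r\n\r\n" not in buf:
        data = conn.recv(4096)
        if not data:
            raise ConnectionError("connection closed before end of headers")
        buf += data
    head, _, body = buf.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("latin-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # The client has sent only the headers and is waiting for this interim
    # response (curl gives up after a short timeout and sends the body anyway).
    if headers.get("expect", "").lower() == "100-continue":
        conn.sendall(b"HTTP/1.1 100 Continue\r\n\r\n")

    length = int(headers.get("content-length", "0"))
    while len(body) < length:
        data = conn.recv(min(65536, length - len(body)))
        if not data:
            raise ConnectionError("connection closed before end of body")
        body += data
    return request_line, headers, body[:length]


def simulate_curl_upload(html: bytes):
    """Play the client side over a local socket pair; returns (interim, request)."""
    server, client = socket.socketpair()
    result = {}
    worker = threading.Thread(target=lambda: result.update(req=read_request(server)))
    worker.start()
    client.sendall(
        b"POST /demo.pdf HTTP/1.1\r\nHost: 127.0.0.1:8000\r\n"
        b"Content-Length: " + str(len(html)).encode() + b"\r\n"
        b"Expect: 100-continue\r\n\r\n"
    )
    interim = client.recv(1024)   # wait for "100 Continue"
    client.sendall(html)          # only now send the body
    worker.join(timeout=5)
    server.close()
    client.close()
    return interim, result["req"]
```

On the client side, curl's documented way to drop an internally generated header is to pass it with no value, e.g. -H 'Expect:', which makes it send the body immediately.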

spajak commented 7 years ago

Added support for the 100 Continue HTTP status code. The changes are in the devel branch (commit 09d4e111eae06322b00e57ae3576fbccb125143d).

ghost commented 7 years ago

I verified that this works, including the curl shorthand for uploading a file: curl -d @alginit.html http://localhost:8000/test.pdf > tmp/latest.pdf. This presumes cef-pdf is running as a server on port 8000 and that you've cloned the repo above or created alginit.html in your working directory.
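For the programmatic upload asked about earlier in the thread, a rough Python equivalent of that curl command using only the standard library's urllib.request might look like this. Because the body is passed as bytes, urllib sends a Content-Length header and neither chunks the body nor sends Expect: 100-continue. With no Content-Type given, it defaults to application/x-www-form-urlencoded, the same as curl -d. Unlike curl -d @file, which strips carriage returns and newlines from the file, this sends the bytes unchanged. build_request and convert are hypothetical names; the URL and paths are copied from the command above.

```python
import urllib.request
from pathlib import Path

CEF_PDF_URL = "http://localhost:8000/test.pdf"  # assumes cef-pdf --server --port=8000


def build_request(html: bytes, url: str = CEF_PDF_URL) -> urllib.request.Request:
    # bytes data => urllib adds Content-Length (no chunking, no Expect header)
    # and defaults Content-Type to application/x-www-form-urlencoded, like curl -d.
    return urllib.request.Request(url, data=html, method="POST")


def convert(src: str = "alginit.html", dest: str = "tmp/latest.pdf",
            url: str = CEF_PDF_URL) -> None:
    """Roughly: curl -d @alginit.html http://localhost:8000/test.pdf > tmp/latest.pdf"""
    html = Path(src).read_bytes()
    with urllib.request.urlopen(build_request(html, url), timeout=60) as resp:
        pdf = resp.read()
    out = Path(dest)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(pdf)
```

Calling convert() with the server running should leave the PDF at tmp/latest.pdf; any non-2xx reply (such as the earlier 400) surfaces as urllib.error.HTTPError.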