wkhtmltopdf / wkhtmltopdf

Convert HTML to PDF using Webkit (QtWebKit)
https://wkhtmltopdf.org
GNU Lesser General Public License v3.0

wkhtmltoimage fails with a URL that wkhtmltopdf can handle #1765

Closed elcapo closed 10 years ago

elcapo commented 10 years ago

What steps will reproduce the problem?

  1. Try to generate a PDF with:

    wkhtmltopdf URL test.pdf

  2. Try to generate a JPG from the same URL with:

    wkhtmltoimage URL test.jpg

In both cases, the URL is: http://www.elmundo.es/elmundo/2011/06/24/navegante/1308937468.html

What is the expected output?

The PDF snapshot is generated correctly, but the JPG one is not. The expected result would be a JPG named test.jpg containing a snapshot of the desired website.

What do you see instead?

Instead, we get a segmentation fault and an image that contains errors.

QPainter::restore: Unbalanced save/restore
QPainter::setRenderHint: Painter must be active to set rendering hints
QPainter::setRenderHint: Painter must be active to set rendering hints
QPainter::combinedTransform: Painter not active
QPainter::pen: Painter not active
QPainter::setPen: Painter not active
QPainter::setBrush: Painter not active
...
QPainter::pen: Painter not active
QPainter::setPen: Painter not active
QPainter::pen: Painter not active
QPainter::setPen: Painter not active
Segmentation fault
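
For anyone scripting this reproduction, here is a minimal sketch of how the crash could be told apart from an ordinary failure. It relies on the usual shell convention that a process killed by a signal is reported as 128 plus the signal number (SIGSEGV is 11 on Linux, hence 139). The function name `describe_exit_status` is made up for illustration and is not part of wkhtmltopdf.

```shell
# Hypothetical helper: turn a shell exit status into a short description.
# Assumes the common convention: status > 128 means "killed by signal N".
describe_exit_status() {
  status="$1"
  if [ "$status" -eq 0 ]; then
    echo "ok"
  elif [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128))"
  else
    echo "failed with exit code $status"
  fi
}
```

It would be called right after the command, for example `wkhtmltoimage URL test.jpg; describe_exit_status $?`.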

What version are you using?

wkhtmltopdf --version

wkhtmltopdf 0.12.1-9615f00 (with patched qt)

wkhtmltoimage --version

wkhtmltoimage 0.12.1-9615f00 (with patched qt)

On what operating system?

Debian Wheezy x86_64 GNU/Linux

ashkulz commented 10 years ago

Thanks for the extremely detailed bug report! Can you check if it is reproducible with 0.12.0, to ensure that it is not a recent regression due to changes in the build system which have happened recently?

ashkulz commented 10 years ago

Also, did you encounter any issues when installing wkhtmltopdf as a package? This is the first bug report after the changes for native package generation were implemented :smile:

elcapo commented 10 years ago

I've just tried with 0.12.0 and the output was different:

1308937468.html elmundo.jpg
Loading page (1/2)
QSslSocket: cannot resolve SSLv2_client_method               ] 10%
QSslSocket: cannot resolve SSLv2_server_method
Warning: Failed to load http://estaticos03.elmundo.es/social/ugc/avatars/003/673/medium-comments_3673737.png (ignore)
Warning: Failed to load http://estaticos03.elmundo.es/social/ugc/avatars/003/673/medium-comments_3673737.png (ignore)
Rendering (2/2)                                                    
Done

But it also generated a corrupt image. As with 0.12.1, wkhtmltopdf was able to generate a readable file, but its output also contained a few error messages:

Loading pages (1/6)
QSslSocket: cannot resolve SSLv2_client_method               ] 10%
QSslSocket: cannot resolve SSLv2_server_method
Warning: Failed to load http://estaticos03.elmundo.es/social/ugc/avatars/003/673/medium-comments_3673737.png (ignore)
Counting pages (2/6)                                               
Resolving links (4/6)                                                       
Loading headers and footers (5/6)                                           
Printing pages (6/6)
Done                                                                      
Exit with code 1 due to http error: 1005 

It must be some peculiarity of this website, because with different URLs the binaries work properly. I'm now trying, with a downloaded copy of this page, to generate images of its different parts, to see whether I can identify the component that is breaking the image generation.

I'll let you know if I find something.

elcapo commented 10 years ago

OK, I've just isolated the problem. I still don't have a patch, but it looks like a JavaScript file was making the snapshot crash. In particular, the page (http://www.elmundo.es/elmundo/2011/06/24/navegante/1308937468.html) includes a script that handles the "cookie" message (line 48):

<script type="text/javascript" language="javascript"
    src="http://estaticos.cookies.unidadeditorial.es/js/policy.js"></script>
</div>

Removing this tag lets the image be generated properly:

Loading page (1/2)
Warning: Failed to load file:///.../cx_002.js (ignore)
Warning: Failed to load file://pagead2.googlesyndication.com/pagead/js/lidar.js (ignore)
Rendering (2/2)                                                    
Done

Now I'll try to work out what's happening in that script and whether it's possible to solve it.

elcapo commented 10 years ago

OK. It looks like this JavaScript was generating a component with width=100% where the maximum width wasn't specified. The line is:

.bloque-cookies {
  background: #555;
  color:#fff;
  padding:2em 1em 1.5em;
  text-align: center; width:100%;
  clear:both;
  float:left;
  font-size: 80%;} \

Specifying a fixed output width can solve the problem:

wkhtmltoimage --disable-smart-width --width 1024 URL test.jpg
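
If the workaround is needed for several pages, it could be wrapped in a small shell function. This is a sketch only: `render_fixed_width` is a made-up name, and the default of 1024 is simply the value used in the command above.

```shell
# Hypothetical wrapper around the workaround command from this thread.
# Arguments: URL, output file, optional width in pixels (default 1024).
render_fixed_width() {
  url="$1"
  out="$2"
  width="${3:-1024}"
  wkhtmltoimage --disable-smart-width --width "$width" "$url" "$out"
}
```

For example, `render_fixed_width URL test.jpg` runs the same command as above, and a third argument overrides the width.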

elcapo commented 10 years ago

(I'm closing the issue, since it looks like there is no need to fix any code.)

ashkulz commented 10 years ago

Thanks!