johntitus / node-horseman

Run PhantomJS from Node
MIT License

Rendering a PDF properly in Horseman / PhantomJS #206

Open zeg-io opened 8 years ago

zeg-io commented 8 years ago

This has more to do with PhantomJS than with Horseman, but nonetheless...

My research indicates that PhantomJS does NOT use the print media stylesheet but rather the screen one. The suggested workaround is to use a function that replaces media=print with media=all or media=screen.

The suggestion is here

I have put together a little test, but the issue is that if the print.css file itself has the @media print query/statement then it still won't work, so I removed that from my print.css and let the link itself be the only definition of media type.
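For the case where print.css keeps its own @media print wrapper, one untested possibility is to rewrite those rules through the CSSOM from inside .evaluate() instead of editing the file. The sketch below assumes PhantomJS's WebKit exposes document.styleSheets, cssRules and a writable media.mediaText, which nobody in this thread has verified. The name promotePrintRules is made up for illustration, and only rules whose media text is exactly "print" are touched.

```javascript
// Hypothetical helper: turn every top-level `@media print { ... }` rule into
// `@media all { ... }` so a renderer that ignores print styles still applies them.
// `doc` is the page's document; inside Horseman's .evaluate() that is `document`.
function promotePrintRules(doc) {
    var changed = 0;
    var sheets = doc.styleSheets || [];
    for (var i = 0; i < sheets.length; i++) {
        var rules;
        try {
            rules = sheets[i].cssRules; // may throw or be null for cross-origin sheets
        } catch (e) {
            rules = null;
        }
        if (!rules) { continue; }
        for (var j = 0; j < rules.length; j++) {
            // @media (and @import) rules carry a media list; plain style rules do not.
            var media = rules[j].media;
            if (media && media.mediaText === 'print') {
                media.mediaText = 'all';
                changed++;
            }
        }
    }
    return changed; // number of rules rewritten
}
```

The return value is only there to make it easy to log how many rules were rewritten.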

I'm outputting pre and post html files to test the outcome. Here are the relevant sections of the generated pre/post files. Below them is the function used.

PRE

<title>Online Reporting</title>
    <link rel="stylesheet" media="all" type="text/css" href="css/main.css">
    <link rel="stylesheet" media="print" type="text/css" href="css/print.css">
    <link rel="stylesheet" type="text/css" href="node_modules/tether-shepherd/dist/css/shepherd-theme-arrows.css">
    <link href="https://fonts.googleapis.com/css?family=Roboto:400,100,300,500,700" rel="stylesheet" type="text/css">

POST

<title>Online Reporting</title>
    <link rel="stylesheet" media="all" type="text/css" href="css/main.css">
    <link rel="stylesheet" media="all" type="text/css" href="css/print.css">
    <link rel="stylesheet" type="text/css" href="node_modules/tether-shepherd/dist/css/shepherd-theme-arrows.css">
    <link href="https://fonts.googleapis.com/css?family=Roboto:400,100,300,500,700" rel="stylesheet" type="text/css">

FUNCTION

function createPDF (url, token, fileOut) {
    var horseman = new Horseman();

    horseman
        .userAgent()
        .userAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36')
        .log('pre-open')
        .headers({
            Authorization: 'Bearer ' + token
        })
        .open(url)
        .log('opened url')
        .html(null,'../tmp/pre.html')
        .evaluate(function (selector) {
            $(selector).each(function (idx) {
                $(this).attr('media', 'all');
            });
        }, 'link[media="print"]')
        .html(null,'../tmp/post.html')
        .log('evaluated url')
        .pdf(fileOut, {
            format: 'Letter',
            orientation: 'portrait',
            margin: '0.25in'
        })
        .log('wrote pdf')
        .close();

}

Note: The print style that works so well for actually printing doesn't carry over perfectly to PhantomJS PDF'ing. So instead of finding ALL print media links, the solution might be to give the relevant stylesheet an ID and then change from print to all on only that link... we'll see.
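The ID-based idea could look roughly like the sketch below. It is not something tested in this thread: the helper name promoteStylesheetById and the id "print-stylesheet" are assumptions, and the link in the page would need that id added by hand.

```javascript
// Hypothetical helper: switch one <link> from media="print" to media="all".
// `doc` is the page's document; `id` is an id you add to the print stylesheet link.
function promoteStylesheetById(doc, id) {
    var link = doc.getElementById(id);
    if (!link || link.getAttribute('media') !== 'print') {
        return false; // no such link, or it is not a print stylesheet
    }
    link.setAttribute('media', 'all');
    return true;
}
```

Inside a Horseman chain the body would have to live in the function handed to .evaluate(), since that function runs in the page and cannot see helpers defined on the Node side.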

zeg-io commented 8 years ago

So the "Issue" I guess is more of a suggestion to add this to the documentation.

zeg-io commented 8 years ago

Ignore this post... this was errant information:

Ok, another strange issue... in order for the above to work you MUST have a before and after .html() function that writes the file to disk. Not sure why that is yet, but I wound up just adding a .then() which deletes the files after .close()

awlayton commented 8 years ago

Wow thanks for looking into this in such detail @zeg-io, especially since I don't have a ton of time for debugging horseman lately.

I could definitely imagine adding something like a .print() action to horseman for rendering to PDF using the print style rather than the screen ones. Of course I would also want to document your findings as to the difference between the two.

Are you sure you need the .html() both before and after changing the attributes? That is interesting...

Could you please post the version(s) of everything you have used in your research? Thanks.

zeg-io commented 8 years ago

Upon closer inspection, I had a bunch of moving parts contributing to my confusion. The .html() calls may only have helped because of the time they added to the process. You will note I added a .wait(), which seems to have resolved the issue.

Below is the code I'm currently having luck with. Basically, I'm passing on a JWT, which is required; you could kill that for your testing.

Then it looks for the print.css link and changes it from media type print to all.

I found that this was not enough. The PDF renderer seems to allow me only 6 inches of width, a limit that normal printing does not have. So I had to create a third CSS file, phantomJS.css.

That file is injected into the header prior to rendering. My application defaults to several sections hidden, so I manually "open" the section prior to rendering.

It's obnoxious to debug it, but it seems to at least be working.

var express = require('express'),
    validateAuth = require('../functions/validateAuth'),
    Horseman = require('node-horseman'),
    fs = require('fs');

var router = express.Router(),
    webServerTarget = 'http://zetta.local:5757/index.html';

function createPDF (url, token, fileOut) {
    var horseman = new Horseman();

    horseman
        .userAgent()
        .userAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36')
        .log('pre-open')
        .headers({
            Authorization: 'Bearer ' + token
        })
        .open(url)
        .log('opened url')
        .wait(3000)
        .evaluate(function (selector) {
            $(selector).each(function (idx) {
                $(this).attr('media', 'all');
            });
        }, 'link[media="print"]')
        .evaluate(function (selector) {
            $(selector).append('<link rel="stylesheet" media="all" type="text/css" href="css/phantomJS.css" />')
        }, 'head')
        // Open all devices
        .evaluate(function (selector) {
            $(selector).each(function (idx) {
                $(this).find('h2').addClass('up');
                $(this).find('overview').removeClass('ng-hide');
                $(this).find('footer').removeClass('ng-hide');
            });
        }, 'device section')
        .log('evaluated url')
        .pdf(fileOut + '.pdf', {
            format: 'Letter',
            orientation: 'portrait',
            margin: '0.25in'
        })
        .log('wrote pdf')
        .close()
}

router.post('/:date?', function  (req, res) {
    // VALIDATE TOKEN ///////////////////////////////////////////
    var token = req.get('Authorization').replace('Bearer ', '');
    validateAuth(token, function (err, user) {
        if (err) {
            res.status(401).send('invalid token.');
            return;
        } else if (!user) {
            res.status(401).send('invalid token.');
            return;
        } else {
            // TOKEN IS VALID ===================================
            console.log(token);
            var fileOut = '../tmp/' + new Date().getTime().toString().replace(':',"-").replace(' ', '-');
            createPDF(webServerTarget, token, fileOut);
            res.status(200).json({
                fileName: fileOut
            });
        }
    });
    // END VALIDATE TOKEN ///////////////////////////////////////
});

awlayton commented 8 years ago

It makes sense now that the .wait() works and you only need one before.

When you use AngularJS or similar for the page, the .open() can return before the framework is done changing the DOM. This is because PhantomJS can't know when the JavaScript from AngularJS is done.
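A fixed .wait(3000) guesses at how long the framework needs. An alternative is to poll for a condition that only holds once rendering has finished. The predicate below is a sketch of such a condition: the selector 'device section' comes from the code earlier in the thread, while the function name and the minimum count are assumptions. Horseman's README describes .waitFor(fn, ...args, value) and .waitForSelector(selector) for this kind of polling, but neither has been tried against this application.

```javascript
// Hypothetical readiness check: true once at least `minCount` elements
// matching `selector` exist in the page.
// `doc` is the page's document; in a real page you would pass `document`.
function hasRendered(doc, selector, minCount) {
    return doc.querySelectorAll(selector).length >= minCount;
}
```

In a chain this might replace .wait(3000) with something like .waitFor(function (sel) { return document.querySelectorAll(sel).length >= 1; }, 'device section', true), with the check written inline because it is evaluated inside the page.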

As for the CSS stuff, what exactly did you have to do in the phantomJS.css file? Full disclosure, I am not very good with CSS.