Closed bjrmatos closed 9 years ago
Looks good :)
Suggested tweaks:
waitForRender: true
(wait for the page to render), and afterPageLoad: function () {}
(execute a custom function against the page after the page has loaded, waiting for render if waitForRender
is true)PHANTOM_HTML_TO_PDF_DONE
to PHANTOM_HTML_TO_PDF_READY
(we're ready to convert the page, but we're done with converting it yet)PHANTOM_HTML_TO_PDF_READY
a function. It will cause an error when the page isn't being converted. We should still be able to inject and execute a custom function at the end of the page without it being called within the page's pre-existing JS. Instead make it a variable and wait for it to return true.afterPageLoad
or add to a separate pull request so we can add the waitForRender functionality sooner :)As you've identified, this solution relies on access to the target page code. You might want to also support:
window.renderComplete
rather than window.PHANTOM_HTML_TO_PDF_READY
. This isn't particularly valuable though as you'd likely know the code and have access to edit it to be aware of such a variable.For myself, and probably the majority of users, simply supporting the ability to set PHANTOM_HTML_TO_PDF_READY
within the page would meet my needs perfectly. So I'd actually rather see any further customization parked and this feature integrated ASAP :)
How quickly could we get this integrated and added to the npm package?
You might also want to add a test that the page times out rather than renders if window.PHANTOM_HTML_TO_PDF_READY
is never set to true
- Split out the 2 pieces of functionality: waitForRender: true (wait for the page to render), and afterPageLoad: function () {} (execute a custom function against the page after the page has loaded, waiting for render if waitForRender is true)
you mean adding this two features? waiting for the page to render/load is something different that waiting for page's javascript execution, but if it is something related with the name of the configuration (waitForRender vs waitForJS) i prefer waitForJS since this feature is related to javascript execution and want to be explicit on that
- Rename PHANTOM_HTML_TO_PDF_DONE to PHANTOM_HTML_TO_PDF_READY (we're ready to convert the page, but we're done with converting it yet)
i'm ok with that :)
- Don't make PHANTOM_HTML_TO_PDF_READY a function.
i prefer a function here, because it is more explicit and allows other possibilities (as explained before)
It will cause an error when the page isn't being converted.
what do you mean with that? if someone adds waitForJS: true
in config is because it has total control of the page (it's not an external page), and must invoke the callback (PHANTOM_HTML_TO_PDF_DONE), if the user don't do that, the process will fail because the timeout.. other than that i don't see a scenario when the process fails (we can improve the timeout error message when waitForJS: true
)
Customizable variables (so I can choose to use window.renderComplete rather than window.PHANTOM_HTML_TO_PDF_READY.
i was thinking the same.. lets wait for the author's feedback before do anything
Searching for elements to check for page load. e.g. checking if an element with a given ID exists, or an element with a given css class containing certain text ... which leads to supporting a function and allowing the user to customize what they want (like my pull request..). But would love to see this done without an eval :)
that functionality has other semantics than waiting for page's javascript execution and deserves it's own PR.. it is more related to waitForRender
, which is different than waitForJS, waitForJS is designed for the user can explicitly indicate that the page and its custom javascript has ended.
after thinking this, instead of support the eval based implementation in waitForJS
, we should support something like waitForRender
(related to page load, waiting for the precense of element, checking if an element with a given ID exists, etc) which has other semantics than waitForJS
and deserves its own configuration, and should be discussed in another issue/PR
Happy to use an alternative name if waitForRender
is confusing. The reason I chose it, is because we're waiting for the PDF to be ready to produce. For us, the rendering of the report includes the JS making changes (e.g. applying translations). If the JS hasn't yet finished, we would say it hasn't finished rendering. But simply waitFor / waitForJS fine (could use waitFor
and then accept boolean, string or function...)
Adding a function call to a function on your page that doesn't exist will cause an error: Uncaught TypeError: window.PHANTOM_HTML_TO_PDF_DONE is not a function
. This error won't occur if we access the page using phantom-html-to-pdf, but will occur if we simply open the page in (e.g.) Chrome. Which would be an issue for us.
The second issue I can imagine, is how do you make sure that our phantom-html-to-pdf script has inserted the function before it is called?
hey @pofider i think i found a bug in phantom-workers (the callback for phantom.execute
is called two times and in different order, probably a race condition..) when trying to implement a timeout test for waitForJS
.
i have published the code in https://github.com/bjrmatos/phantom-html-to-pdf/tree/timeout-test-waitForJS, just run npm test
multiple times to reproduce the errors
some screenshoots:
as you see, the callback is called two times and the errors come in different order
node: 0.10.40 os: OSX
The second issue I can imagine, is how do you make sure that our phantom-html-to-pdf script has inserted the function before it is called?
because i have used:
page.onInitialized = function() {
// inject function to the page in order to the client can instruct the ending of its JS
if (body.waitForJS) {
page.evaluate(function() {
window.PHANTOM_HTML_TO_PDF_DONE = function() {
var scriptNode = document.createElement("script");
scriptNode.src = 'http://intruct-javascript-ending';
document.body.appendChild(scriptNode);
};
});
}
};
the script will be always defined in page before any user's code, see http://phantomjs.org/api/webpage/handler/on-initialized.html (specifically this part: This callback is invoked after the web page is created but before a URL is loaded.
)
Adding a function call to a function on your page that doesn't exist will cause an error: Uncaught TypeError: window.PHANTOM_HTML_TO_PDF_DONE is not a function. This error won't occur if we access the page using phantom-html-to-pdf, but will occur if we simply open the page in (e.g.) Chrome.
yeah, i know that, but if your intention is to use your page for other reasons than generating PDF you can do:
// or typeof window.PHANTOM_HTML_TO_PDF_DONE !== "function"
if (typeof window.PHANTOM_HTML_TO_PDF_DONE !== "undefined") {
window.PHANTOM_HTML_TO_PDF_DONE();
}
fair enough. I'm now happy with this for my use-case :)
A polling-based solution:
This looks good as it meets my specific use-case, doesn't use eval, and may have a performance benefit (though I'm unclear on whether this would be a non-trivial improvement)
My preference is to get either implemented in the codebase ASAP. Thanks for your efforts @bjrmatos :)
Exactly! A eval/polling based solution is the best for external pages. Thnks to you for being patient
@pofider i have found the solution for the bug.. will submit a PR in phantom-workers tomorrow (now it is too late in my country)
rename PHANTOM_HTML_TO_PDF_DONE to _READY
Yes
function vs variable for PHANTOM_HTML_TO_PDF_READY
Did you also consider using an object property setter? It won't fail when the property is not defined and still allows you to set a custom parameter through it without polling.
page.onInitialized = function() {
if (body.waitForJS) {
page.evaluate(function() {
Object.defineProperty(window, 'PHANTOM_HTML_TO_PDF_READY', {
set: function(val) {
if (!val)
return;
var scriptNode = document.createElement("script");
scriptNode.src = 'http://intruct-javascript-ending';
document.body.appendChild(scriptNode);
}
});
});
}
};
custom name for PHANTOM_HTML_TO_PDF_READY
Yes, why not.
The approach suggested by @bjrmatos is really great and I will happily merge it and release it when it is done. Afterwards we can take a look at the polling and eval based implementation for the external html use case.
Big thanks to both of you.
wonderful news! i will finish the implementation in the next hours (after work..), the object property setter is a great solution! :tada:
done! i want to add a timeout test but phantom-workers has a little bug.. i will submit the patch in phantom-workers and come back for the timeout test
ok PR submitted, when a new version of phantom-workers (when pofider/phantom-workers#8 is resolved) is published on npm i will update the phantom-workers version in dependencies and add the timeout test.. all will be ready to merge
Good job!
If the update is available on npm by Friday, I'll start using it against our site then. Else I'm on holiday next week and will use it when I return :)
we're ready! :)
Published. Thanks.
this implementation doesn't invoke the PDF rendering process until the phantom web page invoke the callback
PHANTOM_HTML_TO_PDF_DONE
, this function communicates with the phantom process (via a fake request) and let the rendering process to start.this implementation is more concerned about the jsreport use case (or scenarios where the user has control of the html -> local html) .. i prefer this implementation because it let the user invoke a method, this method could be used to pass some data to the phantom process (ej:
PHANTOM_HTML_TO_PDF_DONE({ customData: 'xxxxx' });
) and do something.. i don't know a use case for it but we could find something in the future :), also it is more elegant (function invocation instead of global condition in the webpage) and with more performance (instead of evaluate the page every x times checking for some condition to meet)i will add the new parameter to the docs if we decide that this should be merged.
also, i think we should support the approach that #14 uses (eval based).. it is more useful when converting external pages (http://google.com), waiting for some condition to meet is the best way to do it in external pages.
right now, my implementation (callback based) uses a boolean configuration (waitForJS: true), in order to support the other implementation (eval based) i think we can accept a string (waitForJS: 'return window.pageLoaded === true;') and based on it resolve with the corresponding implementation (callback/eval based)