spajak / cef-pdf

cef-pdf HTML to PDF utility
MIT License

Wait for condition before printing #6

Open beckyconning opened 6 years ago

beckyconning commented 6 years ago

One approach would be to add a command-line option that takes a JavaScript expression returning a boolean. The expression would be evaluated repeatedly until it returns true, at which point the PDF would be generated.
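A minimal sketch of that polling idea, written in JavaScript for illustration rather than in cef-pdf's C++. `waitForCondition`, `condition` and `onReady` are hypothetical names, and `onReady` only stands in for "generate the PDF now":

```javascript
// Hypothetical helper, not part of cef-pdf: re-evaluates `condition`
// every `intervalMs` milliseconds until it returns exactly true,
// then calls `onReady` once and stops polling.
function waitForCondition(condition, onReady, intervalMs = 100) {
  const timer = setInterval(() => {
    if (condition() === true) {
      clearInterval(timer);
      onReady();
    }
  }, intervalMs);
  // Returning a cancel function lets the caller stop waiting.
  return () => clearInterval(timer);
}
```

A page might expose something like `() => window.reportLoaded === true` as the condition; that property name is also only an example.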

I'd love to help make this a thing and will be investigating how to do it now. Any help or guidance through the source would be very appreciated.

beckyconning commented 6 years ago

https://github.com/spajak/cef-pdf/blob/master/src/Job/Manager.cpp#L54

beckyconning commented 6 years ago

http://magpcss.org/ceforum/apidocs/projects/(default)/CefFrame.html#ExecuteJavaScript(constCefString&,constCefString&,int)

beckyconning commented 6 years ago

https://code.google.com/archive/p/chromiumembedded/issues/344

beckyconning commented 6 years ago

http://magpcss.org/ceforum/apidocs3/projects/(default)/cef_message_router.h.html

beckyconning commented 6 years ago

So I'm now taking the approach of adding command-line and HTTP request options which cause cef-pdf to produce the PDF when it receives the "pdf" cefQuery request rather than immediately.

When these options are used, the frame in question will need to evaluate window.cefQuery({request: 'pdf'}); at which point the PDF would be produced.
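On the page side this could look roughly like the sketch below. It assumes the query function is exposed under CEF's default message-router name, `window.cefQuery`, and that the request string is 'pdf' as described above; `signalReady` is a hypothetical helper name.

```javascript
// Sketch of page-side code. `win` is the page's window object.
// Returns false when no query function is present (e.g. in a normal browser).
function signalReady(win) {
  if (typeof win.cefQuery !== "function") {
    return false;
  }
  win.cefQuery({
    request: "pdf", // request string taken from the comment above
    onSuccess: () => {},
    onFailure: (code, message) => console.error("trigger failed", code, message),
  });
  return true;
}

// Typical use inside the application, once its data has loaded:
//   loadReportData().then(() => signalReady(window));
// (`loadReportData` is a placeholder for the application's own code.)
```

Guarding on the function's presence lets the same page still work when opened outside cef-pdf.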

beckyconning commented 6 years ago

Trying to work this out (I don't have C++ experience)

https://github.com/spajak/cef-pdf/compare/master...beckyconning:Remote-trigger

This is where I'm up to right now, but when run with the --trigger flag it just stalls. Will carry on with this another time : ).

spajak commented 6 years ago

I just can't find the reason behind that feature. What do you need it for?

beckyconning commented 6 years ago

To produce PDFs from DOMs produced by JavaScript applications which use data from HTTP requests of arbitrary duration (could be 0.5 seconds, could be 30 minutes).

Although this could be achieved by transforming that DOM into static HTML (canvas -> img etc.) and then passing a data URI or making a POST request with this HTML in the body, this cannot be done non-interactively (e.g. on a schedule) without yet another automated browser.

This is akin to a selfie taken by the subject via remote trigger rather than a photo which is taken by someone else as soon as the subject is seen (loaded).

beckyconning commented 6 years ago

This allows javascript applications to tell cef-pdf when they are ready to be rendered as a PDF.

spajak commented 6 years ago

cef-pdf is meant to print static html documents, not applications. Maybe I will rethink this in the future.

beckyconning commented 6 years ago

cef-pdf works perfectly with HTML snapshots from JavaScript applications, but as I say that doesn't work without another browser to produce that snapshot. This is why I'm adding this feature. Obviously you aren't obligated to merge the PR when it's ready, but I think as an optional feature it's very useful. If you don't merge it I will continue to maintain a fork.

Any JavaScript application can include a print layout via a @media print {} CSS query and then pass its URL to cef-pdf over HTTP with the remote trigger tag. When it's ready to be rendered as a PDF it pulls that trigger, and a PDF version of the information presented by the application is delivered.

Think of all the Javascript applications out there which would benefit from this : ).

beckyconning commented 6 years ago

This is now working, just need to add the http option argument and tidy up. https://github.com/spajak/cef-pdf/compare/master...beckyconning:trigger-remote

beckyconning commented 6 years ago

Then I will submit a PR.

spajak commented 6 years ago

@beckyconning I have made some adjustments to your code in 4d7ee11336aa7384a61ca689d9b02a21dc41370b but I'm unable to make the trigger work (under Windows). Can you test this on the devel branch? Maybe I missed something important, because my OnQuery method is never executed.

beckyconning commented 6 years ago

http://magpcss.org/ceforum/apidocs3/projects/(default)/CefRenderProcessHandler.html#OnRenderThreadCreated(CefRefPtr)

http://magpcss.org/ceforum/apidocs3/projects/(default)/CefClient.html#OnProcessMessageReceived(CefRefPtr,CefProcessId,CefRefPtr)

Both CefClient and CefRenderProcessHandler have OnProcessMessageReceived. OnProcessMessageReceived is overridden in Client. I believe they need to be separate instances.

beckyconning commented 6 years ago

@spajak does the above solve this issue?

spajak commented 6 years ago

But I have CefClient and CefRenderProcessHandler separated. I've just merged CefRequestHandler into CefClient

beckyconning commented 6 years ago

Try separating them.

spajak commented 6 years ago

Didn't work. Did you try this feature on Windows?

spajak commented 6 years ago

I wasn't able to compile this on Windows. Besides, this feature needs more work, such as a timeout. Without one, if the callback is never called the cef-pdf process runs forever, and this cannot be allowed.

beckyconning commented 6 years ago

Why not? If the HTTP connection is closed then it is cancelled; if the process receives SIGINT it is cancelled. Why is it cef-pdf's responsibility to manage timeouts when anything which uses it can easily do this itself? Also, yes, I did test this on Windows.

spajak commented 6 years ago

Closing the HTTP connection does not cause the renderer process to quit. cef-pdf starts an additional process for every job. It is also responsible for closing that process, otherwise the process runs forever.

beckyconning commented 6 years ago

Ah I see. Shouldn't that be the priority then? Surely if the HTTP connection is closed there is no need for the job process?

beckyconning commented 6 years ago

What you describe could happen already if loading the page takes infinite time.

beckyconning commented 6 years ago

Like even without remote-trigger.

beckyconning commented 6 years ago

So the client should be able to abort and cause cleanup by closing the connection.

beckyconning commented 6 years ago

Ah I found the bug. All the trigger stuff was working fine but the convenience function was being put on the window object of about:blank rather than the window object of the given web page.

beckyconning commented 6 years ago

I must have been testing with the full expression rather than the convenience function.

beckyconning commented 6 years ago

Regarding timeouts please consider these scenarios:

Direct connection to PDF generator: A timeout prevents successful PDF generation. An application user chooses to download a PDF. The application informs the user that the PDF is being produced and this might take some time and to leave the application open while it generates. The user switches to another app and continues with other work while the PDF is generating. The given PDF they asked for takes 3 minutes to generate. The timeout is set for 2 minutes. The user switches back to the app after 5 minutes and finds that the PDF has "timed out". How frustrating! The user could have received the PDF but the timeout has prevented this.

Direct connection to PDF generator: Successful PDF generation. An application user chooses to download a PDF. The application informs the user that the PDF is being produced and this might take some time and to leave the application open while it generates. The user switches to another app and continues with other work while the PDF is generating. The given PDF they asked for takes 3 minutes to generate. There is no timeout. The user switches back to the app after 5 minutes and finds that the PDF has been generated.

Direct connection to PDF generator: User impatience prevents successful PDF generation. An application user chooses to download a PDF. The application informs the user that the PDF is being produced and this might take some time and to leave the application open while it generates. The user switches to another app and continues with other work while the PDF is generating. The given PDF they asked for takes 3 minutes to generate. There is a 4 minute timeout. The user switches back to the app after 2 minutes to find that the PDF is still being generated. The user closes the application.

If processes and resources dedicated to the generation of this PDF are not cleaned up when the connection is dropped the PDF generation will continue wastefully for 2 minutes despite no-one ever receiving this PDF.

Indirect connection from app to PDF generator: Successful PDF generation. An application user called Saiid chooses to download a PDF. The application informs Saiid that the PDF is being produced, that this might take some time and that he can close the app and come back later to collect it. Saiid closes the app and continues with other work while the PDF is generating. The PDF Saiid requested takes 12 minutes to generate.

The application server is set to allow 1000 concurrent PDF generations. Currently there are 1000 generations in progress. The application server cancels the longest running generation which has been in progress for over 10 minutes to make room for Saiid's PDF (if no generation had been running for more than 10 minutes then Saiid's generation would pend until either a generation finished or lasted more than 10 minutes). This generation was started by a user called Hadil. If Hadil checks the app now she will be informed that her PDF was taking a long time to produce and that it will be reattempted when the service is less busy. The application informs the administrator that this has occurred so the administrator can decide whether to increase the generation capacity of this service or not.

The application then starts generating Saiid's PDF. Whilst this is happening 700 generations finish successfully after which only 10 more are started. The service is now less busy so the server reattempts Hadil's PDF generation.

10 minutes later Saiid launches the app and finds his PDF ready to download. 30 minutes later Hadil launches the app and finds her PDF ready to download. If generation cancellation was based only on time rather than on the server capacity and time, neither of these PDFs would ever have been successfully generated.
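The admission rule in this last scenario can be sketched as a small function. This is only an illustration of the policy as described, not something cef-pdf or any existing server implements; `admit`, the job shape `{ id, startedAt }` and the action names are all made up for the example.

```javascript
// Decide what to do with a new PDF request.
// running:   array of { id, startedAt } for generations in progress
// capacity:  maximum concurrent generations (1000 in the scenario)
// maxAgeMs:  age after which a running job may be evicted (10 minutes)
// now:       current time in milliseconds
function admit(running, capacity, maxAgeMs, now) {
  if (running.length < capacity) {
    return { action: "start" };
  }
  // Find the longest-running generation.
  let oldest = null;
  for (const job of running) {
    if (oldest === null || job.startedAt < oldest.startedAt) {
      oldest = job;
    }
  }
  if (oldest !== null && now - oldest.startedAt > maxAgeMs) {
    // Cancel it to make room; the caller is expected to retry it later.
    return { action: "evict", evict: oldest.id };
  }
  // Nothing is old enough to cancel, so the new request pends.
  return { action: "wait" };
}
```

With the service full and Hadil's job older than 10 minutes, `admit` returns an `evict` action naming her job; if no job has passed 10 minutes it returns `wait`.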

beckyconning commented 6 years ago

cef-pdf could decide to implement a more thoughtful cancellation policy such as the one described in the last scenario. However it has no obligation to and applications may wish to implement their own differing policy.

In many cases a cancellation policy based solely on the connection is sufficient. "If we're still waiting for the PDF then please keep trying to produce it" is sensible. Where it is not sufficient, other applications can build any other cancellation policy on top of it.

Separation of concerns is important and providing even an optional timeout is dangerous as it encourages misuse.

If anyone wants to use a timeout cancellation policy they can implement it themselves trivially.

Via HTTP, curl and wget provide timeouts, and AJAX timeouts are easy to implement in JavaScript and other languages. When the timeout is exceeded the connection is closed, after which cef-pdf should clean up.
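For example, a client-side timeout in JavaScript might look like the sketch below. It uses the standard `fetch` and `AbortController` APIs; `fetchWithTimeout` is a hypothetical helper, and the server address in the usage comment is a placeholder, not a documented cef-pdf endpoint.

```javascript
// Abort the request (closing the connection) if no response has
// arrived within `timeoutMs`. `fetchImpl` defaults to the global fetch
// and is a parameter only so it can be swapped out in tests.
function fetchWithTimeout(url, timeoutMs, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  return fetchImpl(url, { signal: controller.signal })
    .finally(() => clearTimeout(timer)); // don't leave the timer running
}

// Usage, assuming a PDF server at a placeholder address:
//   fetchWithTimeout("http://pdf.example.internal/report", 120000)
//     .then((res) => res.arrayBuffer())
//     .catch((err) => console.error("gave up waiting", err));
```

The timeout lives entirely in the caller, so each application can choose its own limit or none at all.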

Via the command line these examples work pretty well.

bash

( pid=$BASHPID; (sleep 120; kill $pid) & exec cef-pdf --remote-trigger --url=https://reporting.example.com/view/0981230)

timeout

timeout 120 cef-pdf --remote-trigger --url=https://reporting.example.com/view/0981230

windows

start cef-pdf.exe --remote-trigger --url=https://reporting.example.com/view/0981230
timeout /t 120
taskkill /im cef-pdf.exe /f