jupyter / nbconvert

Jupyter Notebook Conversion
https://nbconvert.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Gnarly proof-of-concept PDF generation with QtWebEngine #1031

Open bollwyvl opened 5 years ago

bollwyvl commented 5 years ago

Howdy folks!

A thing I've been kicking around for some time is generating PDF in the browser you'll already have if you install jupyter (literally) because of Qt. I tried it before back in the bad old days of QtWebKit, but it was pretty frustrating.

As of a few months ago, the new QtWebEngine really got good enough to give this another go. The good news is it ships full Chromium. It supports every kind of output I threw at it, including MathJax, bqplot, pythreejs, ipyvolume, and ipywebrtc. The "bad" news is that the version most installable (via (ana)conda(-forge)) is a little old (58), but I really don't care, as you'll be able to spin up a Qt 5.9 environment for a really long time to come. I'd prefer a long-term Firefox engine, but it's just not as easy to build, deploy, command and customize as what the Qt folks have done.

Anyhow, here's the nasty PoC, where I've probably taken too many liberties (it can literally look like JupyterLab, down to all the chrome, which can in turn be imported into Inkscape): https://github.com/jupyter/nbconvert/compare/master...bollwyvl:web-pdf?expand=1

And a fairly pleasing bqplot example: [image: Screenshot from 2019-05-24 08-32-11]

[attachment: bqplot.pdf]

There's a lot of things to expand upon, namely

And it probably doesn't belong in nbconvert proper, but instead should be a separate package. But I think for a lot of folks, this would answer many flavors of mail.

MSeal commented 5 years ago

This is an interesting proof of concept! This would be a much smaller dependency chain for PDF export than nbconvert's current setup, but the entire library avoids using any browser interfaces today, so I'd hesitate to merge something like this atm, as you suggested. Would love to hear from lab folks @SylvainCorlay if there's interest in this, or thoughts on how such a plugin might be usable for the web clients.

maartenbreddels commented 5 years ago

Wow, this is impressive stuff, and much better than my PR using Chrome headless. Super useful for widgets, galleries, testing, etc. Does it run headless in Travis?

(from mobile phone)

bollwyvl commented 5 years ago

@MSeal entire library avoids using any browser interfaces

I mean, we generate a lot of html, css and javascript! But I totally appreciate the hesitancy to declare The One True Browser. entry_pointed in is probably fine, it would have the same API.
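
To make the "entry_pointed in" idea a bit more concrete: nbconvert discovers third-party exporters through the `nbconvert.exporters` entry point group, so a separate package could register itself without nbconvert having to pick a browser. A minimal sketch of what that packaging metadata might look like, assuming the exporter name `pdfqt` and the module `nbconvert_pdfqt.exporter` that show up later in this thread; the class name `PDFQtExporter` is purely hypothetical:

```python
# Sketch of the metadata a standalone exporter package might declare.
# "nbconvert.exporters" is the entry point group nbconvert scans for plugins.
# ASSUMPTION: the class name PDFQtExporter is hypothetical; only the "pdfqt"
# name and the nbconvert_pdfqt.exporter module appear in this thread.
ENTRY_POINTS = {
    "nbconvert.exporters": [
        "pdfqt = nbconvert_pdfqt.exporter:PDFQtExporter",
    ],
}


def parse_entry_point(spec):
    """Split a 'name = module:attr' entry point string into its parts."""
    name, _, target = spec.partition("=")
    module, _, attr = target.strip().partition(":")
    return name.strip(), module, attr
```

In a `setup.py` this dict would be passed as `setup(..., entry_points=ENTRY_POINTS)`, after which `jupyter nbconvert --to pdfqt` should resolve to the plugin's exporter class.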

@maartenbreddels Does it run headless in Travis?

Good question! I'll see what I can work up! I'd likely start at Azure if setting up something new, but it should be cross-applicable.

maartenbreddels commented 5 years ago

Maybe I shouldn't go too far off topic, but I'm gonna do it anyway :) I think this can be a really massive boost for widgets. I think a hybrid of this with https://github.com/maartenbreddels/flask-ipywidgets (which basically mocks a kernel for the currently running Python process) would allow, for instance, bqplot and ipyvolume in a Qt app.

maartenbreddels commented 5 years ago

On topic again: for reference, this could be an alternative to #901

bollwyvl commented 5 years ago

As a light start on automation, I had a crack at splitting this off as a standalone repo (Binder).

Seeing this when trying to run it

!jupyter nbconvert --to pdfqt Examples.ipynb

[NbConvertApp] Converting notebook Examples.ipynb to pdfqt
[NbConvertApp] Building PDF...
/srv/conda/envs/notebook/lib/python3.7/runpy.py:125: RuntimeWarning: 'nbconvert_pdfqt.exporter' found in sys.modules after import of package 'nbconvert_pdfqt', but prior to execution of 'nbconvert_pdfqt.exporter'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/srv/conda/envs/notebook/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jovyan/src/nbconvert_pdfqt/exporter.py", line 227, in <module>
    print_pdf()
  File "/home/jovyan/src/nbconvert_pdfqt/exporter.py", line 139, in print_pdf
    from PyQt5 import QtCore, QtWidgets, QtWebEngineWidgets, QtGui
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

So certainly still some things to run down.
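
For anyone chasing the same `ImportError`, a quick way to check whether the system libraries are present before importing PyQt5 is to ask the dynamic loader directly. This is only a diagnostic sketch using the standard library; the helper name `missing_shared_libs` is made up for illustration, and `libGL.so.1` is simply the library named in the traceback above (other libraries may be missing too):

```python
import ctypes


def missing_shared_libs(names):
    """Return the subset of `names` that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            # CDLL raises OSError when the shared object cannot be found/loaded.
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing
```

Calling `missing_shared_libs(["libGL.so.1"])` in the Binder image would return a non-empty list while the library is absent, which points at a missing system package rather than a Python packaging problem.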

maartenbreddels commented 5 years ago

@astrofrog has experience with this, we had to set this up in glue-jupyter, in the old version we did this: https://github.com/glue-viz/glue-jupyter/blob/fdb1304b4b0e8a635c0a70cda9c040b881fbb3b5/binder/apt.txt Not sure how it works in master.

bollwyvl commented 5 years ago

Thanks for the pointers! It might be a bit before I can get back to keyboard

bollwyvl commented 5 years ago

The current binder does generate a PDF, with the addition of xvfb (e.g. !xvfb-run jupyter nbconvert ...). Not exactly lightweight, but baby steps.
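
For scripting this (say, on CI), the xvfb wrapping could be made conditional on whether a display is available. A small sketch, assuming the `pdfqt` exporter name used earlier in this thread and that `xvfb-run` is on the `PATH`; the function name `nbconvert_command` is hypothetical:

```python
import os


def nbconvert_command(notebook, has_display=None, to="pdfqt"):
    """Build the nbconvert command line, wrapped in xvfb-run when headless."""
    if has_display is None:
        # Treat a non-empty DISPLAY variable as "an X server is available".
        has_display = bool(os.environ.get("DISPLAY"))
    cmd = ["jupyter", "nbconvert", "--to", to, notebook]
    if not has_display:
        # No display: run under a virtual framebuffer, as in the Binder setup.
        cmd = ["xvfb-run"] + cmd
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Checking `DISPLAY` is a simplification that only makes sense on X11 systems.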

I'm no longer using a QWebEngineView, so I don't have to show() it, which seemed to clear some things up. However, now we have to do more bookkeeping of the size of the content, since you can't just resize the UI.
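
To illustrate the kind of size bookkeeping meant here (not the actual PoC code): if the content is measured in CSS pixels, the page size handed to the PDF printer has to be derived from it by hand. A rough sketch, assuming the CSS reference of 96 px per inch and an optional uniform margin; the function names are made up, and the real exporter may measure and convert differently:

```python
CSS_PX_PER_INCH = 96.0  # CSS reference pixel density
MM_PER_INCH = 25.4


def css_px_to_mm(px):
    """Convert a length in CSS pixels to millimetres."""
    return px / CSS_PX_PER_INCH * MM_PER_INCH


def page_size_mm(content_width_px, content_height_px, margin_mm=0.0):
    """Page size (width, height) in mm that fits the content plus margins."""
    width = css_px_to_mm(content_width_px) + 2 * margin_mm
    height = css_px_to_mm(content_height_px) + 2 * margin_mm
    return width, height
```

The measured width and height would have to come from the rendered page itself (for example via JavaScript), which is where the timing issues mentioned below start to matter.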

It does appear to have some timing issues, so getting the web channels stuff set up is next... This will, I believe, let us signal a halted state, do measurements, and resize stuff before printing.

I'll try to spend some more cycles on it this evening!

As to embedding in a real interactive qt app: very plausible, but probably orthogonal to this effort which has enough on its hands. If you can assume a gui and a screen, things are much more predictable than all this headless stuff. The high road to getting stuff into/out of a kernel is to impersonate the kernel, emitting comms directly to the widget bus over zmq. In my experiments to that end, I've kept the qt and tornado on separate processes, but quamash might allow unifying the event loops so everything could be in process... But that might be a bad idea for any number of reasons. Websockets are probably fine.

bollwyvl commented 5 years ago

Did a little demo of this at the Jupyter community call. One data point: the full lab-in-qt app was almost unusable with Zoom running!

maartenbreddels commented 5 years ago

I saw it, impressive!

SylvainCorlay commented 4 years ago

@bollwyvl this is really important work.

I am thinking of picking this up - and mostly work on

bollwyvl commented 4 years ago

🕺🏽

Basically all the deathbeds projects are like puppies: free to a good home. Heck they're just free, so do whatever you like!

pyqt just bumped to 5.12 on conda-forge, so now is a great time to pick it up, as it will get us to a more modern chrome (69). Almost more importantly, it can be tested with a more modern chromedriver, rather than one on the very edge of packaging memory, but not quite to the bleeding edge of the post-72 madness.

I think it's imperative that this feature, if it is to be supported and/or eventually upstreamed, be tested really robustly on the major platforms with some seriously complex documents. People go to PDF when they mean business (posters, papers, proposals, and PhD theses), and something failing at this part of the toolchain can make them leave the ecosystem for good.

On the voila template piece: sounds great. We've started kicking the tires on phoila, and it's going like gangbusters. It would be mind blowing to be able to "press button get pdf with print-ready vector graphics" from inside a voila.

As regards further features: I still think PDF/A-2 is the right target, and at least rudimentary control over page breaks, margins, headers, and page numbers is crucial.

Apologies for not having been able to push it further forward... I even have a little bit more work I did that isn't published. But I'm glad to help in any way I can!

SylvainCorlay commented 4 years ago

On the voila template piece: sounds great.

With the nbconvert refactor by maarten (nbconvert 6.x branch) it will become natural.

We've started kicking the tires on phoila, and it's going like gangbusters. It would be mind blowing to be able to "press button get pdf with print-ready vector graphics" from inside a voila.

It would be really nice if it does not become an actual fork of voila. Having a lab-based template in voila is totally on the roadmap. There are some choices in phoila that do not fit 100% in the voila template machinery. Would you like to coordinate more with us on this?

bollwyvl commented 4 years ago

Just playing user/build engineer in the phoila case. Fork or not, I'm unlikely to use voila directly, unless it works directly with phosphor and lab stuff... I don't have the time or bandwidth to support 2x toolchains/test setups, already matrixed out pretty hard. I don't need multiple templates, just one that kinda works like the lab dock panel.

The key wins are it lets me and my teams build heavy-duty, lab-forward stuff, in lab, not worrying much about how the app will look, and the apps work offline without standing up an unpkg mirror (either checked in or built on ci).

SylvainCorlay commented 4 years ago

@bollwyvl btw Voilà comes with a preview extension for lab - I don't know if you saw it already.

SylvainCorlay commented 4 years ago

The key wins are it lets me and my teams build heavy-duty, lab-forward stuff, in lab, not worrying much about how the app will look, and the apps work offline without standing up an unpkg mirror (either checked in or built on ci).

This is totally what we are working on.

bollwyvl commented 4 years ago

Groovy!