jupyter / nbconvert

Jupyter Notebook Conversion
https://nbconvert.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Support embedded pdf plots in output cells in conversion to Markdown #1284

Open allefeld opened 4 years ago

allefeld commented 4 years ago

I have a notebook that contains plots generated by Plotly. These plots are usually interactive, so they are not suitable for export to static formats. But it is possible to configure Plotly to create e.g. pdf plots, which are embedded as application/pdf objects in the notebook. If I then use nbconvert to convert the notebook to a pdf, the pdf plots are embedded as expected. The same holds for LaTeX export: the generated code contains \adjustimage commands and the plots are written as pdf files to a subdirectory notebook_files.

However, when I export to Markdown, the plots are simply missing.

I understand that Markdown was originally intended as an easy way to write HTML, where pdf plots don't make sense. But since you use Pandoc internally, you are certainly aware that with Pandoc, Markdown has become much more versatile. Depending on which format one converts to, including pdf plots with standard Markdown syntax ![](plot.pdf) makes perfect sense.
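To illustrate the kind of output I mean, here is a minimal sketch in plain Python. It is not nbconvert's API or its Markdown exporter; it only reads the notebook JSON directly, assuming nbformat 4 and assuming that application/pdf outputs are stored base64-encoded. The function name, the `plot_N.pdf` file names, and the default `notebook_files` directory are illustrative choices.

```python
import base64
import json
from pathlib import Path

FENCE = "`" * 3  # Markdown code fence, built here to avoid nesting fences


def _text(value):
    """nbformat stores multi-line strings as a str or as a list of str."""
    return "".join(value) if isinstance(value, list) else value


def notebook_to_markdown(nb_path, files_dir="notebook_files"):
    """Return Markdown text for the notebook and write its pdf plots to disk.

    Only Markdown cells, code cell sources, and application/pdf outputs
    are handled; all other output types are skipped in this sketch.
    """
    nb_path = Path(nb_path)
    nb = json.loads(nb_path.read_text(encoding="utf-8"))
    out_dir = nb_path.parent / files_dir
    parts = []
    n_plots = 0
    for cell in nb.get("cells", []):
        source = _text(cell.get("source", ""))
        if cell.get("cell_type") == "markdown":
            parts.append(source)
        elif cell.get("cell_type") == "code":
            parts.append(FENCE + "\n" + source + "\n" + FENCE)
            for output in cell.get("outputs", []):
                pdf = output.get("data", {}).get("application/pdf")
                if pdf is None:
                    continue
                out_dir.mkdir(exist_ok=True)
                name = f"plot_{n_plots}.pdf"
                # Assumption: the pdf payload is base64 text.
                (out_dir / name).write_bytes(base64.b64decode(_text(pdf)))
                # Standard Markdown image syntax pointing at the pdf file.
                parts.append(f"![]({files_dir}/{name})")
                n_plots += 1
    return "\n\n".join(parts) + "\n"
```

Calling `notebook_to_markdown("notebook.ipynb")` would return the Markdown text and leave the extracted plots next to the notebook, which is the layout Pandoc can then pick up when converting to pdf.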

This is important to me because though I use LaTeX for pdf creation, I very much prefer writing Pandoc-Markdown over LaTeX. I'm an academic, and I write everything from todo lists to journal papers using Pandoc's Markdown, and I have a pipeline set up that creates pdfs from Markdown files the way I like it. Being able to convert a notebook into this kind of file, which I work with every day, would be extremely useful.

Is this something you may support, e.g. as a fix to the Markdown exporter? Or on a slightly larger scale, by having a Pandoc-Markdown exporter?

As a workaround, I tried converting the LaTeX export to Markdown via Pandoc, but that fails (in my case, over Verbatim environments).

Nbconvert version: 5.6.1

MSeal commented 4 years ago

Would something like https://github.com/jupyter/nbconvert/pull/1285 allow for what you're looking for?

I think I want to switch our default pdf conversion from the LaTeX version to a pandoc-based solution for 6.0. It's the next thing on my plate for the release (I will be working on it some this coming weekend). Generally, though, the tools available are all problematic: each has its own subset of things that do or don't render as one would want, or severe tradeoffs in what's allowed.

allefeld commented 4 years ago

I'm afraid I don't understand what #1285 is about. Maybe?

I looked into the design of nbconvert to see whether I can fix this myself, and my impression was that it is quite a patchwork. I had imagined that maybe there is a conversion to Markdown first, and then other export formats are generated from that via Pandoc, but that does not seem to be the case. So a solution that is generally based on Pandoc might be both more straightforward and more powerful.

On the other hand, since the addition of ipynb to Pandoc 2.6, nbconvert now tends to be redundant with Pandoc itself. I don't have the overview, but is there something that nbconvert can do that Pandoc cannot, at least in principle? – I tried using Pandoc directly for my purpose, which ran into its own problems, but jgm responded so they may be resolved soon. See https://github.com/jgm/pandoc/issues/6430
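For reference, the direct Pandoc route can be sketched as a small Python wrapper. It assumes a Pandoc version with the ipynb reader (2.6 or later) on the PATH; the helper names and the `notebook_files` media directory are hypothetical. Whether application/pdf outputs actually survive this conversion is exactly what the linked issue is about, so this sketch makes no claim about that.

```python
import shutil
import subprocess


def pandoc_command(notebook, output, media_dir="notebook_files"):
    """Build a pandoc command that converts an .ipynb file to Markdown."""
    return [
        "pandoc",
        "--from=ipynb",
        "--to=markdown",
        # Ask pandoc to write embedded media to files instead of inlining it.
        f"--extract-media={media_dir}",
        f"--output={output}",
        notebook,
    ]


def convert(notebook, output):
    """Run pandoc if it is installed; return True if the conversion ran."""
    if shutil.which("pandoc") is None:
        return False
    subprocess.run(pandoc_command(notebook, output), check=True)
    return True
```

`convert("notebook.ipynb", "notebook.md")` returns False without doing anything when no pandoc executable is found.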

MSeal commented 4 years ago

I'm afraid I don't understand what #1285 is about. Maybe?

It renders a Chrome browser view of the notebook to PDF, which I think would enable exactly what you're describing. You could try checking out that PR branch and running the command in the PR against your notebook to see if it works (it would be a nice test).

had imagined that maybe there is a conversion to Markdown first, and then other export formats are generated from that via Pandoc

No, many conversions wouldn't survive a transition to Markdown and then to a second format while preserving shapes and formatting (LaTeX equations, for example, which drove a lot of early nbconvert development before I joined). Nbconvert's logic for doing the more complex conversions is difficult to walk through. I've been trying to keep basic support there, but the overall design is overcomplicated and the grid of feature combinations for rendering is very large, so making everything that's visible in a webapp embed convertible to all other formats is somewhat unmaintainable.

I don't have the overview, but is there something that nbconvert can do that Pandoc cannot, at least in principle?

Many of the LaTeX features and styles nbconvert handles weren't covered by pandoc the last time I looked into it. I would love it if nbconvert became more of a styler for pandoc commands over time. I need to take a deeper look at the state of pandoc conversions and see what the current gaps are before assessing that further.