jupytercalpoly / jupyterlab-richtext-mode

JupyterLab extension for rich text editing in the notebook

Converting to DOCX and ODT (but not vice versa) #54

Closed krinsman closed 5 years ago

krinsman commented 5 years ago

This extension is really useful! It should make Jupyter a lot easier to use for a lot of people.

I vaguely remember you stating during your presentation at UC Berkeley/BIDS that longer term you were looking for how one might convert from notebooks to DOCX and ODT (as well as vice versa).

Is it not already possible to do this though using NBConvert? https://github.com/jupyter/nbconvert/blob/master/nbconvert/filters/pandoc.py

The main difficulty seems to arise when either the Markdown or the word processor document contains embedded media (e.g. images), but apparently this also has a solution.

The function also accepts extra Pandoc flags such as --extract-media through its extra_args parameter, e.g. pandoc(source, fmt, to, extra_args=['--extract-media=.']), or, via the other convenience function, convert_pandoc(source, from_format, to_format, extra_args=['--extract-media=.']).
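To make the role of extra_args concrete, here is a minimal sketch of the kind of command line such a wrapper ends up running. The helper names build_pandoc_cmd and run_pandoc are hypothetical (they are not part of nbconvert), and the exact argument order nbconvert uses internally is an assumption; only the pandoc flags themselves (-f, -t, -o, --extract-media) are real.

```python
import shutil
import subprocess


def build_pandoc_cmd(from_format, to_format, extra_args=None, output_path=None):
    """Assemble a pandoc command line (hypothetical helper); input comes from stdin."""
    cmd = ["pandoc", "-f", from_format, "-t", to_format]
    if output_path is not None:
        # Binary formats such as docx/odt are best written to a file.
        cmd += ["-o", output_path]
    # extra_args plays the same role as in the nbconvert wrapper.
    cmd += list(extra_args or [])
    return cmd


def run_pandoc(source, from_format, to_format, extra_args=None, output_path=None):
    """Pipe `source` through pandoc. Requires the pandoc executable on PATH."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc executable not found on PATH")
    cmd = build_pandoc_cmd(from_format, to_format, extra_args, output_path)
    result = subprocess.run(
        cmd, input=source.encode("utf-8"), stdout=subprocess.PIPE, check=True
    )
    return result.stdout


cmd = build_pandoc_cmd(
    "markdown",
    "docx",
    extra_args=["--extract-media=."],
    output_path="notebook.docx",
)
print(" ".join(cmd))
# prints: pandoc -f markdown -t docx -o notebook.docx --extract-media=.
```

Keeping the command construction separate from the subprocess call means the flags can be inspected without Pandoc installed.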

But anyway, it should then be fairly straightforward to convert IPYNB --> Markdown and then, using NBConvert's API for Pandoc, Markdown --> ODT/DOCX.

Pandoc can also convert ODT or DOCX to Markdown, so it should be possible to go at least halfway in the other direction. According to example 15 here, it is apparently also possible to convert Markdown to IPYNB, but I'm skeptical. At least if one converts from IPYNB to Markdown and then back again, I expect that the resulting notebook will not be the same as the original and will have lost several things (e.g. code cells), even when using --extract-media. But I haven't had the chance to test this yet, so I don't actually know.
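One way to actually test the round-trip concern would be to compare the cells of the original notebook with those of the notebook that comes back. The following is only a sketch using the standard library: the function names are hypothetical, it assumes the nbformat 4 layout (a top-level "cells" list with "cell_type" and "source" keys), and the two sample notebooks are hand-written dicts for illustration, not real Pandoc output.

```python
import json


def load_notebook(path):
    """Read an .ipynb file as a plain dict (a notebook is JSON on disk)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def cell_summary(notebook):
    """Return (cell_type, source_text) pairs for an nbformat 4 notebook dict."""
    summary = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows source to be a single string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        summary.append((cell.get("cell_type"), source.strip()))
    return summary


def lost_cells(original, round_tripped):
    """List cells of the original with no exact match after the round trip."""
    after = cell_summary(round_tripped)
    return [cell for cell in cell_summary(original) if cell not in after]


# Hand-written sample data: the code cell comes back as a Markdown cell.
original = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n"]},
        {"cell_type": "code", "source": "print(1)"},
    ]
}
round_tripped = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "markdown", "source": "print(1)"},
    ]
}
print(lost_cells(original, round_tripped))
```

The comparison is deliberately strict (same cell type and same stripped source), so it flags a code cell that survives only as Markdown text. It ignores outputs and metadata, which would need their own checks.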

krinsman commented 5 years ago

I'm sure this is not the best way to do it, but e.g. something like:

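import nbformat
from nbconvert.exporters.base import export
from nbconvert.exporters.markdown import MarkdownExporter
from nbconvert.filters import convert_pandoc

# Read the notebook without converting it to another nbformat version.
nbnode = nbformat.read('notebook.ipynb', as_version=nbformat.NO_CONVERT)
# export() returns an (output, resources) tuple; the output is the Markdown text.
markdown_string, resources = export(MarkdownExporter, nbnode)
# ODT is a binary (zipped) format, so it is safer to have Pandoc write a file via -o
# than to treat the result as a string.
convert_pandoc(markdown_string, from_format='markdown', to_format='odt', extra_args=['-o', 'notebook.odt'])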

Something similar to this seemingly should be enough for a "minimum viable product".

See: (1) (2) (3) (4)

dLamSlo8 commented 5 years ago

Thank you for the feedback and the information you have provided. Our goal for this was to create an export option in the JupyterLab menu for DOCX/ODT. We have taken into account that NBConvert / Pandoc does work for conversion, and we want to actually include that as part of the process in this overall extension. We were thinking about hooking into the Contents API, and handling the conversion there. Thank you for the initial code provided and the information regarding the possible conversion process with NBConvert. We will look into it further and keep you updated!

krinsman commented 5 years ago

Oh OK that makes sense.

Yeah in that case then the above probably isn't helpful. To the best of my knowledge the export options currently built into Lab/Notebook use NBConvert, although I don't quite understand the code that does this (which seems to be here as you probably already know):

https://github.com/jupyterlab/jupyterlab/blob/9debf60ec0df4e9c826e4b04a771722d518c8360/packages/notebook-extension/src/index.ts#L2011

As far as I can tell, the built-in Service Manager, which appears to be what is used to connect to nbconvert in order to export notebooks from the file menu, talks to nbconvert via a REST API:

https://github.com/jupyterlab/jupyterlab/blob/master/packages/services/src/nbconvert/index.ts

Which makes sense actually, since then no server extension is required to run custom Python code (unlike what I suggested above).

The REST API for NBConvert seems undocumented though (as far as I can tell). Also, NBConvert doesn't convert directly to DOCX/ODT, so for the Markdown to DOCX/ODT step it seems useful mostly for its Pandoc wrapper function, which would allow one to use Pandoc for that conversion.
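For the REST route discussed above, the export request appears to boil down to fetching a URL on the Jupyter server. The sketch below builds such a URL. The layout it uses (base URL, then nbconvert, then the format, then the notebook path, with an optional download=true query parameter) is an assumption based on reading the linked JupyterLab code, not documented behaviour, and the function name nbconvert_url is hypothetical. Authentication (e.g. a server token) is not handled.

```python
from urllib.parse import quote, urlencode


def nbconvert_url(base_url, export_format, notebook_path, download=True):
    """Build an export URL, assuming a /nbconvert/<format>/<path> layout."""
    base = base_url.rstrip("/")
    # Keep "/" between path segments, but percent-encode spaces and the like.
    path = quote(notebook_path.lstrip("/"))
    url = f"{base}/nbconvert/{quote(export_format, safe='')}/{path}"
    if download:
        url += "?" + urlencode({"download": "true"})
    return url


print(nbconvert_url("http://localhost:8888/", "markdown", "work/My Notebook.ipynb"))
# prints: http://localhost:8888/nbconvert/markdown/work/My%20Notebook.ipynb?download=true
```

If this layout holds, a DOCX/ODT export option would need either a new exporter registered with nbconvert on the server or a separate endpoint, since the format segment has to name something the server knows how to produce.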